2/10/2026 at 4:56:59 AM
If folks are interested, @antirez has open-sourced a C implementation of Voxtral Mini 4B here: https://github.com/antirez/voxtral.c. I have my own fork here: https://github.com/HorizonXP/voxtral.c, where I'm working on a CUDA implementation, plus some other niceties. It's working quite well so far, but I haven't matched Mistral AI's API endpoint speed just yet.
by HorizonXP
2/10/2026 at 12:11:40 PM
Hey, how does someone get started with doing things like these (writing inference code, CUDA, etc.)? Any guidance is appreciated. I understand one doesn't just directly write these things and it would require some kind of reading; it would be great to receive some pointers.
by kingreflex
2/11/2026 at 12:51:39 AM
You know, I love this comment, because you are where I was 15 years ago, when I naively decided that I wanted to do my master's in medical biophysics and try to use NVIDIA CUDA to help accelerate some of the work we were doing. So I have a very... storied history with NVIDIA CUDA, but frankly, it's been years since I've actually written C code at all, let alone CUDA. I have to admit that I wrote none of the code in this repo. I asked Codex to go and do it for me. I did a lot of prompting and guided it through the benchmarking and tools I expected it to use to get the result I was looking for.
Most of the plans it generated were outside my wheelhouse and not something I'm particularly familiar with, but I know the area well enough to see that its plan roughly made sense, so I just let it go. The fact that this worked at all is a miracle, and I can't take credit for it beyond telling the AI what I wanted and how to do it, in loose terms, and helping it when it got stuck.
BTW, everything above was dictated with the code we generated, except for this sentence. And I added line breaks for paragraphs. That's it.
by HorizonXP
2/11/2026 at 7:43:55 AM
[dead]
by cgfjtynzdrfht
2/10/2026 at 3:46:31 PM
These are good lectures, and there is also a Discord: https://github.com/gpu-mode/lectures
by briandw
2/10/2026 at 12:35:22 PM
Same! Would love any resources. I'm interested more in making models run vs making the models themselves :)
by Kilenaitor
2/10/2026 at 6:16:41 AM
There is also another Mistral implementation: https://github.com/EricLBuehler/mistral.rs Not sure what the difference is, but it seems to just be overall better received.
by Ygg2
2/10/2026 at 6:39:03 AM
mistral.rs is more like llama.cpp: it's a full inference library written in Rust that supports a ton of models and many hardware architectures, not just Mistral models.
by NitpickLawyer