5/21/2025 at 5:30:32 PM
The first number I look at these days is the file size via Ollama, which for this model is 14GB: https://ollama.com/library/devstral/tags
I find that on my M2 Mac that number is a rough approximation of how much memory the model needs (usually plus about 10%), which matters because I want to know how much RAM I will have left for running other applications.
Anything below 20GB tends not to interfere too much with the other stuff I'm running. This model looks promising!
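Once a model is pulled, the local Ollama API also reports the on-disk size, so you can sanity-check this programmatically; a minimal sketch (assuming the default server at localhost:11434, and using my rough +10% rule of thumb):

    import requests

    # List locally pulled models from the Ollama server (default port assumed)
    resp = requests.get("http://localhost:11434/api/tags")
    for model in resp.json().get("models", []):
        size_gb = model["size"] / 1e9
        # Rule of thumb from above: RAM needed is roughly file size plus ~10%
        print(f"{model['name']}: {size_gb:.1f} GB on disk, ~{size_gb * 1.1:.1f} GB RAM")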
by simonw
5/21/2025 at 9:32:42 PM
Any agentic dev software you could recommend that runs well with local models?
I’ve been using Cursor and I’m kind of disappointed. I get better results just going back and forth between the editor and ChatGPT.
I tried localforge and aider, but they are kinda slow with local models
by nico
5/22/2025 at 12:28:37 AM
https://github.com/block/goose
by ynniv
5/22/2025 at 3:26:30 AM
I used Devstral today with Cline and OpenHands. It worked great in both.
About 1 minute of initial prompt processing time on an M4 Max.
Using LM Studio because the Ollama API breaks if you set the context to 128k.
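For reference, the kind of request that triggers it is roughly this (a sketch using Ollama's documented num_ctx option; the model name and prompt are just placeholders):

    import requests

    # Ask Ollama's chat endpoint for a 128k context window via options.num_ctx
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "devstral",
            "messages": [{"role": "user", "content": "hello"}],
            "options": {"num_ctx": 131072},  # 128k tokens
            "stream": False,
        },
    )
    print(resp.json()["message"]["content"])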
by zackify
5/22/2025 at 10:08:46 AM
How is it great that it takes 1 minute for initial prompt processing?
by elAhmo
5/22/2025 at 1:47:04 PM
Have you tried using MLX or Simon Willison’s llm?
by nico
5/23/2025 at 10:35:09 AM
I’ve been playing around with Zed; it supports local and cloud models, is really fast, and has a nice UX. It does lack some of the deeper features of VS Code/Cursor, but it’s very capable.
by ivanvanderbyl
5/22/2025 at 6:28:14 AM
You can use Ollama in VS Code's Copilot. I haven't personally tried it, but I am interested in how it would perform with Devstral.
by asimovDev
5/21/2025 at 9:50:00 PM
Do you have any other interface for the model? What kind of tokens/sec are you getting?
Try hooking aider up to Gemini and see how the speed compares. I have noticed that people in the localllama scene do not like to talk about their TPS.
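For anyone who wants a concrete number to report, Ollama's generate endpoint returns timing fields you can turn into tokens/sec; a quick sketch (the model and prompt are just examples):

    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "devstral", "prompt": "Write a haiku about RAM.", "stream": False},
    ).json()

    # eval_count is the number of generated tokens; eval_duration is in nanoseconds
    print(f"{resp['eval_count'] / (resp['eval_duration'] / 1e9):.1f} tokens/sec")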
by jabroni_salad
5/21/2025 at 10:20:30 PM
The models feel pretty snappy when interacting with them directly via Ollama; I'm not sure about the TPS.
However, I've also run into two things: 1) most models don't support tools, and sometimes it's hard to find a version of the model that correctly uses tools; 2) even with good TPS, since the agents are usually doing chain-of-thought and running multiple chained prompts, the experience feels slow - this is true even with Cursor using their models/APIs.
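For point 1, a quick way to check whether a given model version actually emits tool calls is to send a toy tool definition through Ollama's chat API; a sketch (the weather tool is made up purely for the test):

    import requests

    # A throwaway tool definition, just to see whether the model produces a tool call
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "devstral",
            "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
            "tools": tools,
            "stream": False,
        },
    ).json()

    # Models without tool support typically just answer in plain text instead
    print(resp["message"].get("tool_calls", "no tool call emitted"))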
by nico
5/22/2025 at 5:00:03 PM
ra-aid works pretty well with Ollama (haven't tried it with Devstral yet though)
by mrshu
5/22/2025 at 8:15:28 AM
I couldn’t run it on my 16GB MBP (I tried; it just froze up, probably lots of swapping). They say it needs 32GB.
by davedx
5/22/2025 at 5:04:51 PM
I was able to run it on my M2 Air with 24GB. Startup was very slow, but took less than 10 minutes. After that, responses were reasonably quick.
Edit: I should point out that I had many other things open at the time: Mail, Safari, Messages, and more. I imagine startup would be quicker otherwise, but it does mean you can run it with less than 32GB.
by ics
5/21/2025 at 6:48:17 PM
Yes, I agree. I've just run the model locally and it's making a good impression. I've tested it with some Ruby/RSpec gotchas, which it handled nicely.
I'll give it a try with aider to test the large context as well.
by lis
5/21/2025 at 7:56:48 PM
In Ollama, how do you set up the larger context, and figure out what settings to use? I've yet to find a good guide. I'm also not quite sure how I should figure out what those settings should be for each model.
There's context length, but then, how does that relate to input length and output length? Should I just make the numbers match? 32k is 32k? Any pointers?
by ericb
5/21/2025 at 8:33:11 PM
For aider and Ollama, see: https://aider.chat/docs/llms/ollama.html
Just for Ollama, see: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-c...
I’m using llama.cpp though, so I can’t confirm these methods.
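That said, my reading of the Ollama FAQ is that the context window is passed per request as options.num_ctx (it's the total window shared by the prompt and the output; num_predict separately caps output length, so the numbers don't have to match). A sketch of what that would look like, untested on my side:

    import requests

    # Per the Ollama FAQ: set the context window per request with options.num_ctx.
    # "devstral", 32768 and 2048 are just example values.
    requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "devstral",
            "prompt": "hello",
            "options": {"num_ctx": 32768, "num_predict": 2048},  # num_predict caps output tokens
            "stream": False,
        },
    )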
by lis
5/21/2025 at 9:34:34 PM
Are you using it with aider? If so, how has your experience been?
by nico
5/22/2025 at 3:28:48 AM
Ollama breaks for me: if I manually set the context higher, the next API call from Cline resets it back. And Ollama keeps taking it out of memory every 4 minutes.
LM Studio with MLX on Mac is performing perfectly, and I can keep it in my RAM indefinitely.
Ollama's keep-alive is broken, as a new REST API call resets it afterwards. I’m surprised it’s this glitchy with longer-running calls and custom context lengths.
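For reference, the documented knob for this is the keep_alive field (a negative value means never unload; the default is only a few minutes), and since it's per request it apparently needs to go on every call. A sketch of what I'd expect to work:

    import requests

    # keep_alive: -1 tells Ollama not to unload the model after this request;
    # it's a per-request field, so a later call without it falls back to the default.
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "devstral", "prompt": "warm up", "keep_alive": -1, "stream": False},
    )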
by zackify
5/22/2025 at 6:09:08 AM
Almost all models listed in the Ollama model library have a version that's under 20GB, but whether that's a 4-bit quantization (as in this case) or more/fewer bits varies.
AFAICT they usually set the default tag to a version around 15GB.
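The file sizes follow pretty directly from parameter count times effective bits per weight; a back-of-the-envelope sketch (assuming Devstral Small is roughly 24B parameters, with rough average bits-per-weight figures for common GGUF quants):

    # Rough file-size estimate: parameters * effective bits per weight / 8
    params = 24e9  # approximate parameter count for Devstral Small
    for label, bits in [("4-bit (Q4_K_M-ish)", 4.8), ("5-bit", 5.7), ("8-bit (Q8_0)", 8.5)]:
        print(f"{label}: ~{params * bits / 8 / 1e9:.0f} GB")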
by rahimnathwani