2/8/2026 at 2:38:00 AM
So weird/cool/interesting/cyberpunk that we have stuff like this in the year of our Lord 2026:
├── MEMORY.md # Long-term knowledge (auto-loaded each session)
├── HEARTBEAT.md # Autonomous task queue
├── SOUL.md # Personality and behavioral guidance
Say what you will, but AI really does feel like living in the future. As far as the project is concerned, pretty neat, but I'm not really sure about calling it "local-first" as it's still reliant on an `ANTHROPIC_API_KEY`. I do think that local-first will end up being the future long-term, though. I built something similar last year (unreleased), also in Rust, but it was running the model locally (you can see how slow/fast it is here[1], keeping in mind I have a 3080Ti and was running Mistral-Instruct).
I need to revisit this project and release it, but building in the context of the OS is pretty mind-blowing, so kudos to you. I think that the paradigm of how we interact with our devices will fundamentally shift in the next 5-10 years.
by dvt
2/8/2026 at 3:12:05 AM
You absolutely do not have to use a third-party LLM. You can point it to any OpenAI/Anthropic-compatible endpoint. It can even be on localhost.
by halJordan
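For concreteness, a minimal sketch of what pointing a client at a localhost OpenAI-compatible server can look like. This is not LocalGPT's actual client code: the crates (`reqwest`, `serde_json`), the `OPENAI_BASE_URL`/`OPENAI_API_KEY` variable names, port 11434 (Ollama's default), and the model name are all illustrative assumptions.

```rust
// Sketch only, not the project's real client: assumes `reqwest` (with the
// "blocking" and "json" features) and `serde_json`.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Point at whatever OpenAI-compatible server you run, local or remote.
    let base = std::env::var("OPENAI_BASE_URL")
        .unwrap_or_else(|_| "http://localhost:11434/v1".to_string());

    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post(format!("{base}/chat/completions"))
        // Local servers typically ignore the key; cloud ones require it.
        .bearer_auth(std::env::var("OPENAI_API_KEY").unwrap_or_default())
        .json(&json!({
            "model": "qwen3-coder",   // whatever model the server actually exposes
            "messages": [{ "role": "user", "content": "Say hello." }]
        }))
        .send()?
        .error_for_status()?
        .json()?;

    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```

Switching between Anthropic, OpenAI, or a box on your LAN is then mostly a matter of changing the base URL and the key.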
2/8/2026 at 3:21:05 AM
Ah true, missed that! Still a bit cumbersome & lazy imo; I'm a fan of just shipping with that capability out of the box (Hugging Face's Candle is fantastic for downloading/syncing/running models locally).
by dvt
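As a rough illustration of that "ships with it" flow, a sketch of downloading weights from the Hub and handing them to Candle. Assumes the `hf-hub` crate (with its blocking "sync" feature), `candle-core`, and `anyhow`; the repo and file names are placeholders, since many repos shard their weights across several files.

```rust
// Sketch only; the repo id and file name below are placeholders.
use candle_core::Device;
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    // Download the weights file, or reuse a previously cached copy.
    let api = Api::new()?;
    let repo = api.model("some-org/some-model".to_string()); // placeholder repo id
    let weights_path = repo.get("model.safetensors")?;        // placeholder file name

    // Load the tensors onto the GPU if one is available, otherwise the CPU.
    let device = Device::cuda_if_available(0).unwrap_or(Device::Cpu);
    let tensors = candle_core::safetensors::load(&weights_path, &device)?;
    println!("loaded {} tensors from {}", tensors.len(), weights_path.display());
    Ok(())
}
```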
2/8/2026 at 6:42:17 AM
In a local setup you still usually want to split the machine that runs inference from the client that uses it; there are often non-trivial resources involved (Chromium, compilation, databases, etc.) that you don't want polluting the inference machine.
by mirekrusin
2/8/2026 at 3:30:23 AM
Ah come on, lazy? As long as it works with the runtime you wanna use, instead of hardcoding their own solution, it should work fine. If you want to use Candle, and have to implement new architectures with it to be able to use it, you still can: just expose it over HTTP.
by embedding-shape
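A minimal sketch of "just expose it over HTTP": wrap whatever engine you have behind an OpenAI-style `/v1/chat/completions` route so any compatible client can talk to it. Assumes the `axum`, `tokio`, and `serde_json` crates; `generate` is a hypothetical stand-in for the real inference call, and the response carries only the bare fields most clients read.

```rust
// Sketch only: `generate` is a placeholder, not a real inference engine.
use axum::{routing::post, Json, Router};
use serde_json::{json, Value};

fn generate(prompt: &str) -> String {
    // Replace with the actual Candle (or other) model call.
    format!("echo: {prompt}")
}

async fn chat_completions(Json(req): Json<Value>) -> Json<Value> {
    // Use the content of the last message as the prompt.
    let prompt = req["messages"]
        .as_array()
        .and_then(|msgs| msgs.last())
        .and_then(|msg| msg["content"].as_str())
        .unwrap_or_default();

    Json(json!({
        "object": "chat.completion",
        "choices": [{
            "index": 0,
            "message": { "role": "assistant", "content": generate(prompt) },
            "finish_reason": "stop"
        }]
    }))
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/v1/chat/completions", post(chat_completions));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```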
2/8/2026 at 5:30:53 AM
I think one of the major problems with the current incarnation of AI solutions is that they're extremely brittle and hacked together. It's a fun, exciting time, especially for us technical people, but normies just want stuff to "work." Even copy-pasting an API key is probably too much of a hurdle for regular folks, let alone running a local Ollama server in a Docker container.
by dvt
2/8/2026 at 9:09:52 AM
Unlike in image/video gen, at least with LLMs the "best" solution available isn't a graph/node-based interface with an ecosystem of hundreds of hacky, undocumented custom nodes that break every few days, and way-too-complex workflows made up of a spaghetti of two dozen nodes with numerous parameters each, half of which have no discernible effect on output quality, while tweaking the rest is entirely trial and error.
by Sharlin
2/8/2026 at 9:23:35 AM
That's not the best solution for image or video (or audio, or 3D) any more than it is for LLMs (which it also supports). OTOH, it's the most flexible and the most likely to have some support for what you are doing for a lot of those, especially if you are combining multiple of them in the same process.
by dragonwriter
2/8/2026 at 10:17:17 AM
Yes, "best" is subjective and that's why I put it in quotes. But in the community it's definitely seen as something users should and do "upgrade" to from less intimidating but less flexible tools if they want the most power and, most importantly, support for bleeding-edge models. I rarely use Comfy myself, FWIW.
by Sharlin
2/8/2026 at 12:14:23 PM
> but normies just want stuff to "work."
Where in the world are you getting that this project is for "normies"? The installation steps are terminal instructions and it's a CLI, clearly meant for technical people already.
If you think copy-pasting an API key is too much, don't you think cloning a git repository, installing the Rust compiler, and compiling the project might be too much and hit those normies in the face sooner than the API key?
by embedding-shape
2/8/2026 at 8:08:50 AM
Yes, this is not local-first; the name is bad.
by backscratches
2/8/2026 at 10:57:22 AM
Horrible. Just because you have code that doesn't run in a browser doesn't mean you have something that's local. This goes double when the code requires API calls: your net goes down and this stuff does nothing.
by outofpaper
2/9/2026 at 4:08:41 AM
For a web developer, local-first only describes where the state of the program lives. In the case of this app, that's in local files. If Anthropic's API were down you would just use something else; something like OpenRouter supports model fallbacks out of the box.
by jdejean
2/8/2026 at 7:28:46 PM
Not to mention that you can actually have something that IS local AND runs in a browser :D
by konart
2/8/2026 at 5:35:29 PM
In a world where IT doesn't mean anything, crypto doesn't mean anything, AI doesn't mean anything, AGI doesn't mean anything, end-to-end encryption doesn't mean anything, why should local-first mean anything? We must unite against the tyranny of distinction.
by yusuf288
2/8/2026 at 10:29:19 AM
It absolutely can be pointed to any standard endpoint, either cloud or local. It's far better for most users to be able to specify an inference server (even on localhost in some cases) because the ecosystem of specialized inference servers and models is a constantly evolving target.
If you write this kind of software, you will not only be reinventing the wheel but also probably disadvantaging your users if you try to integrate your own inference engine instead of focusing on your agentic tooling. Ollama, vLLM, Hugging Face, and others are devoting their focus to the servers; there is no reason to sacrifice the front-end tooling effort to duplicate their work.
Besides that, most users will not be able to run the better models on their daily driver, and will have a separate machine for inference, or be running inference in a private or rented cloud, or even over a public API.
by K0balt
2/8/2026 at 10:39:51 AM
It is not local-first. Local is not the primary use case. The name is misleading to the point that I almost didn't click, because I do not run local models.
by backscratches
2/8/2026 at 1:06:33 PM
I think the author is using local-first as in "your files stay local, and the framework is compatible with on-prem infra". Aside from not storing your docs and data with a cloud service, though, it's very usable with cloud inference providers, so I can see your point. Maybe the author should have spelled out that capability, even though it seems redundant, since local-first implies local capability but also cloud compatibility; otherwise it would just be "local" or "local-only".
by K0balt
2/8/2026 at 3:14:08 PM
It's called "LocalGPT". It's a bad name.
by backscratches
2/9/2026 at 11:44:43 PM
Yeah, it's not exactly great, lol. Could be the vision behind the project, though, from an aspirational standpoint. But yeah, it kinda implies it will be more like Ollama or vLLM.
by K0balt
2/8/2026 at 8:38:39 AM
To be precise, it's exactly as local-first as OpenClaw (i.e. probably not, unless you have an unusually powerful GPU).
by lxgr
2/8/2026 at 10:41:35 AM
Yes but OpenClaw (which is a terrible name for other reasons) doesn't have "local" in the name and so is not misleading.
by backscratches
2/8/2026 at 3:47:59 PM
I mean, at least OpenClaw is funny in the sense that a D port could finish the roundabout by calling itself "OpenClawD"...
by dancemethis
2/8/2026 at 10:58:32 AM
It's just as misleading. Lots of their marketing push, or at least the ClawBros, pitch it as running locally on your Mac Mini.
by outofpaper
2/8/2026 at 11:46:45 AM
To be fair, you do keep significantly more control over your own data from a data-portability perspective! A MEMORY.md file presents almost zero lock-in compared to some SaaS offering. Privacy-wise, of course, the inference provider sees everything.
by lxgr
2/8/2026 at 12:23:47 PM
To be clear: keeping a local copy of some data provides no control over how the remote system treats that data once it's sent.
by jagged-chisel
2/10/2026 at 8:53:29 AM
Which is what I said in my second sentence.
by lxgr
2/10/2026 at 5:17:04 PM
It's worse than "[they] can see everything." They can share it.
by jagged-chisel
2/11/2026 at 3:33:58 PM
Is it not a given that anyone who gets access to a piece of information is also capable of sharing it?
by lxgr
2/8/2026 at 3:56:30 PM
It confused me at first: when I saw the mention of "local" plus the single-file thing in the GitHub repo, I assumed they were going to have llamafile bundled, and went looking through to see what model they were using by default.
by ciaranmca
2/8/2026 at 3:10:41 AM
> but I'm not really sure about calling it "local-first" as it's still reliant on an `ANTHROPIC_API_KEY`.
See here:
https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...
by atmanactive
2/8/2026 at 6:12:11 AM
What reasonably comparable model can be run locally on, say, 16GB of video memory, compared to Opus 4.6? As far as I know, Kimi (while good) needs serious GPUs: an RTX 6000 Ada minimum, more likely an H100 or H200.
by nodesocket
2/8/2026 at 1:35:09 PM
Devstral¹ has very good models that can be run locally. They are among the top open models, and surpass some closed models.
I've been using Devstral, Codestral and Le Chat exclusively for three months now, all from Mistral's hosted versions: agentic, as completion, and for day-to-day stuff. It's not perfect, but neither is any other model or product, so it's good enough for me. Less anecdotal are the various benchmarks that put them surprisingly high in the rankings.
by berkes
2/8/2026 at 7:56:21 AM
Nothing will come close to Opus 4.6 here. You will be able to fit a distilled 20B to 30B model on your GPU; gpt-oss-20b is quite good in my local testing on a MacBook Pro (M2 Pro, 32GB). The bigger downside, when you compare it to Opus or any other hosted model, is the limited context. You might be able to achieve around 30k. Hosted models often have 128k or more; Opus 4.6 has 200k as standard and 1M in API beta mode.
by mixermachine
2/8/2026 at 8:14:31 AM
There are local models with larger context, but the memory requirements explode pretty quickly, so you need to lower parameter count or resort to heavy quantization. Some local inference platforms allow you to place the KV cache in system memory (while still otherwise using the GPU). Then you can just use swap to allow for even very long contexts, but this slows inference down quite a bit. (The write load on the KV cache is just appending a KV vector per inferred token, so it's quite compatible with swap. You won't be wearing out the underlying storage all that much.)
by zozbot234
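A back-of-the-envelope illustration of why the memory explodes: per token, each layer stores one K and one V vector, so the cache grows linearly with context length. The layer/head/dimension numbers below are assumed placeholders for a 20-30B-class dense model, not any specific model's real configuration.

```rust
// Back-of-the-envelope only; the architecture numbers are illustrative assumptions.
fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64, ctx_tokens: u64, bytes_per_elem: u64) -> u64 {
    // Per token, each layer stores one K and one V vector of kv_heads * head_dim elements.
    2 * layers * kv_heads * head_dim * bytes_per_elem * ctx_tokens
}

fn main() {
    let (layers, kv_heads, head_dim) = (48, 8, 128); // assumed architecture
    for ctx in [8_192u64, 32_768, 131_072] {
        let gib = kv_cache_bytes(layers, kv_heads, head_dim, ctx, 2) as f64 / f64::from(1u32 << 30);
        println!("{ctx:>7} tokens -> ~{gib:.1} GiB of fp16 KV cache");
    }
}
```

With those assumed numbers, the fp16 KV cache alone is roughly 1.5 GiB at 8k tokens, ~6 GiB at 32k, and ~24 GiB at 128k, on top of the weights, which is why long contexts quickly push you toward quantized caches, system RAM, or swap.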
2/8/2026 at 7:12:39 AM
I made something similar to this project, and tested it against a few 3B and 8B models (Qwen and Ministral, both the instruction and the reasoning variants). I was pleasantly surprised by how fast and accurate these small models have become. I can ask it things like "check out this repo and build it", and with a Ralph strategy it will eventually succeed, despite the small context size.
by lodovic
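For anyone unfamiliar with the term, a "Ralph" strategy here just means re-running the agent from a fresh context on the same goal until an external check passes, rather than relying on one long session. A loose sketch, with a hypothetical `run_agent_once` helper and `cargo build` standing in as the objective success check:

```rust
// Loose sketch of the loop only; `run_agent_once` is a hypothetical stand-in.
use std::process::Command;

fn run_agent_once(_goal: &str) {
    // In a real setup this would invoke the local model / agent harness once,
    // starting from an empty context so earlier failed attempts don't pollute it.
}

fn build_succeeds() -> bool {
    // Shell out to `cargo build` and treat a zero exit code as success.
    Command::new("cargo")
        .arg("build")
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

fn main() {
    let goal = "check out this repo and build it";
    for attempt in 1..=20 {
        run_agent_once(goal);
        if build_succeeds() {
            println!("succeeded on attempt {attempt}");
            return;
        }
    }
    eprintln!("gave up after 20 attempts");
}
```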
2/8/2026 at 8:34:33 AM
Nothing close to Opus is available in open weights. That said, do all your tasks need the power of Opus?
by PeterStuer
2/8/2026 at 8:40:49 AM
The problem is that having to actively decide when to use Opus defeats much of the purpose. You could try letting a model decide, but given my experience with at least OpenAI's "auto" model router, I'd rather not.
by lxgr
2/8/2026 at 9:47:31 AM
I also don't like having to think about it, and if it were free, I would not bother, even though keeping up a decent local alternative is a good defensive move regardless. But let's face it: for most people Opus comes at a significant financial cost per token if used more than very casually, so using it for rather trivial or iterative tasks that nevertheless consume a lot of those tokens is something to avoid.
by PeterStuer
2/8/2026 at 7:19:16 AM
I'm playing with local-first OpenClaw and Qwen3 Coder Next running on my LAN. Just starting out, but it looks promising.
by __mharrison__
2/8/2026 at 4:00:19 PM
On what sort of hardware/RAM? I've been trying Ollama and opencode with various local models on 16GB of RAM, but the speed and accuracy/behaviour just aren't good enough yet.
by bluerooibos
2/9/2026 at 4:57:12 AM
DGX Spark (128GB)
by __mharrison__
2/8/2026 at 5:49:46 AM
> Say what you will, but AI really does feel like living in the future.
Love it or hate it, the amount of money being put into AI really is our generation's equivalent of the Apollo program. Over the next few years there are over 100 gigawatt-scale data centres planned to come online.
At least it's a better use than money going into the military industry.
by fy20
2/8/2026 at 8:51:48 AM
The Apollo program was peanuts in comparison:
https://www.wsj.com/tech/ai/ai-spending-tech-companies-compa...
https://www.reuters.com/graphics/USA-ECONOMY/AI-INVESTMENT/g...
by T-A
2/8/2026 at 6:22:19 AM
What makes you think AI investment isn't a proxy for military advantage? Did you miss the saber-rattling of anti-regulation lobbying: that we cannot pause or blink or apply rules to the AI industry, because then China would overtake us?
by jazzyjackson
2/8/2026 at 10:32:38 AM
You know they will never come online. A lot of it is letters of intent to invest with nothing promised, mostly to juice the circular share-price hype.
by adammarples
2/8/2026 at 8:11:07 PM
Most of these AI companies are part of the military industry. So the money is still going there at the end of the day.
by ryan_n
2/8/2026 at 6:05:26 AM
LoL, don't worry, they are getting their dose of the snake oil too.
by pwndByDeath
2/8/2026 at 6:28:39 AM
IMHO it doesn't make sense, financially and resource-wise, to run local, given the five-figure upfront costs to get an LLM running slower than what I can get for 20 USD/month. If I'm running a business and have some number of employees to make use of it, and confidentiality is worth something, sure, but am I really going to rely on anything less than the frontier models for automating critical tasks? Or roll my own on-prem IT to support it when Amazon Bedrock will do it for me?
by jazzyjackson
2/8/2026 at 8:59:42 AM
That's probably true only as long as subscription prices are kept artificially low. Once the $20 becomes $200 (or the fast-mode inference quotas for cheap subs become unusably small), the equation may change.
by Sharlin
2/8/2026 at 1:47:12 PM
This field is highly competitive, much more than I expected it to be. I thought the barrier to entry was so high that only big tech could seriously join the race, because of costs, training data, etc. But there's fierce competition from new or small players (DeepSeek, Mistral, etc.), many even open source. And I'm convinced they'll keep the prices low.
A company like OpenAI can only increase subscription prices 10x once they've locked in enough clients, have a monopoly or oligopoly, or the switching costs are many multiples of that.
So currently the irony seems to be that the larger the AI company, the bigger the loss it's running at. Size seems to have a negative impact on the business. But the smaller operators also prevent companies from raising prices to levels at which they make money.
by berkes
2/8/2026 at 9:56:25 PM
There's no way around the cost of electricity, at least in the short term. Nobody has come up with a way to meaningfully scale capacity without scaling parameter count (≈ energy use). Everybody seems to agree that the newest Claudes are the only coding models capable of some actually semi-challenging tasks, and even those are prone to all the usual failure modes and require huge amounts of handholding. No smaller models seem to get even close.
by Sharlin
2/8/2026 at 8:17:21 AM
It starts making a lot of sense if you can run the AI workloads overnight on leaner infrastructure rather than insisting on a real-time response.
by zozbot234
2/8/2026 at 12:43:18 PM
The usage limits on most 20 USD/month subs are becoming quite restrictive, though. API pricing is more indicative of the true cost.
by zipy124
2/8/2026 at 12:44:06 PM
> but AI really does feel like living in the future.
Got the same feeling when I put on the HoloLens for the first time, but look what we have now.
by croes
2/8/2026 at 6:22:59 PM
What does Anthropic bring to this project that a local LLM, e.g. Qwen3 Coder Next, cannot?
by mycall
2/8/2026 at 4:57:39 AM
Local-first is not the future, lmfao; maybe in 10-20 years. It currently costs ~$80k-100k to run a pretty meh Kimi 2.5 at decent tok/s, which is rather useless anyway, and that doesn't allow you to run any multi-agent sessions. By the time hardware costs shrink enough to let you run useful models, concurrently, in multi-agent environments, they'll have already devalued labor on a scale never before seen. The layoffs and labor devaluation will cause us all to work for morsels, on whatever work opportunities remain. Eventually you'll beg to fight in a war.
LLMs are only here to attack labor, devalue the working class, and eventually make us useless to the ruling class. LLMs do not create opportunities or jobs; they replace the inputs to labor: humans. That's their only purpose.
But I guess most llm-kiddies think they're going to vibe-code their way out of the working class with Anthropic's latest slop offering. Good luck with that. In 5 years your labor will be worth a quarter, maybe half, of what it is now, and that vibe-coded startup of yours will have been made 5000 times over by every other delusional llm-kiddie.
Have fun with your GPU; you won't be able to afford a 60-series, if they even make one, and it certainly won't be powerful enough to pull you out of the Black Mirror episode we're heading towards.
I recommend learning, not frying your brain with "think-for-me SaaS", and not being dependent on Meta or Alibaba open-sourcing some model that allows you to compete with them.
by IhateAI