alt.hn

4/5/2026 at 5:13:51 PM

Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code

https://ai.georgeliu.com/p/running-google-gemma-4-locally-with

by vbtechguy

4/5/2026 at 7:11:42 PM

  ollama launch claude --model gemma4:26b

by trvz

4/5/2026 at 7:28:53 PM

It's amazing how simple this is, and it just works if you have ollama and claude installed!

by datadrivenangel

4/5/2026 at 7:17:48 PM

So wait what is the interaction between Gemma and Claude?

by jonplackett

4/5/2026 at 7:21:50 PM

LM Studio offers an Anthropic-compatible local endpoint, so you can point Claude Code at it and it'll use your local model for its requests. However, I've had a lot of problems with LM Studio and Claude Code losing its place: it'll think for a while, come up with a plan, start to do it, and then just halt in the middle. I'll ask it to continue and it'll do a small change and get stuck again.

Using Ollama's API doesn't have the same issue, so I've stuck with Ollama for local development work.
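For reference, pointing Claude Code at a local Anthropic-compatible server generally comes down to a couple of environment variables; a minimal sketch (the port and model name here are assumptions, check your server's settings):

```shell
# Assumes a local server exposing an Anthropic-compatible API
# (LM Studio's default port is 1234; adjust for your setup).
export ANTHROPIC_BASE_URL="http://localhost:1234"
# A local server typically doesn't check auth; any non-empty token works.
export ANTHROPIC_AUTH_TOKEN="local"
# Which locally loaded model to request (identifier is an example).
export ANTHROPIC_MODEL="gemma4:26b"
claude
```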

by unsnap_biceps

4/5/2026 at 7:41:56 PM

Claude Code is fairly notoriously token-inefficient as far as coding agents/harnesses go (I come from aider, pre-CC). It's only viable because the Max subscriptions give you an approximately unlimited token budget, which resets after a few hours even if you hit the limit. But this also only works because cloud models have massive context windows (1M tokens on Opus right now), which is difficult to match locally given the VRAM needed.

And even if you somehow managed to open up a big enough VRAM playground, the open-weights models are not quite as good at wrangling such large context windows (even Opus is barely capable) without basically losing track of what they were doing before they finish parsing it.
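To put rough numbers on the VRAM point: the KV cache grows linearly with context length, and at 1M tokens it alone dwarfs consumer GPU memory. A back-of-envelope sketch (the layer count, head count, and head dimension are illustrative assumptions, not any specific model's config):

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes for the KV cache: K and V (factor of 2) per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

# Illustrative mid-size model: 48 layers, 8 KV heads (GQA), head_dim 128, fp16.
gib = kv_cache_bytes(1_000_000, 48, 8, 128, 2) / 2**30
print(f"{gib:.0f} GiB")  # → 183 GiB of cache alone at 1M tokens
```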

by keerthiko

4/5/2026 at 7:58:59 PM

I use CC at work, so I haven't explored other options. Is there a better one to use locally? I presumed they were all going to be pretty similar.

by unsnap_biceps

4/5/2026 at 5:13:51 PM

Here is how I set up Gemma 4 26B for local inference on macOS so that it can be used with Claude Code.
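For anyone skimming, the headless flow in LM Studio's `lms` CLI boils down to roughly this (the model identifier below is an example; `lms ls` lists what you actually have downloaded):

```shell
# Start LM Studio's local server headlessly (default port 1234).
lms server start
# Load the model into memory; identifier is an example --
# run `lms ls` to see your downloaded models.
lms load google/gemma-4-26b --gpu max
# Verify the server is up and the model is loaded.
lms ps
```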

by vbtechguy

4/5/2026 at 6:48:17 PM

This is a nice writeup!

by canyon289

4/5/2026 at 7:44:45 PM

Just FYI, MoE doesn't really save (V)RAM. You still need all the weights loaded in memory; it just means fewer parameters are consulted per forward pass. So it improves tok/s but not VRAM usage.

by martinald

4/5/2026 at 8:03:28 PM

It does if you use an inference engine that can offload some of the experts from VRAM to CPU RAM. That means I can fit a 35-billion-parameter MoE on, say, a GPU with 12 GB of VRAM plus 16 GB of system memory.
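With llama.cpp, for example, expert offload is a launch flag; a sketch (the model path is a placeholder, and the layer/expert counts are assumptions to tune for your hardware):

```shell
# Keep attention and shared weights on the GPU, push the MoE expert
# tensors of the first 20 layers to CPU RAM (recent llama.cpp builds):
llama-server -m ./model.gguf --n-gpu-layers 99 --n-cpu-moe 20
# Older builds can do the same with a tensor-override regex:
llama-server -m ./model.gguf --n-gpu-layers 99 -ot "ffn_.*_exps=CPU"
```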

by IceWreck

4/5/2026 at 7:25:52 PM

Claude Code seems like a popular frontend currently. I wonder how long until Anthropic releases an update to make it anywhere from a little to a lot less turn-key? They've been very clear that they aren't exactly champions of this stuff being used outside of very specific ways.

by Someone1234

4/5/2026 at 8:06:51 PM

Is it not about the same as using OpenCode?

And is running a local model with Claude Code actually usable for any practical work compared to the hosted Anthropic models?

by chvid

4/5/2026 at 7:43:06 PM

Right now it suits them down to the ground. You pay for the product and you don’t cost their servers anything.

by moomin

4/5/2026 at 7:49:38 PM

You don't pay anything to use Claude Code as a front end to non-Anthropic models.

by phainopepla2

4/5/2026 at 8:23:54 PM

so no subscription is needed?

by quinnjh

4/5/2026 at 8:26:07 PM

I think CC is popular because they are catering to the common denominator programmer and are going to continue to do that, not because CC is particularly turn-key.

by wyre