6/5/2026 at 6:38:45 PM
I just ran one of these locally on a Mac like this:
uvx litert-lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--backend=gpu \
--prompt="Generate an SVG of a pelican riding a bicycle"
The first time you run that it downloads 3.2GB to ~/.cache/huggingface/hub/models--litert-community--gemma-4-E2B-it-litert-lm
It can handle audio and image input too, which is pretty cool for a 3.2GB model. For images:
uvx litert-lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--backend=gpu --vision-backend gpu \
--attachment image.jpg --prompt describe
And for audio:
uvx litert-lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--backend=gpu --audio-backend cpu \
--attachment audio.wav --prompt transcribe
(The pelican is rubbish, but it's only a 3.2GB file so the fact it even outputs valid SVG is impressive to me: https://gist.github.com/simonw/94b318afde4b1ce5ff67d4b5d0362... )
by simonw
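The three invocations above differ only in the per-modality backend flag, the attachment, and the prompt. Here is a minimal Python sketch that assembles them, assuming only the flags that appear in those commands. `build_command` and its parameters are hypothetical helper names for illustration, not part of litert-lm, and the sketch only builds the argument list rather than running anything.

```python
import shlex

# Repo and model file exactly as they appear in the commands above.
REPO = "litert-community/gemma-4-E2B-it-litert-lm"
MODEL_FILE = "gemma-4-E2B-it.litertlm"


def build_command(prompt, attachment=None, modality=None):
    """Return the argv list for one `uvx litert-lm run` invocation.

    modality is "vision", "audio", or None for a text-only prompt.
    Hypothetical helper: it only mirrors the flags used in the thread.
    """
    argv = [
        "uvx", "litert-lm", "run",
        f"--from-huggingface-repo={REPO}",
        MODEL_FILE,
        "--backend=gpu",
    ]
    if modality == "vision":
        # The image example runs the vision encoder on the GPU.
        argv += ["--vision-backend", "gpu"]
    elif modality == "audio":
        # The audio example runs the audio encoder on the CPU.
        argv += ["--audio-backend", "cpu"]
    elif modality is not None:
        raise ValueError(f"unknown modality: {modality!r}")
    if attachment is not None:
        argv += ["--attachment", attachment]
    argv += ["--prompt", prompt]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable shell line for the image example.
    print(shlex.join(build_command("describe", "image.jpg", "vision")))
```

Passing the resulting list to `subprocess.run` would execute it; returning a list rather than a string keeps prompts containing spaces or quotes intact without manual escaping.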
6/5/2026 at 8:44:36 PM
Not to mention the text-only 0.8GB version. Just crazy. You can now have basic real-time conversations on-device that are video- and audio-aware.
by reactordev
6/6/2026 at 4:00:26 AM
0.8GB is for text only. It's more like ~1.1GB if you include the video/audio encoder.
by yalok
6/6/2026 at 4:25:26 PM
And your point is what? That it’s more than 0.8GB if you include more than text-only?
by reactordev
6/5/2026 at 9:38:41 PM
Have you seen a 0.8GB model file floating around yet? I couldn't find one earlier.
by simonw
6/5/2026 at 11:06:31 PM
I think this is the one, but it’s 0.8GB of VRAM, not 0.8GB file size.
https://huggingface.co/google/gemma-4-E2B-it-qat-mobile-ct
But they could be cooking up a smaller one, because the model card lists the Q_4 quants as being bigger than the mobile or text-only versions, so I think we’ll need to wait for the Q_2_Distilled_Mobile_Textformer version. Still, just amazing work.
by reactordev
6/6/2026 at 6:11:48 AM
I'll be honest with you. My main ask for on device AI is that when I am typing "Going out for a quick j" it corrects to "jog" and not "Jonathan". I don't think it needs that many gigabytes.
by viccis
6/6/2026 at 7:25:37 AM
Who doesn't enjoy a quick Jonathan now and then?
But seriously, wouldn't predictive text on a 90s cell phone pass this test?
by taffydavid
6/6/2026 at 11:08:10 AM
The autocomplete of a decade ago is better than what we have now.
It’s harder now because of emojis and draw-to-type, as well as pen input. We didn’t have these things 14 years ago, when “I’ll be right back” could be expanded from “I’ll b ri ba”.
by reactordev
6/6/2026 at 5:57:24 AM
Where is it? On Ollama I see only the bigger one.
by madduci
6/6/2026 at 4:26:22 PM
I don’t use Ollama. Can you pull from HF?
by reactordev
6/5/2026 at 11:12:29 PM
Is that actually QAT? The MLX Community models have that in their names, but these don't, and the upload dates don't quite line up.
by rcarmo
6/5/2026 at 11:19:14 PM
As an aside, uvx is so pleasant to use... I wish Nvidia supported it as first-class rather than making folks jump through Docker hoops.
by __mharrison__
6/6/2026 at 2:18:16 AM
I wish people would stop using Python for AI.
It's slow and the package resolution is way too flat.
by NamlchakKhandro
6/6/2026 at 7:44:05 AM
What do you use?
by qwertox