alt.hn

2/21/2026 at 9:42:04 AM

Instant AI Response

https://chatjimmy.ai/

by hochmartinez

2/21/2026 at 5:14:28 PM

Impressive, but this particular underlying LLM is objectively weak. I'd like to see it done with a larger, newer model.

by OutOfHere

2/21/2026 at 2:28:04 PM

What model and hardware powers this?

Is this a Google T5-based model?

by nacs

2/21/2026 at 2:35:02 PM

3bit hard-wired Llama 3.1 8B ( https://taalas.com/the-path-to-ubiquitous-ai/ )

by pella

2/21/2026 at 7:52:45 PM

3-bit is a bit ridiculous. From that page it's unclear whether the current model is 3-bit or 4-bit. If it's 4-bit… well, NVIDIA showed that a well-organized model can perform almost as well as 8-bit.

by cyansmoker
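For scale, here is a quick back-of-envelope sketch of why the bit width matters so much for a hard-wired model: the weight storage for an 8B-parameter network scales linearly with bits per weight. The figures below are illustrative, counting weights only (no KV cache or activations).

```python
# Back-of-envelope weight-memory footprint for an 8B-parameter model
# at different quantization widths (weights only, 1 GB = 1e9 bytes).
PARAMS = 8e9

def weight_gb(bits_per_weight: float) -> float:
    """GB needed to store the weights at the given bit width."""
    return PARAMS * bits_per_weight / 8 / 1e9

for bits in (3, 4, 8, 16):
    print(f"{bits:>2}-bit: {weight_gb(bits):.1f} GB")
# →  3-bit:  3.0 GB
# →  4-bit:  4.0 GB
# →  8-bit:  8.0 GB
# → 16-bit: 16.0 GB
```

Going from 8-bit to 3-bit cuts the on-chip storage for the weights by more than half again, which is presumably why an aggressive width was chosen for silicon.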

2/21/2026 at 2:57:38 PM

I love seeing optimised SLM inference. Is there a current use-case for this? Edge CNNs make sense to me but not edge SLMs (yet).

by alansaber

2/21/2026 at 2:56:52 PM

If this is possible, why don't all online AI engines work like this?

by Kuyawa

2/21/2026 at 3:20:02 PM

This is a specific model (Llama 3.1 8B) baked into hardware. You can only use this one model, but you get "low" power consumption and crazy speed.

If you want to run a different model you need new hardware for that new model.

by yomismoaqui

2/21/2026 at 3:42:49 PM

It really is a crazy speed: 15k tokens/second.

by sixtyj

2/21/2026 at 8:12:17 PM

I have tried it again. This is the future of chat UI, imho.

Generated in 0.074s • 15,754 tok/s

by sixtyj
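As a sanity check on those quoted UI stats, the two numbers are consistent with each other: at that rate, a 0.074 s generation corresponds to roughly 1,200 tokens of output.

```python
# Cross-check the quoted stats: generation time × throughput ≈ tokens produced.
rate_tok_per_s = 15_754   # quoted throughput
gen_time_s = 0.074        # quoted generation time

tokens = rate_tok_per_s * gen_time_s
print(f"~{tokens:.0f} tokens")  # → ~1166 tokens
```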

2/21/2026 at 3:04:50 PM

Imagine a model like Opus 4.6 at that speed; that would be insane.

by notronic