2/21/2026 at 2:33:10 PM
This is a demo of Taalas inference ASIC hardware. Prior discussion @ https://news.ycombinator.com/item?id=47086181
by personalcompute
2/21/2026 at 9:42:04 AM
by hochmartinez
2/21/2026 at 2:33:23 PM
- https://news.ycombinator.com/item?id=47086181
- https://taalas.com/the-path-to-ubiquitous-ai/
- https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...
by pella
2/21/2026 at 5:14:28 PM
Impressive, but this particular underlying LLM is objectively weak. I'd like to see it done with a larger, newer model.
by OutOfHere
2/21/2026 at 2:28:04 PM
What model and hardware powers this? Is this a Google T5-based model?
by nacs
2/21/2026 at 2:35:02 PM
3-bit hard-wired Llama 3.1 8B ( https://taalas.com/the-path-to-ubiquitous-ai/ )
by pella
2/21/2026 at 7:52:45 PM
3-bit is a bit ridiculous. From that page I'm unclear whether the current model is 3-bit or 4-bit. If it's 4-bit… well, NVIDIA showed that a well-organized model can perform almost as well as 8-bit.
by cyansmoker
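For a sense of why the bit width matters here, a back-of-envelope sketch of the weight storage needed for an 8B-parameter model at different quantization widths (parameter count approximate, activations/KV-cache and overhead ignored; a rough illustration, not a Taalas figure):

```python
# Rough weight-memory footprint for Llama 3.1 8B at various bit widths.
# PARAMS is an approximation; real layouts add scale factors and metadata.
PARAMS = 8.03e9  # approximate parameter count of Llama 3.1 8B

def weight_bytes(bits_per_weight: float) -> float:
    """Bytes needed to store all weights at the given bit width."""
    return PARAMS * bits_per_weight / 8

for bits in (3, 4, 8, 16):
    print(f"{bits:>2}-bit: {weight_bytes(bits) / 1e9:.1f} GB")
```

Going from 8-bit to 3-bit shrinks the weights from roughly 8 GB to roughly 3 GB, which is what makes hard-wiring an entire model on-chip plausible at all.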
2/21/2026 at 2:57:38 PM
I love seeing optimised SLM inference. Is there a current use case for this? Edge CNNs make sense to me but not edge SLMs (yet).
by alansaber
2/21/2026 at 2:56:52 PM
If this is possible, why don't all online AI engines work like this?
by Kuyawa
2/21/2026 at 3:20:02 PM
This is a specific model (Llama 3.1 8B) baked into hardware. You can only run this one model, but you get "low" power consumption and crazy speed. If you want to run a different model, you need new hardware for that new model.
by yomismoaqui
2/21/2026 at 3:42:49 PM
It is really a crazy speed: 15k tokens/second.
by sixtyj
2/21/2026 at 8:12:17 PM
I have tried it again. This is the future of chat UI, imho. Generated in 0.074s • 15,754 tok/s
by sixtyj
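The two quoted numbers are consistent with each other; a quick sanity check (assuming the 0.074 s figure is the generation time for the whole response):

```python
# tokens generated = throughput * generation time
rate_tok_per_s = 15_754   # quoted throughput
gen_time_s = 0.074        # quoted generation time
tokens = rate_tok_per_s * gen_time_s
print(round(tokens))  # roughly 1166 tokens in that response
```

So the demo is producing a response of about a thousand tokens in under a tenth of a second.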
2/21/2026 at 3:04:50 PM
Imagine a model like Opus 4.6 at that speed, that would be insane.
by notronic