6/2/2026 at 6:44:24 AM
Groq stopped serving Kimi K2 (1T params) when they got acquihired by NVIDIA, so I guess NVIDIA took most of the hardware in addition to the employees. The largest model they serve now is the relatively minuscule gpt-oss-120b.
The community support forum is also getting retired and there haven't been any posts by support employees in forever anyway, so they are probably gone, too. Also, issues have been piling up, suggesting that the developers are gone as well. https://community.groq.com/c/forum/4 (archive link for when it goes down https://web.archive.org/web/20260602064050/https://community...)
To me, it looks like they are trying to raise 650M with a few remaining (ancient) LPUs and no employees.
by gpugreg
6/2/2026 at 10:31:18 AM
Before they removed it, I was using Groq's Kimi K2 model for a chatbot on a small community site/chat. It was really good: it seemed to have incredibly vast general world knowledge, and the speed (400tok/s if I remember right) meant that chat users got a response instantly, which was a much better experience compared to other SOTA models at the time.
On the bright side, it looks like Cerebras might be serving Kimi K2.6 at 1000tok/s soon https://www.cerebras.ai/blog/cerebras-kimi-k2-Enterprise
by batperson
6/2/2026 at 12:13:33 PM
Those were amazing times. You could vibe code an entire prototype in seconds (200 tps). With Qwen3.6-35B-A3B and MTP, you can program at that speed on a single GPU at home now, but Kimi K2 is of course much smarter at almost 30 times the size.
I'm also looking forward to the Cerebras Kimi K2.6 release, which should be even better at 1000 tps. It is hard to overstate how important speed is for programming. Instead of having to wait for a few minutes until a task is done, it is just done instantly, and you don't have to context switch from whatever else you were working on while waiting.
I hope they will make it available to regular customers.
by gpugreg
6/2/2026 at 1:30:11 PM
But too much speed doesn’t allow you to build up the context as the LLM is working; it’s a double-edged sword.
by throw1234567891
6/2/2026 at 12:22:10 PM
Cerebras are only serving Kimi for dedicated endpoint customers; for that you need a >$5m annual deal with them.
Cerebras also seems to be killing off their regular APIs: they're deprecating models, and GLM is still stuck on GLM 4.7, a whole 2 versions behind.
by trouve_search
6/2/2026 at 12:47:23 PM
I was quite baffled that they removed it instead of doubling down on Kimi and serving the latest models.
Thanks for the tip, looks fire.
by tiborsaas
6/2/2026 at 1:28:43 PM
> The largest model they serve now is the relatively minuscule gpt-oss-120b
This model will run on any laptop with 128GB RAM, wow.
by throw1234567891