3/24/2026 at 8:09:18 PM
Suggestion for the maintainers: the comparison table currently lists some pretty old models, Qwen 2.5 14B and Mixtral 8x7B and Llama 3.3 70B. A lot of people are reporting incredible results with the Qwen 3.5 MoE models on Apple hardware right now (streaming experts - see https://simonwillison.net/2026/Mar/24/streaming-experts/) - it would be great to get some of those models into that table.
Maybe the 1T parameter Kimi K2.5 too if you can get that to work, see https://twitter.com/seikixtc/status/2036246162936910322 and https://twitter.com/danpacary/status/2036480556045836603
by simonw
3/24/2026 at 9:29:38 PM
Thanks for sharing this! If you'd be interested in running the benchmark yourself with Hypura I'd happily merge into our stats. Otherwise will add to my todo list :)
by tatef
3/24/2026 at 8:29:36 PM
Simon, a little off-topic, but it seems that your website isn't working.
> An error occurred in the application and your page could not be served. If you are the application owner, check your logs for details. You can do this from the Heroku CLI with the command
I get this error when I go to simonwillison.net
Any random blog/link works for example though: https://simonwillison.net/2026/Mar/19/openai-acquiring-astra...
(I checked your website because I wanted to see if you had written something about trivy/litellm as well, I highly recommend checking out what has happened within litellm space if possible as I would love to read your thoughts on it)
Have a nice day simon!
Edit: the website works now, but I am not sure what went wrong previously (an issue from Heroku maybe?).
Edit-2: now that the website is working, I can see that you have already made a post about it.
by Imustaskforhelp
3/24/2026 at 9:18:31 PM
The lack of a token rate metric for the Kimi example is disappointing.
by abtinf
3/24/2026 at 10:11:06 PM
The latter link says they get ~1.7 tok/s which is quite impressive for a near-SOTA local model running on ordinary hardware.
by zozbot234