5/20/2026 at 7:25:55 PM
It's interesting how even 5 tok/s is still much faster than you'd typically type, but feels glacially slow for an agent.
On the other hand, I've been using Mimo and Minimax a lot recently. They routinely reach 100-150 tokens per second and that feels too fast, to the point where it's hard to keep up with what it's actually doing. Great for subagents though.
by ricardobeat
5/21/2026 at 9:30:55 AM
> It's interesting how even 5 tok/s is still much faster than you'd typically type, but feels glacially slow for an agent.
Calling the token rate the rate at which they "type" is a bit misleading. They also do virtually all of their more complex reasoning in tokens, so 5 tokens per second is also their thinking speed. And thinking at 5 tokens per second is glacially slow.
This is why faster versions of strong models do so well on reasoning tasks like playing text adventure games[1]. Their output isn't better on a token-for-token basis, but they get so much more thinking in during a given time window, they get more opportunities to find the right conclusion.
by kqr
5/21/2026 at 10:04:53 AM
How many tok/s does an average human think?
by michibertel
5/21/2026 at 10:10:06 AM
Most of my thinking is non-verbal. I don't think in sentences. I CAN think in sentences and internally rationalize my actions and explain them, and sometimes that's beneficial (rubber duck debugging; sometimes it's good to verbalize and explain something), but usually I don't do it.
by madwolf
5/21/2026 at 12:30:38 PM
This question gets into information theory way beyond me, but I suspect it depends a lot on the task at hand. Human brains aren't very effective at combining sources of statistical variation, but they're great at other things. I'm personally most impressed by the cerebellum. It is highly trainable, yet if we tried to translate the things it does to maintain locomotion, proprioception, coordination of movement, etc. into tokens, it would probably result in a high token rate.
by kqr
5/20/2026 at 7:54:19 PM
> They routinely reach 100-150 tokens per second and that feels too fast, to the point where it's hard to keep up with what it's actually doing.
There is no way you can follow what is going on even at 30 tokens per second. Maybe you can maintain a rough idea of what is going on for some tens of seconds but that is probably about it. Follow it in any detail, no chance. Reason about what you read, absolutely no chance.
800 tok/s — Cerebras-class, where the bottleneck is your eyeballs
I do not understand why they say this. I am not sure if it is even true. 800 tokens sounds like a page of text and I would assume you can look at one page per second without hitting any limitation of your eyes. Or is the resolution of the human eye not good enough to see an entire page at once, so that you have to scan it with the fovea? Scrolling text might of course hit the temporal resolution limit. But why does this even matter? Your brain cannot process anything close to the amount of information your eyes can take in.
by danbruc
5/20/2026 at 9:47:28 PM
The angular diameter of detailed seeing is very small - something like 1-2 degrees from what I was reading (matches my experience). That's the only area where you can reasonably read, the rest is only good for making out rough shape. So scanning it is.
by 3form
5/20/2026 at 10:13:24 PM
On top of the other comments, this reads like a half-joke.
by travisjungroth
5/20/2026 at 9:29:17 PM
> I do not understand why they say this.
Click on 800.
Try to read the text.
You'll understand.
by moralestapia
5/20/2026 at 10:22:29 PM
Because it is scrolling. If they would show one page of text while filling the next one in the background, the result would probably be somewhat like flicking through a book at one page per second. You still cannot read one page per second but you would not be limited by your eyes being unable to recognize the quickly scrolling text.
EDIT: As others have pointed out and I now did some reading on, it is an illusion that you can see all the text on a page at once; that is beyond the resolution limit of the human eye. To actually see all the words, you have to scan the page and that takes several seconds. From the numbers I have seen, it seems that the ultimate limit is probably below 30 tokens per second, no matter what, even using rapid serial visual presentation to cut out eye movements. Even 10 to 20 tokens per second is probably pushing it and unsustainable for many, if not most, people.
by danbruc
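To put rough numbers on the comment above, here is a minimal Python sketch. It assumes a reader who sustains about 20 tokens per second (a hypothetical figure inside the 10 to 30 tok/s range mentioned above) and a page of roughly 800 tokens. The function names and both constants are made up for illustration, not measured values.

```python
def seconds_to_read(tokens: int, read_rate: float) -> float:
    """Time needed to read `tokens` at `read_rate` tokens per second."""
    return tokens / read_rate


def unread_backlog(gen_rate: float, read_rate: float, seconds: float) -> float:
    """Tokens generated but not yet read after `seconds` of streaming."""
    return max(0.0, (gen_rate - read_rate) * seconds)


PAGE_TOKENS = 800  # assumption: roughly one page of text
READ_RATE = 20.0   # assumption: tokens per second a reader can sustain

# How long one page takes at the assumed reading rate.
print(f"one page: {seconds_to_read(PAGE_TOKENS, READ_RATE):.0f} s")

# How far behind the reader is after 10 seconds at various generation rates.
for gen_rate in (5, 30, 150, 800):
    backlog = unread_backlog(gen_rate, READ_RATE, 10.0)
    print(f"{gen_rate:>4} tok/s -> {backlog:.0f} unread tokens after 10 s")
```

Under these assumptions a single page takes tens of seconds to read, so any generation rate above the reading rate builds up an unread backlog that grows linearly with time.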
5/20/2026 at 10:58:33 PM
Did someone say rapid serial visual presentation? I made a tool for that! Https://wordflashreader.vercel.app
by SpyCoder77
5/21/2026 at 2:20:05 PM
Looking at 5 tok/s after reading this comment made me think about why it felt slow and would be unacceptable for work. If you didn't plan, or even sometimes despite planning, you have absolutely no idea if it is suddenly going to go off the rails in a wrong direction. Every day, I'll look at the thinking and it seems pretty good until suddenly I have to slam the esc key because it decided to pursue a completely wrong direction. Much faster is better for skimming to make sure you don't have to throw everything away.
by Larrikin
5/21/2026 at 4:41:57 PM
and if it goes fast enough, it can fail and you can prompt again before you need to worry about whether it's going in the wrong direction. or, it will try multiple directions!
by repparw
5/21/2026 at 7:40:00 AM
I think the metric should be reading speed, not writing speed. At the very least it should be speech speed.
by j_maffe
5/20/2026 at 10:11:09 PM
I run models in the ~120B class on my old server (96GB DDR4) and it manages about 3-3.5 tok/sec. It is indeed painfully slow to watch, but I find if I walk away or bury the window and do something else, it always seems to be done when I check back.
by metalliqaz
5/21/2026 at 12:49:11 AM
isn't 5 tok/s like 100wpm? Pretty standard typing speed.
You also would need to compare token generation not with the actual output, but with the thoughts and deleted and edited parts.
by TZubiri
5/21/2026 at 3:22:55 AM
100wpm is well above what the average person types at, which is estimated at about 40wpm.
100wpm might still be a bit high even for your average programmer.
by HDBaseT
5/21/2026 at 3:03:24 AM
It's about 240 wpm on text.
by nearbuy
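The conversion being argued about in the comments above is simple arithmetic, sketched here in Python. The words-per-token ratio is an assumption: about 0.75 is a commonly quoted rule of thumb for English text, and 0.8 is the ratio implied by the 240 wpm figure. The real value depends on the tokenizer and the text. The function names are hypothetical.

```python
def tok_per_sec_to_wpm(tok_per_sec: float, words_per_token: float) -> float:
    """Convert a token rate into words per minute."""
    return tok_per_sec * 60 * words_per_token


def wpm_to_tok_per_sec(wpm: float, words_per_token: float) -> float:
    """Convert words per minute into the equivalent token rate."""
    return wpm / (60 * words_per_token)


# Assumed ratios: 0.75 is a rule of thumb, 0.8 matches the 240 wpm claim.
for ratio in (0.75, 0.8):
    wpm = tok_per_sec_to_wpm(5, ratio)
    print(f"5 tok/s at {ratio} words/token is about {wpm:.0f} wpm")

# Typing speeds mentioned in the thread, expressed as token rates.
for typing_wpm in (40, 100):
    rate = wpm_to_tok_per_sec(typing_wpm, 0.75)
    print(f"{typing_wpm} wpm is about {rate:.1f} tok/s")
```

With either ratio, 5 tok/s lands well above 200 wpm, so 100 wpm corresponds to roughly 2 tok/s rather than 5.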