2/21/2026 at 10:50:01 AM
Papers like these are a much-needed bucket of ice water. We anthropomorphize these systems too much.

Skimming the conclusions and results: the authors conclude that LLMs exhibit failures across many axes we'd consider demonstrative of AGI, from moral reasoning to simple things like counting that a toddler can do. They're just not human, and you can reasonably hypothesize that most of these failures stem from their nature as next-token predictors that happen to usually do what you want.
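For a concrete picture of what "next-token predictor" means mechanically, here's a minimal sketch of the autoregressive loop (the gpt2 checkpoint, the prompt, and greedy decoding are all just illustrative assumptions, not anything from the paper):

    # Minimal sketch: the model only ever scores the next token, then its
    # own pick is appended and fed back in. Nothing else is "reasoned about".
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")       # illustrative checkpoint
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The letter r appears in strawberry", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(10):
            logits = model(ids).logits                # scores over the vocabulary
            next_id = logits[0, -1].argmax()          # greedy: single most likely token
            ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    print(tok.decode(ids[0]))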
So: if you've got OpenClaw running and think you've got Jarvis from Iron Man, this is probably a good read to ground yourself.
Note there's a GitHub repo from the authors compiling these failures: https://github.com/Peiyang-Song/Awesome-LLM-Reasoning-Failur...
by sergiomattei
2/21/2026 at 11:57:49 AM
Isn't it strange that we expect them to act like humans even though, once a model is trained, it remains static? How is this supposed to be even close to "human-like" anyway?
by vagrantstreet
2/21/2026 at 12:36:43 PM
> Isn't it strange that we expect them to act like humans even though, once a model is trained, it remains static?

An LLM is more akin to interacting with a quirky human who has anterograde amnesia: it can't form new long-term memories; it can only follow you within a longish conversation.
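A sketch of why the amnesia analogy fits: in a chat API, the "memory" is just the client re-sending the whole transcript on every call; the model itself holds no state between turns (OpenAI-style client shown; the model name is an illustrative assumption):

    # Sketch: all "memory" lives in this list on the client side. Drop the
    # list, or overflow the context window, and the model remembers nothing.
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def chat(user_msg: str) -> str:
        history.append({"role": "user", "content": user_msg})
        resp = client.chat.completions.create(model="gpt-4o-mini", messages=history)
        reply = resp.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        return reply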
by mettamage
2/21/2026 at 12:14:26 PM
If we could reset a human to a prior state after a conversation, would conversations with them not still be "human-like"?

I'm not arguing that LLMs are human here, just that your reasoning doesn't make sense.
by LiamPowell
2/21/2026 at 12:47:31 PM
Henry Molaison (patient H.M.) was exactly this: after surgery for severe epilepsy, he could hold a conversation but could no longer form new long-term memories.
by hackinthebochs
2/21/2026 at 2:43:49 PM
I mean, you could keep updating the model weights, but the performance would suffer, so we don't do it. Models are trained to an optimal state against a general set of benchmarks, and the weights are frozen in that state.
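In PyTorch terms, "frozen" is literal; a minimal sketch (the toy layer is a stand-in for a trained LLM):

    # Sketch: after training, gradients are disabled and inference is a pure
    # forward pass, so no conversation ever changes the weights.
    import torch

    model = torch.nn.Linear(16, 16)        # stand-in for a trained network
    for p in model.parameters():
        p.requires_grad = False            # freeze: no further learning

    with torch.no_grad():                  # inference only
        out = model(torch.randn(1, 16))

by alansaber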
2/21/2026 at 1:31:37 PM
> We anthropomorphize these systems too much.

They're sold as AGI by the cloud providers, and the whole stock market scam will collapse if normies are allowed to peek behind the curtain.
by otabdeveloper4
2/21/2026 at 2:44:50 PM
The stock market being built on conjecture? Surely not, sir.
by alansaber
2/21/2026 at 2:15:01 PM
> conclude that LLMs exhibit failures across many axes we'd consider demonstrative of AGI.

Which LLMs? There are tons of them, and more powerful ones appear every month.
by throw310822
2/21/2026 at 2:40:49 PM
True, but the fundamental architecture tends not to be radically different; it's more about the training/RL regime.
by alansaber
2/21/2026 at 2:54:29 PM
But the point is that to even begin to claim a limitation holds for all LLMs, you can't use empirical results demonstrated only on a few older models. You either need a theoretical proof, or you need empirical results that hold for all existing models, including the latest ones.
by throw310822
2/21/2026 at 2:08:47 PM
Most of the claims are likely falsified using current models. I wouldn't take many of them seriously.
by simianwords
2/21/2026 at 2:58:39 PM
I wouldn't take baseless "likely" claims or the people who make them seriously.
by jibal
2/21/2026 at 5:24:57 PM
I falsified it on another thread.
by simianwords
2/21/2026 at 12:40:07 PM
https://en.wikipedia.org/wiki/List_of_cognitive_biases

Specifically, the idea that LLMs fail to solve some tasks correctly due to fundamental limitations, on tasks where humans also fail periodically, may well be an instance of the fundamental attribution error.
by lostmsu