6/1/2026 at 10:29:50 AM
It feels like they're really focusing on overstating how confusing and weird it is that an LLM can write code but not play games very well, rather than just explaining it.
Code is text. LLMs are text input/output machines.
Game input/output is not at all text.
LLMs can certainly reason about games with a simple/explicit enough domain (try a Risk tournament where models can talk to each other between turns!).
by ceheaaf
6/1/2026 at 11:29:25 AM
But LLMs are terrible at text adventures too. See e.g. https://entropicthoughts.com/updated-llm-benchmark and previous articles referenced in there.
I have yet to see any sort of harness that lets a frontier LLM interact with a text adventure and make meaningful progress on its own.
by kqr
6/1/2026 at 1:08:24 PM
To pile on, they're also bad at games that are 2D text based environments.
ARC-AGI-3 shows this: https://arcprize.org/arc-agi/3
I've done some work as well on Rogue (sorry for self-promotion): https://iwhalen.github.io/rogue-bench/
by iwhalen
6/2/2026 at 9:11:46 AM
There is no "2D text" processing when it comes to LLMs. They process text as ordinary, sequential 1D text only. And humans process "2D text" like any other 2D image. So 2D text isn't really a thing in any case. Saying LLMs are bad at 2D text is like saying that humans are bad at 2D audio.
by cubefox
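As a rough illustration of the point above about a text grid being read as a 1D sequence, here is a minimal sketch. It uses characters as a stand-in for tokens (an assumption; real tokenizers split text differently), and the map and the `flat_index` helper are hypothetical, made up for this example.

```python
# Hypothetical 5x5 roguelike-style map; '@' is the player, '>' is the stairs.
GRID = [
    "#####",
    "#@..#",
    "#...#",
    "#..>#",
    "#####",
]


def flat_index(row, col, width):
    """Position of cell (row, col) in the newline-joined 1D string."""
    # Each row contributes `width` characters plus one newline separator.
    return row * (width + 1) + col


# This is roughly what the model receives: one long string, no second axis.
flat = "\n".join(GRID)
width = len(GRID[0])

player = flat_index(1, 1, width)
right = flat_index(1, 2, width)  # cell directly to the right of the player
below = flat_index(2, 1, width)  # cell directly below the player

horizontal_gap = right - player  # adjacent in the sequence
vertical_gap = below - player    # a full row (plus newline) away

print(f"horizontal neighbour is {horizontal_gap} position away")
print(f"vertical neighbour is {vertical_gap} positions away")
```

Under these assumptions, a cell that is visually adjacent in the vertical direction sits `width + 1` positions away in the sequence, and that distance grows with the width of the map.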
6/1/2026 at 11:43:33 AM
They are also pretty bad at navigating mazes (which can be somewhat similar in spirit to text adventures where you need to navigate through text): https://arxiv.org/abs/2507.20395
by haffi112
6/1/2026 at 11:31:03 AM
The other reason is lack of continual learning, especially for long games like RPGs.
by cubefox
6/1/2026 at 11:30:56 AM
LLMs are used for OpenClaw and similar to do tasks for their user.
Games are a bunch of tasks too.
So if they fail at game tasks, maybe it’s a bad idea to advertise those LLMs as task-doing assistants.
by croes