1/20/2026 at 11:09:47 PM
There's a YouTuber who makes AI Plays Mafia videos with various models going against each other. They also seemingly let past games stay in context to some extent.
What people have noted is that oftentimes ChatGPT 4o ends up surviving the entire game because the other AIs potentially see it as a gullible idiot, and the Mafia often tend to eliminate stronger models like 4.5 Opus or Kimi K2 early.
It's not exactly scientific data because they mostly show individual games, but it is interesting how that lines up with what you found.
by techjamie
1/20/2026 at 11:45:06 PM
https://www.youtube.com/watch?v=JhBtg-lyKdo - 10 AIs Play Mafia
https://www.youtube.com/watch?v=GMLB_BxyRJ4 - 10 AIs Play Mafia: Vigilante Edition
https://www.youtube.com/watch?v=OwyUGkoLgwY - 1 Human vs 10 AIs Mafia
by nodja
1/21/2026 at 2:03:02 AM
Similar: here is a YouTube video of an amusing reverse Turing test with four LLMs and a human. To make the test more interesting, the players pose as famous historical characters (Aristotle, Mozart, da Vinci, Cleopatra, and Genghis Khan) on a train in Unity 3D.
by cpeterso
1/21/2026 at 6:02:15 AM
>They also seemingly let past games stay in context to some extent.
Not a trivial point, well studied in game theory:
https://en.wikipedia.org/wiki/Repeated_game
Spiting goes from a common trap to an optimal strategy.
by TZubiri
1/21/2026 at 6:39:50 AM
One thing I've noticed from watching these games is that LLMs never use risky strategies, such as faking roles. They will happily accuse others of lying, but never openly claim to be a role that they're not themselves.
by Antibabelic
1/21/2026 at 8:00:26 AM
Probably stems from the safety guardrails.
by reed1234
1/21/2026 at 6:46:07 PM
It's a fun setup that quickly devolves into the Shakespearian! The plots don't always work, but seeing their reasoning get increasingly complex is interesting.
"When that the poor have cried, Caesar hath wept. Ambition should be made of sterner stuff. Yet Brutus says he was ambitious... and Brutus is an honourable man."
by fancy_pantser
1/21/2026 at 2:45:07 AM
I made Mafia Arena as a way of measuring how good each LLM is at playing Mafia/Werewolves.
This is a good benchmark for how good AIs are at lying.
by mohsen1
1/21/2026 at 8:32:16 AM
Something is off with the numbers. GPT-5.2 cannot have a 75% winrate with one win over GLM-4.7 and a 2/10 record against Gemini 3 Flash.
by littlestymaar