4/29/2026 at 8:07:00 PM
Hey, this is really cool! And your game is really inventive; I'd love to try it when I'm home from work. Have you considered NOT using an LLM to test your game? Because your game is turn-based and text-based, could you separate rendering and logic entirely (you may have already done this, by the sounds of it) and run a headless simulator that plays thousands of games using a Monte Carlo type method? Is your game fully deterministic outside of player input?
The reason I ask is that I'm making a game that's fully deterministic; the only randomness is player input. The same inputs always produce the same outputs from my traditional AI enemies.
With this in mind, I was able to completely separate rendering and game logic. To tune my enemy AI (traditional AI, not an LLM), I can run millions of simulated games headless, generate a report for each game, and automatically toggle AI parameters between games until the AI is "perfect" for its archetype signature.
I can run tens to hundreds of games in parallel, and a typical 5-minute game simulates in seconds.
Then I can capture any game, recreate it, watch replays, etc.
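To make the idea concrete, here's a minimal sketch of that loop: deterministic headless games driven by a seed, plus a Monte Carlo sweep over AI parameters. All the names (`AIParams`, `aggression`, `caution`) and the stand-in "game" logic are illustrative, not from either of our actual games:

```python
import random
from dataclasses import dataclass

# Hypothetical AI parameters to tune; a real game would have many more.
@dataclass
class AIParams:
    aggression: float
    caution: float

def play_game(params: AIParams, seed: int) -> dict:
    """Simulate one headless game deterministically from a seed.
    Stand-in logic: the 'game' is a random walk whose outcome
    depends on the AI parameters. Same seed + params = same report."""
    rng = random.Random(seed)
    score = sum(rng.random() * params.aggression - rng.random() * params.caution
                for _ in range(50))
    return {"seed": seed, "score": score, "win": score > 0}

def sweep(n_games: int = 1000, n_trials: int = 20) -> AIParams:
    """Monte Carlo sweep: try random parameter sets over a fixed
    batch of seeded games, keep the best-performing set."""
    best, best_rate = None, -1.0
    for _ in range(n_trials):
        params = AIParams(aggression=random.uniform(0, 1),
                          caution=random.uniform(0, 1))
        wins = sum(play_game(params, seed)["win"] for seed in range(n_games))
        rate = wins / n_games
        if rate > best_rate:
            best, best_rate = params, rate
    return best
```

Because each game is a pure function of (params, seed), replays come for free: re-running `play_game` with a captured seed reproduces the game exactly, and the batches parallelize trivially.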
My game is also a browser game, but I built my own engine for it from scratch, with no external libraries.
by purple-leafy
4/30/2026 at 12:26:58 AM
Thank you, that's a great suggestion. I didn't do that, but I did consider it, and I think it could be very powerful. I had two example use cases where having an actual AI felt valuable: first, validating a new feature against the spec, and second, finding unexpected bugs (like trying to enter a locked room through the back wall). It didn't do so well on the latter, but did great on the former. A million simulated games could probably catch those, but how would you track the reports afterwards? Perhaps using an LLM to read the logs/reports would be a good use. Your setup sounds awesome, nice work.
by jschomay
4/30/2026 at 1:03:07 AM
You’re welcome :) For you, I’d recommend getting 10 games running/simulated first, and manually analysing the reports yourself to see if the report data is useful. Try to get the report data into a useful shape, either a JSON array or an Excel sheet. Then you can feed it into an LLM to analyse. For example, my reports will basically be data points per AI archetype: how often they collide with a wall, how often they perform certain actions, how often they get blocked or go idle. Straight numbers or booleans. On top of that, an Elo-type system rates the AIs against one another so I can build an AI tier list. Then I can get an LLM to ingest the data and pick out issues, outliers, etc.
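For reference, the Elo-style rating described above can be sketched in a few lines. The K-factor and archetype names below are illustrative, not from the actual game:

```python
K = 32  # common Elo K-factor; tune to taste

def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool) -> tuple[float, float]:
    """Return both ratings after one game (no draws, for simplicity)."""
    e_a = expected(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return (r_a + K * (s_a - e_a),
            r_b + K * ((1.0 - s_a) - (1.0 - e_a)))

def tier_list(ratings: dict[str, float]) -> list[str]:
    """Archetypes sorted strongest first."""
    return sorted(ratings, key=ratings.get, reverse=True)

# Usage: feed simulated head-to-head results through update().
ratings = {"aggressor": 1000.0, "sentinel": 1000.0}
ratings["aggressor"], ratings["sentinel"] = update(
    ratings["aggressor"], ratings["sentinel"], a_won=True)
```

After many simulated games the ratings converge, and `tier_list` gives the archetype ranking directly from the report data.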
My game is kind of like chess, so this all makes sense for it.
And thanks for the insights. I will try a similar LLM setup for manually playing my game; it's definitely possible, and your blog is inspiring.
by purple-leafy
4/30/2026 at 1:11:41 AM
This is the way to get a very tightly balanced game. I've seen people come up with a lot of sophisticated graphs and curves of various parameters and inputs that I personally don't understand, but they tune things to values that naturally produce the kinds of outcomes players enjoy most. It would be impossible to tweak all these variables and their interactions through manual playtests alone.
by deadbabe