2/13/2026 at 2:12:52 PM
How did you have it testing its code changes? Did you tell it to use Playwright or agent-browser or anything like that?
If coding agents can't test the code as they're editing it they're no different from pasting your entire codebase into ChatGPT and crossing your fingers.
At one point you mention it hadn't run "npm test" - did it run that once you directly told it to?
I start every one of my coding agent sessions with "run uv run pytest" purely to confirm that it can run the tests and seed the idea with it that tests exist and matter to me.
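A minimal sketch of that kind of start-of-session smoke check, written in Python. The helper name `run_check` is hypothetical and only for illustration; `uv run pytest` is the command quoted above, and the sketch assumes uv is installed when that command is used.

```python
import subprocess
import sys


def run_check(command: list[str]) -> tuple[bool, str]:
    """Run a command and return (succeeded, combined stdout and stderr).

    Hypothetical helper for illustration; not part of uv or pytest.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


# Safe demonstration: ask the current interpreter for its version.
ok, output = run_check([sys.executable, "--version"])
print(ok, output.strip())

# For the test suite itself, assuming uv is installed and on PATH:
#   ok, output = run_check(["uv", "run", "pytest"])
```

A zero exit code is the only signal checked here; the captured output is returned so a failure can be read rather than guessed at.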
Your post ends with a screenshot showing you debating a C# syntax thing with the bot. I recommend telling it "write code that demonstrates if this works or not" in cases like that.
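As a sketch of what such a demonstration can look like, here is a Python analogue (the post's question was about C#, and the question settled below is a hypothetical stand-in): does `|` merge two dicts, and which side wins when a key appears in both?

```python
# Hypothetical question to settle by running code instead of debating:
# does `|` merge two dicts (Python 3.9+), and which operand wins on a clash?
defaults = {"retries": 3, "timeout": 10}
overrides = {"timeout": 30}

merged = defaults | overrides

# The right-hand operand wins for keys present in both.
assert merged == {"retries": 3, "timeout": 30}
# Neither input is modified by the merge.
assert defaults == {"retries": 3, "timeout": 10}
print("dict | dict works; right-hand side wins:", merged)
```

If either assumption were wrong, the script would fail with an `AssertionError` (or a `TypeError` on Python older than 3.9), which answers the question either way.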
by simonw
2/13/2026 at 2:19:17 PM
If coding agents can't test the code as they're editing it they're no different from pasting your entire codebase into ChatGPT and crossing your fingers.
Out of curiosity, how do you get Claude Code or Codex to actually do this? I asked this question here before:
by aurareturn
2/13/2026 at 2:34:19 PM
I don't use CLAUDE.md, I instead use simple token-efficient conventions.
Most importantly all of my Python projects use a pyproject.toml file with this pattern:
[dependency-groups]
dev = ["pytest"]
Which means I can tell the agent: Run "uv run pytest"
And it will run the tests - without first needing to set up a virtual environment or install dependencies or anything like that. I wrote more about that pattern here: https://til.simonwillison.net/uv/dependency-groups
For more complex test suites I'll give it more detailed instructions.
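For illustration, a sketch of the kind of test file pytest would collect in such a project. The file name `test_slugify.py` and the `slugify` function are hypothetical; the only convention relied on is pytest's default discovery of `test_*.py` files and `test_`-prefixed functions.

```python
# test_slugify.py: hypothetical file name, matched by pytest's default
# test_*.py discovery pattern.
import re


def slugify(text: str) -> str:
    """Lowercase the text and join runs of ASCII letters/digits with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def test_slugify_basic():
    assert slugify("Hello, World!") == "hello-world"


def test_slugify_empty():
    assert slugify("") == ""
```

In a real project the function under test would live in its own module and be imported by the test file; it is inlined here to keep the sketch self-contained.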
For testing web apps I used to tell it "use playwright" or "use playwright Python".
I'm currently experimenting with my own simple CLI browser automation tool. This means I can tell it:
Run "uvx rodney --help" and then use rodney to test this change
The --help output tells it everything it needs to use the tool - here's that document in the repo: https://github.com/simonw/rodney/blob/10b2a6c81f9f3fb36ce4d1...
I've recently started having the bots "manually" test changes with a new tool I built called Showboat. It's less than a week old but it's so far been working really well: https://simonwillison.net/2026/Feb/10/showboat-and-rodney/
by simonw
2/13/2026 at 2:28:50 PM
Instruct it to test as it goes along. Add whatever base testing command you use to your list of trusted tools.
by SJMG