7/3/2026 at 8:48:29 PM
I'm using what I call "hermetic agents": completely sandboxed agents write code and tests from the same specification, and the code writer can't see the tests while the test writer can't see the code. The idea is that we can get better quality this way (by avoiding confirmation bias between code and tests). It is more painful to set up, however, since you have to distill a spec and guides that the agent would normally hook into using RAG.
It is more like people (agent?) management than coding, though. I'm setting up and debugging processes rather than writing code. I spend a lot of time cursing at and arguing with the agents I'm using to set up the hermetic agents (which I can't argue with, obviously, but I can have conventional agents go over their logs to figure out how to improve their sandboxed context).
by seanmcdirmid
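A minimal sketch of the visibility rule described above, assuming each agent's context is assembled from tagged artifacts. The `Artifact` type, the role names, and the `VISIBLE_KINDS` table are hypothetical illustrations, not the poster's actual setup; letting a QA role see both code and tests is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str  # one of "spec", "guide", "code", "test"
    body: str


# Hypothetical policy: which artifact kinds each role is allowed to read.
# The coder never sees tests and the tester never sees code.
VISIBLE_KINDS = {
    "coder": {"spec", "guide"},
    "tester": {"spec", "guide"},
    "qa": {"spec", "guide", "code", "test"},  # assumption: QA compares both
}


def build_context(role: str, artifacts: list[Artifact]) -> list[Artifact]:
    """Return only the artifacts this role may see, preserving order."""
    allowed = VISIBLE_KINDS[role]
    return [a for a in artifacts if a.kind in allowed]
```

The point of keeping the policy in one table is that a leak between coder and tester becomes a one-line review question rather than something buried in prompt text.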
7/3/2026 at 11:51:57 PM
This is similar to how our college CS problem sets were graded. We were given a spec, and we had to implement a program that conformed to it. We had access to 70% of the test suite during development, and another 30% was hidden and only evaluated after submission. We were graded out of 100.
It was effective at making you think about the problem and anticipate what tests might be missing. I can see how this would be effective for coding agents, which tend to get progressively lazier at writing tests as session context grows.
by chatmasta
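The visible/hidden grading scheme above could be sketched roughly like this. The function names, the fixed seed, and the equal weighting of every test are assumptions for illustration; the comment only states the 70/30 split and the 100-point scale.

```python
import random


def split_tests(test_ids, visible_fraction=0.7, seed=0):
    """Shuffle test ids deterministically and split into (visible, hidden)."""
    rng = random.Random(seed)
    shuffled = list(test_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * visible_fraction)
    return shuffled[:cut], shuffled[cut:]


def grade(passed_ids, all_ids):
    """Score out of 100 over the full suite, hidden tests included."""
    all_ids = set(all_ids)
    return 100 * len(set(passed_ids) & all_ids) / len(all_ids)
```

During development the implementer (human or agent) would only ever be handed the first list; the grade is computed against both.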
7/4/2026 at 5:40:31 AM
The interesting twist here is that the test writer can also be wrong: the QA agent assigns blame when the implementation disagrees with a test case, and it’s not always going to be the implementation’s fault.
Makes me wonder if there should be problems for CS students where, given a spec and an interface but not an actual implementation, they write the tests instead.
by seanmcdirmid
7/4/2026 at 5:31:23 PM
Had this issue at uni back in '08 (physics): my code did what was asked of it, but the auto grader didn't like it. I nagged the professor until he had a closer look and fixed the issue, and he asked me to resubmit.
by exe34
7/3/2026 at 8:54:57 PM
I did a thing sorta like this between app and infra. My app agent sends messages to the infra agent for what it needs and they go back and forth to sort things out. They invented their own working process on top of it and wrote their own tool. It's interesting watching all this work. Feels like the future more than a lot of things I've seen in the past few years. Shame we're doing this to the detriment of the environment and the economy.
"""
Claude Code mesh: gossip-based multi-instance coordination.
Usage:
mesh.py register --id ID --repo PATH --keyword KEYWORD
mesh.py list
mesh.py send --from ID --to ID MESSAGE
mesh.py broadcast --from ID MESSAGE
mesh.py watch --id ID
mesh.py forward --id ID MESSAGE_JSON
mesh.py peers --id ID [--n N]
mesh.py clean
"""
by leetrout
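For readers curious what a CLI with that usage text might look like, here is a minimal `argparse` sketch that accepts the same subcommands and flags. Only the parsing is shown; the `dest` names (`sender`, `recipient`, `message_json`) and everything about how messages are stored or gossiped are assumptions, since the comment shares only the docstring.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the usage lines in the docstring above."""
    parser = argparse.ArgumentParser(prog="mesh.py")
    sub = parser.add_subparsers(dest="command", required=True)

    p = sub.add_parser("register")
    p.add_argument("--id", required=True)
    p.add_argument("--repo", required=True)
    p.add_argument("--keyword", required=True)

    sub.add_parser("list")

    p = sub.add_parser("send")
    # "from" is a Python keyword, so store it under a different name.
    p.add_argument("--from", dest="sender", required=True)
    p.add_argument("--to", dest="recipient", required=True)
    p.add_argument("message")

    p = sub.add_parser("broadcast")
    p.add_argument("--from", dest="sender", required=True)
    p.add_argument("message")

    p = sub.add_parser("watch")
    p.add_argument("--id", required=True)

    p = sub.add_parser("forward")
    p.add_argument("--id", required=True)
    p.add_argument("message_json")

    p = sub.add_parser("peers")
    p.add_argument("--id", required=True)
    p.add_argument("--n", type=int, default=None)  # optional, per "[--n N]"

    sub.add_parser("clean")
    return parser
```

Calling `build_parser().parse_args(["send", "--from", "app", "--to", "infra", "need a bucket"])` yields a namespace with `command`, `sender`, `recipient`, and `message` set.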
7/3/2026 at 8:59:19 PM
That's sort of crazy.
My agents are completely locked down and their communication channels are limited. For example, a QA agent can give feedback to the coder or tester agents, but it can't reveal information to them that they shouldn't know about.
It also prevents them from going off the rails, since they can't really do anything outside of their specified tasks (tools are locked down as well).
by seanmcdirmid
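One way to picture the lockdown is an explicit allowlist for both channels and tools, checked before anything runs. This is a sketch under assumptions: the role names, tool names, and `PolicyError` are hypothetical, and it covers only routing and tool checks, not the filtering of what a QA message may reveal.

```python
class PolicyError(Exception):
    """Raised when an agent attempts something outside its policy."""


# Hypothetical policy tables: (sender, recipient) pairs and per-role tools.
ALLOWED_CHANNELS = {("qa", "coder"), ("qa", "tester")}
ALLOWED_TOOLS = {
    "coder": {"read_spec", "write_code"},
    "tester": {"read_spec", "write_test"},
    "qa": {"run_tests", "send_feedback"},
}


def check_send(sender: str, recipient: str) -> None:
    """Reject any message on a channel that is not explicitly allowed."""
    if (sender, recipient) not in ALLOWED_CHANNELS:
        raise PolicyError(f"{sender} may not message {recipient}")


def check_tool(role: str, tool: str) -> None:
    """Reject any tool call that is not on the role's allowlist."""
    if tool not in ALLOWED_TOOLS.get(role, set()):
        raise PolicyError(f"{role} may not call {tool}")
```

Default-deny matters here: an agent or tool missing from the tables gets rejected rather than silently allowed.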
7/3/2026 at 9:34:34 PM
Have a look at GitHub.com/Strapchay/protag. I have a similar concept implemented. It’s still a work in progress and I haven’t worked on it for a while, but the idea is that agents can only make modifications through a CLI tool created for the project, which also restricts which files they can look through. Each domain agent is assigned specific files and can’t modify anything beyond them. Still a lot of work to be done though.
by strapchay
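The per-agent file restriction could be sketched as a path check that a CLI wrapper runs before any write. This is not taken from the linked project; the `ASSIGNED_FILES` table, the agent names, and `can_modify` are hypothetical, and paths are assumed to be POSIX-style and relative to the repo root.

```python
import posixpath

# Hypothetical assignment of repo-relative files to domain agents.
ASSIGNED_FILES = {
    "billing": {"billing/models.py", "billing/api.py"},
    "auth": {"auth/login.py"},
}


def can_modify(agent: str, path: str) -> bool:
    """True only if the normalized path is one of the agent's files."""
    # normpath collapses "./" and "../" so traversal tricks can't slip past.
    normalized = posixpath.normpath(path)
    return normalized in ASSIGNED_FILES.get(agent, set())
```

Normalizing before the lookup is the important step: `billing/../auth/login.py` resolves to a file the billing agent does not own.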
7/3/2026 at 11:41:21 PM
Literally converged to this same pattern over this month.
by pkoird
7/4/2026 at 12:15:58 AM
Do you mind sharing the more specific setup or agent framework? Hermes? LangChain? DIY?
by ernsheong
7/4/2026 at 5:44:09 AM
I work at Google, so it’s antigravity agents using Blaze (like Bazel) to configure their context, including DAG-style dependencies on other hermetic agent outputs, what specific tool calls they can make, and their communication channels (they can send feedback to agents backwards on the DAG). Otherwise this is completely DIY (antigravity does a lot of the heavy lifting, so nothing fancy except for custom sandboxing).
by seanmcdirmid
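The "feedback only goes backwards on the DAG" rule can be illustrated with a plain dependency map. This is a sketch, not the Blaze configuration being described: the agent names and the `DEPENDS_ON` graph are hypothetical, and it uses the standard-library `graphlib` module (Python 3.9+) for run order.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each agent maps to the agents whose outputs it consumes.
DEPENDS_ON = {
    "spec": [],
    "coder": ["spec"],
    "tester": ["spec"],
    "qa": ["coder", "tester"],
}


def upstream(agent, graph):
    """All agents reachable by following dependency edges from `agent`."""
    seen = set()
    stack = list(graph[agent])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def can_send_feedback(sender, recipient, graph):
    """Feedback may only flow to agents the sender depends on."""
    return recipient in upstream(sender, graph)


def run_order(graph):
    """One valid execution order: dependencies before dependents."""
    return list(TopologicalSorter(graph).static_order())
```

With this graph, the QA agent can send feedback to the coder, tester, or spec writer, while the coder and tester have no channel to each other.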
7/3/2026 at 9:24:29 PM
Doing similar stuff. I curse a lot. The processes mostly transfer between different models, which somewhat validates the approach.
7/4/2026 at 7:37:19 AM
[flagged]
by verify-ai