5/2/2026 at 11:07:31 PM
I think it omits the real reason I want to run the harness in the sandbox: I barely trust the harness more than the LLM, at least at this point in time. They are so rapidly evolving along with the underlying models, that I don't think they are a reasonable component to rely on to provide safety constraints. Put more precisely: if your harness has an ability to do something the LLM can't, and it has a set of conditions under which the LLM can cause those to be invoked, you have to assume the LLM will work out those conditions and execute them. Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful.Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?). Longer term, I see it as a dedicated security layer, not part of the harness. This probably has yet to emerge fully but it's more like a hypervisor type layer that sits outside of everything and authorises access based on context, human user, etc and can apply policy including mediate the human intervention for decision points when needed.
by zmmmmm
5/3/2026 at 1:35:06 AM
I don't trust the harness, and I especially don't trust that the LLM won't be able to subvert the harness, or trick me via the harness. I assume that the LLM will be able to leak any secret in the harness context to arbitrary internet destinations, or somehow encode the secret in a work product. Eg space characters at the end of lines encoding access tokens.Having the harness in one VM, and tool use applied to user data in another, is about as safe as you can be at present. You can mount filesystem fragments from the data VM into the harness VM, but tool execution remains painful.
Having all authorisation and access control exist outside of the harness layer is essential. It should only have narrowly scoped and time limited credentials that are bound to its IP, and even then that is problematic.
by angry_octet
5/3/2026 at 3:03:49 PM
My approach to this has been a NixOS host with the harness running in a rootless podman sidecar.The host has squid configured with a self-signed CA and networking rules to route all host traffic to the intercepting proxy, so I have a tight firewall and full auditability.
Then there’s a python rpc daemon running on the host with a set of whitelisted commands, read-only for pulling logs and diagnostics.
By default, the agent runs in a split pane tmux session with a host shell on the left and the chat interface on the right. The rpc whitelist includes the proper `tmux capture-pane` invocation to pull from the host shell, so I can easily let it see what I’m doing if I want it to help debug something.
I’m using pi as my harness and have custom extensions that give Yes/No confirmation gates for any writes the agent makes and that pass all bash commands/file writes to a deepseek subagent for review.
Still early days, but as someone with a similarly paranoid mindset around running LLMs securely, I think the future is promising and we’ll see some new “best practices” and related tooling popping up shortly.
by wswope
5/3/2026 at 3:20:47 PM
NixOS is a great place to start from.Trusted observability will be key. Why am I giving the harness the ability to read/modify files when the harness lives in the same action space as tools? No, the gates should be controlled elsewhere, and even when I have given carte blanche, I want to see what has been done, step by step. So a controlled CA that allows for inspection of requests is great for logging.
by angry_octet
5/3/2026 at 5:39:23 PM
In this post, we built the harness, it’s not 3rd party (like Claude code in a sandbox). So we trust it as much as the rest of our backend code.by shad42
5/3/2026 at 10:34:20 PM
Probably insufficient to know that you wrote it, because code has bugs that LLMs and attackers are motivated to find. It has a higher trust requirement than most code.And of course, that trust only applies to you, no one else should trust your code absent other proofs.
by angry_octet
5/3/2026 at 1:05:52 AM
Author here.I should have made it more clear that the article is about agent / harness building (not about running third party agents).
> I barely trust the harness more than the LLM
Since we built it, I trust it just as much as I trust our API server :)
The latter gets untrusted inputs from the internet, while the former gets untrusted inputs from the LLM
by aluzzardi
5/3/2026 at 1:42:30 PM
You have some very innovative thinking in your organization. Impressed.by bsenftner
5/3/2026 at 7:58:51 AM
> Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful."Lethal trifecta" is basically describing phishing but in a way more palatable to people who would rather die before allowing themselves to anthropomorphize LLMs even a little bit. It's not a problem you can fix with better coding, like some SQL injection. You can only manage risk around it (for which sandboxing is one of many solutions that can help).
So on one hand, I agree with you - you need to be mindful of what you're actually dealing with. On the other hand, you always have this, and need this, for the agent to be able to do anything useful.
by TeMPOraL
5/4/2026 at 4:28:14 AM
I wish it was just "phishing", but it's way worse.It's way more akin to a whole minefield of Zero-Click exploits.
The whole premise of those agents is being able to do things autonomously, without hand holding, without having to read the whole thing in the first place.
Phishing: active human steps on it and lose.
Lethal trifecta: mass landmines, in lots of places. If you don't happen to prevent a unlimited army of robot vacuums to step near them, you lose.
by ElectricalUnion
5/4/2026 at 6:05:18 AM
Less difference than you may expect.If you do anthropomorphise them like this, consider it from the PoV of a manager:
"My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"
Current AI are more gullible, for sure. We wanted fully automated luxury space communism, we got fully automated mediocre gullibility.
by ben_w
5/4/2026 at 8:44:58 AM
Surely that's where checks in the harness come into play though. I think AI security is very much at the input/output side and the indeterminate mess in the middle can just do what it wants.Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them.
Agents that do work with data should not have access to comms tools. A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.
by fennecbutt
5/4/2026 at 10:56:13 AM
You can if you want, but all this stuff works in a similar way to as telling your staff "if someone calls saying they're the CFO and need a $25M transfer, check by a different channel": https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-ho...Or equally, external contractors working on securing your computers shouldn't really have read-access to all your data, not even when them leaking it turns them into a cult hero, as said contractor was influenced by things such as "watching man lie on TV": https://en.wikipedia.org/wiki/Edward_Snowden
The only thing which is different for agents rather than humans pertains to this:
> A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.
Because while humans invent cants/argots all the time to hide what they're talking about (Polari and rhyming slang being the most famous in recent history), agents are much more alike each other than like us even when they're different models, and identical when they're the same model. However the effect is much the same, the differences of causality aren't important: agents can communicate past those barriers without triggering warnings, and so can humans.
by ben_w
5/3/2026 at 1:33:26 PM
Phishing is only a subset of the issue, so I don't think that name's appropriate, besides being used for other things in other contexts (which would be another reason for me not to try and overload it).by 3form
5/3/2026 at 3:08:16 PM
I'm not saying we need to overload phasing, but rather to not treat the trifecta like a regular security vulnerability. As defined originally, the trifecta is analogous to phishing, but of course it's only a small subset of the issue.by TeMPOraL
5/3/2026 at 5:29:44 PM
I don't think I've read the original definition, what was it?by 3form
5/3/2026 at 4:12:11 AM
The LLM has harness control in claude ;) “Let me switch off the sandbox and try again”by gmerc
5/3/2026 at 9:43:45 AM
>Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?).I run a single-node k3d cluster on each of my MacBooks which uses Agent Sandbox[0] to keep harnesses isolated. Harnesses access models through LiteLLM only. I have aliases for `kubectl exec`ing into whatever harness I need.
by bauerd
5/3/2026 at 4:23:39 AM
> if your harness has an ability to do something the LLM can'tWhat does this even mean. The only capability of an LLM is generate text.
by tantalor
5/3/2026 at 8:02:35 AM
The LLM can only generate text. The harness can do more than just generate text. By joining the two you're allowing the LLM (through text) to carry out whatever actions the harness can take.My brain can only generate electrical signals. My hand responds to electrical signals and can interact with the real world. The two together can do more than just what my brain alone can do.
If you don't trust a particular brain, don't put a gun in the hand which is connected to it. If you don't trust a LLM, don't connect it to a harness which has access to your production database and only recent backups (https://www.theregister.com/2026/04/27/cursoropus_agent_snuf...).
by jbstack
5/3/2026 at 8:52:07 AM
We’ve trained models on JSON schemas for “tool calls”, and then built software to interpret and run those calls for the LLMsby girvo
5/3/2026 at 3:47:24 PM
> software to interpret and run those callsYes... That's the harness!
by tantalor