The agent harness belongs outside the sandbox

5/2/2026 at 11:07:31 PM

I think it omits the real reason I want to run the harness in the sandbox: I barely trust the harness more than the LLM, at least at this point in time. They are so rapidly evolving along with the underlying models, that I don't think they are a reasonable component to rely on to provide safety constraints. Put more precisely: if your harness has an ability to do something the LLM can't, and it has a set of conditions under which the LLM can cause those to be invoked, you have to assume the LLM will work out those conditions and execute them. Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful.

Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?). Longer term, I see it as a dedicated security layer, not part of the harness. This probably has yet to emerge fully but it's more like a hypervisor type layer that sits outside of everything and authorises access based on context, human user, etc and can apply policy including mediate the human intervention for decision points when needed.

by zmmmmm

5/3/2026 at 1:35:06 AM

I don't trust the harness, and I especially don't trust that the LLM won't be able to subvert the harness, or trick me via the harness. I assume that the LLM will be able to leak any secret in the harness context to arbitrary internet destinations, or somehow encode the secret in a work product. Eg space characters at the end of lines encoding access tokens.

Having the harness in one VM, and tool use applied to user data in another, is about as safe as you can be at present. You can mount filesystem fragments from the data VM into the harness VM, but tool execution remains painful.

Having all authorisation and access control exist outside of the harness layer is essential. It should only have narrowly scoped and time limited credentials that are bound to its IP, and even then that is problematic.

by angry_octet

5/3/2026 at 3:03:49 PM

My approach to this has been a NixOS host with the harness running in a rootless podman sidecar.

The host has squid configured with a self-signed CA and networking rules to route all host traffic to the intercepting proxy, so I have a tight firewall and full auditability.

Then there’s a python rpc daemon running on the host with a set of whitelisted commands, read-only for pulling logs and diagnostics.

By default, the agent runs in a split pane tmux session with a host shell on the left and the chat interface on the right. The rpc whitelist includes the proper `tmux capture-pane` invocation to pull from the host shell, so I can easily let it see what I’m doing if I want it to help debug something.

I’m using pi as my harness and have custom extensions that give Yes/No confirmation gates for any writes the agent makes and that pass all bash commands/file writes to a deepseek subagent for review.

Still early days, but as someone with a similarly paranoid mindset around running LLMs securely, I think the future is promising and we’ll see some new “best practices” and related tooling popping up shortly.

by wswope

5/3/2026 at 3:20:47 PM

NixOS is a great place to start from.

Trusted observability will be key. Why am I giving the harness the ability to read/modify files when the harness lives in the same action space as tools? No, the gates should be controlled elsewhere, and even when I have given carte blanche, I want to see what has been done, step by step. So a controlled CA that allows for inspection of requests is great for logging.

by angry_octet

5/3/2026 at 5:39:23 PM

In this post, we built the harness, it’s not 3rd party (like Claude code in a sandbox). So we trust it as much as the rest of our backend code.

by shad42

5/3/2026 at 10:34:20 PM

Probably insufficient to know that you wrote it, because code has bugs that LLMs and attackers are motivated to find. It has a higher trust requirement than most code.

And of course, that trust only applies to you, no one else should trust your code absent other proofs.

by angry_octet

5/3/2026 at 1:05:52 AM

Author here.

I should have made it more clear that the article is about agent / harness building (not about running third party agents).

> I barely trust the harness more than the LLM

Since we built it, I trust it just as much as I trust our API server :)

The latter gets untrusted inputs from the internet, while the former gets untrusted inputs from the LLM

by aluzzardi

5/3/2026 at 1:42:30 PM

You have some very innovative thinking in your organization. Impressed.

by bsenftner

5/3/2026 at 7:58:51 AM

> Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful.

"Lethal trifecta" is basically describing phishing but in a way more palatable to people who would rather die before allowing themselves to anthropomorphize LLMs even a little bit. It's not a problem you can fix with better coding, like some SQL injection. You can only manage risk around it (for which sandboxing is one of many solutions that can help).

So on one hand, I agree with you - you need to be mindful of what you're actually dealing with. On the other hand, you always have this, and need this, for the agent to be able to do anything useful.

by TeMPOraL

5/4/2026 at 4:28:14 AM

I wish it was just "phishing", but it's way worse.

It's way more akin to a whole minefield of Zero-Click exploits.

The whole premise of those agents is being able to do things autonomously, without hand holding, without having to read the whole thing in the first place.

Phishing: active human steps on it and lose.

Lethal trifecta: mass landmines, in lots of places. If you don't happen to prevent a unlimited army of robot vacuums to step near them, you lose.

by ElectricalUnion

5/4/2026 at 6:05:18 AM

Less difference than you may expect.

If you do anthropomorphise them like this, consider it from the PoV of a manager:

  "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"

Current AI are more gullible, for sure. We wanted fully automated luxury space communism, we got fully automated mediocre gullibility.

by ben_w

5/4/2026 at 8:44:58 AM

Surely that's where checks in the harness come into play though. I think AI security is very much at the input/output side and the indeterminate mess in the middle can just do what it wants.

Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them.

Agents that do work with data should not have access to comms tools. A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.

by fennecbutt

5/4/2026 at 10:56:13 AM

You can if you want, but all this stuff works in a similar way to as telling your staff "if someone calls saying they're the CFO and need a $25M transfer, check by a different channel": https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-ho...

Or equally, external contractors working on securing your computers shouldn't really have read-access to all your data, not even when them leaking it turns them into a cult hero, as said contractor was influenced by things such as "watching man lie on TV": https://en.wikipedia.org/wiki/Edward_Snowden

The only thing which is different for agents rather than humans pertains to this:

> A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.

Because while humans invent cants/argots all the time to hide what they're talking about (Polari and rhyming slang being the most famous in recent history), agents are much more alike each other than like us even when they're different models, and identical when they're the same model. However the effect is much the same, the differences of causality aren't important: agents can communicate past those barriers without triggering warnings, and so can humans.

by ben_w

5/3/2026 at 1:33:26 PM

Phishing is only a subset of the issue, so I don't think that name's appropriate, besides being used for other things in other contexts (which would be another reason for me not to try and overload it).

by 3form

5/3/2026 at 3:08:16 PM

I'm not saying we need to overload phasing, but rather to not treat the trifecta like a regular security vulnerability. As defined originally, the trifecta is analogous to phishing, but of course it's only a small subset of the issue.

by TeMPOraL

5/3/2026 at 5:29:44 PM

I don't think I've read the original definition, what was it?

by 3form

5/3/2026 at 4:12:11 AM

The LLM has harness control in claude ;) “Let me switch off the sandbox and try again”

by gmerc

5/3/2026 at 9:43:45 AM

>Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?).

I run a single-node k3d cluster on each of my MacBooks which uses Agent Sandbox[0] to keep harnesses isolated. Harnesses access models through LiteLLM only. I have aliases for `kubectl exec`ing into whatever harness I need.

[0] https://agent-sandbox.sigs.k8s.io

by bauerd

5/3/2026 at 4:23:39 AM

> if your harness has an ability to do something the LLM can't

What does this even mean. The only capability of an LLM is generate text.

by tantalor

5/3/2026 at 8:02:35 AM

The LLM can only generate text. The harness can do more than just generate text. By joining the two you're allowing the LLM (through text) to carry out whatever actions the harness can take.

My brain can only generate electrical signals. My hand responds to electrical signals and can interact with the real world. The two together can do more than just what my brain alone can do.

If you don't trust a particular brain, don't put a gun in the hand which is connected to it. If you don't trust a LLM, don't connect it to a harness which has access to your production database and only recent backups (https://www.theregister.com/2026/04/27/cursoropus_agent_snuf...).

by jbstack

5/3/2026 at 8:52:07 AM

We’ve trained models on JSON schemas for “tool calls”, and then built software to interpret and run those calls for the LLMs

by girvo

5/3/2026 at 3:47:24 PM

> software to interpret and run those calls

Yes... That's the harness!

by tantalor

5/2/2026 at 11:03:26 PM

Personally, I find it fascinating to watch how, whenever a new technology appears, people start competing to define and own its standards.

Manus rebuilt its harness five times in six months. The model stayed the same, but the architecture changed five times.

LangChain re-architected Deep Research four times in one year.

Anthropic also ripped out Claude Code’s agent harness whenever the model improved.

Ever since Mitchell Hashimoto mentioned the harness in February, people have been trying to claim that concept. Eventually, someone will probably sell a book called Harness Engineering. I will buy it, of course. Then I will write a blog post about it that nobody reads, with a link that will be buried under ShowDead as soon as I submit it to HN.

And by that point, IT companies will start asking:

“You’re a new grad, right? You know harness engineering, don’t you?”

by jdw64

5/2/2026 at 11:47:21 PM

Author here.

In my opinion, the main driver here is how fast models have evolved in the past 12 months. It makes the architecture of everything around them obsolete, very fast.

We went from using models as a building block, wrapping them in heavy workflow code, to now models being smart enough to drive their own workflows and planning.

by aluzzardi

5/2/2026 at 11:52:39 PM

Really enjoyed your post, by the way. The idea of putting skills and memories in a database while keeping the file shaped interface for the agent is clean. One read/write surface, two backends, invisible to the modle that's a nice piece of design, and the candor in the "what's still hard" section made me trust the rest of the post. My comment above was meant as a joke, not about your architecture. If this pattern becomes the standard, I'll happily migrate my workflow again.

One thing I wonder about is whether path routing alone is enough.

If `/workspace` goes to the sandbox and `/memory` or `/skills` goes to the database, the path tells you where to send the request. But it does not tell you whether this user, session, or agent is allowed to access it.

When I built something similar with an MCP filesystem, I found that I needed a scope check before actually running the operation. In my case, I was using GPT dev mode through a Cloudflare tunnel to control my local environment/model, so this kind of boundary became important.

So I like the path-routing idea, but I wonder if it eventually needs a scope or permission layer as well.

by jdw64

5/3/2026 at 7:28:42 AM

Thank you, appreciate it!

Regarding scoping: In our case, the agent loop runs in the same way as our API server does (as in, it’s a multi tenant service running in a container somewhere). And we solve scoping in the same way.

To put it in other words, whether it’s the API receiving “GET /memories/id” or the LLM requesting “Read(/memories/id)” we do pretty much the same thing (check authN/authZ, scope the db request, etc).

Basically the LLM is just another API client using a slightly different format for inputs and outputs, but sharing the same permission layer.

by aluzzardi

5/3/2026 at 2:31:55 PM

> putting skills and memories in a database

I assume by database he meant a relational database. But I don't see the advantage of that over just having skills and memory it in our source control database. Am I missing something?

by intrasight

5/2/2026 at 11:36:20 PM

> Ever since Mitchell Hashimoto mentioned the harness in February

What. The idea is as old as anyone can remember, and wrt. LLMs, it was known to be important since at least as early as ChatGPT being first released.

by TeMPOraL

5/2/2026 at 11:40:51 PM

Yes, the concept itself is not new. Around 2022, people would usually have called it the orchestration layer.

But I think the term started being used closer to its current meaning around this point:

https://www.softwareimprovementgroup.com/blog/what-is-harnes...

In a way, the sequence was something like:

prompt engineering(23~4) -> context engineering(25) ->harness engineering(26)

At first, it was mostly understood as a correction or extension of prompt engineering. But the idea of “harness” as the layer that corrects, constrains, and operationalizes agents seems to have emerged much more clearly around 2026.

So yes, there is definitely some terminological confusion in the early phase. That is normal. New technical fields often begin with several competing names for almost the same layer, and only later does one term become stable.

by jdw64

5/3/2026 at 1:57:58 AM

My 2c:

The word harness brings the truth of LLMs back down to Earth.

it really felt like between 2018 and 2022ish like LLMs had this magical aura, like the orchestration layer was intelligent, maybe even recursive, beyond what simple functions could do. It was assumed that this was a solved problem. The word "orchestration" denoted it, the words we used were full of optimism. When you lift the veil, it really is just regex, and cool tricks sure, but it's a harness it's a utility, there's no magic here, there's realism.

Maybe the labs even had a part to play in this as well; attempting to make themselves look magical. I mean just look at the choice of name for "Mythos", it's about bringing back that feeling of myth and magic after we saw under the veil.

The reality is that the labs have produced magical models yes, but are locking them into ecosystems that leave a lot to be desired, and are easily reproducible, and essentially are cron jobs, regex.. things we've seen in traditional cloud for decades. It feels like an attempt to create a moat where there is none.

Maybe I'm wrong but this has been my impression

by redanddead

5/3/2026 at 8:05:55 AM

There were no LLMs between 2018 and 2022, at least not in the sense resembling today. The whole LLM frenzy started in late 2022.

by TeMPOraL

5/3/2026 at 2:58:38 PM

BERT came out in 2018 and that’s a pretty important inflection point. It didn’t cause a pop culture frenzy, but in NLP circles it was a ‘magical’ improvement.

by dropofwill

5/3/2026 at 3:12:32 PM

From technical POV that's true, but that was still a niche area at the time, mostly ignored in the broader tech community. So wrt. the broader tech world discussing harnesses, I'd still use November 2022 as the reference point.

by TeMPOraL

5/3/2026 at 1:00:06 AM

Harness itself was a widely used term by at least the "[LLM] plays pokemon" trend, which was a year ago[1]. That was basically the term of art to use when arguing about just how much special treatment LLMs should get.

"harness engineering" is the term claimed by that article to have originated in February. It does seem obvious in retrospect and I don't remember an origination point, but there's at least one hn comment predating that in December[2] and it doesn't treat it as novel.

I will admit that my bias is against any self congratulatory buzzword fads (I'm still not over "MCP is the USB of LLMs" or whatever and that's been a year now too). "Who coined the term harness engineering?" -> who cares? It was already widely being done.

[1] https://www.lesswrong.com/posts/7mqp8uRnnPdbBzJZE/is-gemini-...

[2] https://news.ycombinator.com/item?id=46331242

by magicalist

5/3/2026 at 1:12:04 AM

I read your comment. I think we may be talking about slightly different contexts.

The Pokémon article you linked is basically about benchmarking. In that context, the harness functions as part of the benchmark setup: the controlled environment around the model, the available inputs, tools, and assistance.

The current usage of “harness,” at least in the agent engineering discussion, seems closer to a lower-level runtime layer, almost like an OS around the agent.

So I see this as a transition: from “harness” as a narrower benchmark/control-variable layer to “harness” as the broader operating environment of the agent.

That does not mean I think your point is wrong. With topics like this, the interpretation depends on which part of the lineage one emphasizes. The first appearance of the idea may go back to 2022 or earlier, while the usage that looks closer to the current meaning may have emerged at a different point.

I am probably giving more weight to the SIG article, while you are giving more weight to a different point in the lineage. Both seem reasonable to me.

by jdw64

5/3/2026 at 12:19:16 AM

Just wait 6 months for something new to come up and everyone will forget about harnesses.

by tokioyoyo

5/3/2026 at 12:13:12 AM

There are other models. Eschew the sandbox. Give the agent a computer, with all the trimmings, but keep that computer segregated from sensitive resources. Tokens are a solved problem: tokenize them[1] or do something equivalent with a proxy. The same thing goes for secrets.

A lot of this post presents false dichotomies. It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why? There are reasons to do that and reasons not to do that. You can have a durable computer with a network identity and full connectivity, and you can have that computer spin down and stop billing when not in use.

There are a zillion different shapes for addressing these problems, and I'm twitchy because I think people are super path-dependent right now, and it's causing them to miss a lot of valuable options.

[1]: https://fly.io/blog/tokenized-tokens/ (I work at Fly.io but the thing this post talks about is open source).

by tptacek

5/3/2026 at 3:57:00 AM

I'd argue you are still using a sandbox, just at a higher ring (outside the machine/VM) and relaying on app/resource level permissions on each of your external resources to enforce it, which requires _all_ of those external systems to be hardened vs. the agent host itself. The capabilities a full machine has for exploring and exploiting external, ostensibly secured systems, has already been touched on via incidents like the anthropic internal model jailbreak. [0]

Giving the whole machine also doesn't answer the question for how the agent can hook into actions that eventually require more perms, and even if you "airgap" those via things like output queues that humans need to approve, that still feels "harnessey" to me.

I feel a bit guilty of debating semantics here, especially as I can't/don't intend to convey any confidence in a "right answer", but my reason for being pedantic is that I do think there are interesting tradeoffs between "P(jailbreak or unexpected capability use|time)" and "increasing power/available capability set", as well as interesting primitives emerging in terms of the components you'd need regardless of where you drew that line (ala paragraph 2.)

[0] - https://www-cdn.anthropic.com/3edfc1a7f947aa81841cf88305cb51... (specifically section 5.5.2.4)

by existencebox

5/3/2026 at 4:09:44 AM

The post is explicit about what they mean by sandboxing and what the tradeoffs are for the model they're discussing.

by tptacek

5/3/2026 at 4:44:29 AM

I've reread it and I stand by my statements that it's an isomorphism, simply replace "container" with "machine AAD/auth-system boundaries" in your example.

The "Your credentials stay out of the sandbox" problem, to quote them, is what I see your "require your perms system to enforce it" as implicitly solving for.

(Their "sandbox as cattle" discussion had less bearing on the "which pattern" question to me, since I tend to treat most parts of my agent stack as cattle-like, potentially out of a bias towards that architecture broadly, as I find it's much easier to reason about when as much as possible is disposable/idempotent/eventually consistent. The durable execution point also assumed aspects of the agent scaffold ala prompts don't have to be turned over in deploy, or conversely, can't finish their tasks and then migrate incrementally, and while I might cynically raise an eyebrow at the focus on 25ms for sandbox calls given the dev loops I currently experience, I'd argue there are other ways to solve that problem in both an in or outside of container sandbox pattern.)

I'd even agree with their final point "Consistency is the part we haven't answered" but in a different angle than they intended, as to why my focus was on "how do you _constrain_ agent behavior" since that has been, in my experience, the biggest bottleneck to letting agents do more.

by existencebox

5/3/2026 at 12:52:26 AM

Author here.

This is an interesting and novel field, so I’m not pretending I know the answers, but this is what worked for us :)

At the end of the day, and oversimplifying things: why would I want to spawn a for loop that calls an API (LLM) into its own dedicated sandbox/computer?

When the model wants to run a command, it’ll tell you so. Doesn’t need to be a local exec, you can run it anywhere, the model won’t know the difference.

The agent loop itself doesn’t need sandboxing. In many cases, most tool calls don’t require sandboxing either. For the tools that do require a computer, you can route those requests there when needed, rather than running the whole software in that sandbox.

To me running the agent loop in the sandbox itself feels like “you should run your API in your DB container because it’ll talk to it at some point”.

by aluzzardi

5/3/2026 at 5:40:34 AM

I wonder though, what about cases where you have multiple agents or LLM backends and the credentials is shared between all of them?

by afshinmeh

5/3/2026 at 5:52:27 AM

[flagged]

by nrengan

5/3/2026 at 12:40:34 AM

I'm also very excited by the different shapes for solving problems in this space. A little worried that the path dependence is ACTUALLY a bit warranted since "popular harness engineering is just claude-wrapping" is a bit of a self-fulfilling prophecy today.

I've heard many claims that because LLMs are tuned to specific harnesses, we should expect worse performance with novel architectures. That seems to make people reluctant to try to put effort into inventing them.

by nvader

5/3/2026 at 12:57:09 AM

Author here.

I’m worried about the same (models tuned for specific harnesses).

We actually work around that by respecting the “contract”. For instance, our harness’ Bash signature is exactly the same as Claude’s. We do our sandboxing stuff and respond using the same format.

In the “eyes” of the model there’s no difference between what Claude does and what we do (even though the implementation is completely different).

We basically use Claude’s tools as API contract

by aluzzardi

5/3/2026 at 10:06:28 AM

> It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why?

Because the moment you use k8s, you have to assume that, apparently. Or so Im told by all the infrastructure people I speak with. Getting these pods to not disappear just because one process ran out of memory has been an herculean task.

I wish our standard deploy processes produce durable computers that dont break our bank but that hasn't been an easy requirement with simple infra teams.

by ramraj07

5/3/2026 at 6:54:35 AM

Wow, thanks for writing this up.

I'm building an agent sandboxing system for a client atm, and was about to start working on a system of ephemeral, short lived, derived secrets for the agent to use.

Lots of great thoughts to steal in this piece. Thanks again.

by isoprophlex

5/3/2026 at 1:45:04 PM

Yes, or just run it inside GitHub actions. On your free minutes. With what you already have.

by keepamovin

5/3/2026 at 2:23:08 AM

I agree with the argument that there are many more than two ways to do this. When I built my AI assistant (https://stavrobot.stavros.io/), for example, I implemented an architecture that has both the ways detailed in the post. The harness runs simultaneously both inside and outside the container (I didn't want the harness to touch the system, and I didn't want LLM-generated code to touch the harness).

It's all tradeoffs, and picking the ones that work for what you want to do is what architecture is. The more informed you are about the tradeoffs, the better you can make your architecture.

by stavros

5/2/2026 at 11:41:17 PM

> A lot of what an agent does doesn't need a sandbox at all: thinking, calling APIs, summarizing, waiting for CI.

I don’t get it. Calling an API requires a sandbox in most cases. The others could be abused in service of an un-sandboxed agent with API access.

If the harness is outside the sandbox then it’s just an ambiguous and confusing security model and boundary.

by MrDarcy

5/3/2026 at 12:43:59 AM

> Calling an API requires a sandbox in most cases.

I'm not following why this would this be the case? The purpose of calling the API is to get data or effect a state transition on some remote service, but I don't follow why the originating machine matters.

Or is your objection about auth?

by nvader

5/3/2026 at 4:11:00 AM

The purpose of a sandbox is to control the interface between inside and outside of the sandbox. If you put the harness on the outside and connect it to a model and to an API then there’s no point in the sandbox. You don’t have any control over the interface.

by MrDarcy

5/3/2026 at 9:15:49 PM

Respectfully, I think your model is incomplete.

The purpose of a sandbox should be understood to be limited to isolating changes to the inner state of the sandbox: filesystem, git, installed binaries like compilers, interpreters, checkers, running processes, etc.

In short anything that gets rebuilt when you rebuild the sandbox.

Harness to API control is an orthogonal surface, that may be reasoned about independently. You may initiate and control it from within the sandbox, but equally (and perhaps more) valid would be to do it from the outside.

Why would doing that lose control over the interface? Could you not secure the harnesses means to create outgoing connections and validate it that way?

I would argue that control from outside gives you MORE control as you could trust guardrails you've built outside the sandbox more than anything that's running in the same space where the agent has permission to execute arbitrary bash commands.

by nvader

5/3/2026 at 12:16:26 AM

Author here.

I think the confusion is that “agent” is used for two very different things:

- building an agent

- an “agent” product/runtime (Claude Code, etc)

In the first case, the model never executes anything. It just outputs something like “call this API”. Your code is the one doing it, with whatever validation you want. There’s no need for a sandbox there because there’s no arbitrary execution.

by aluzzardi

5/3/2026 at 4:21:08 AM

I can see that. It also seems like the first quickly evolves into the second.

by MrDarcy

5/3/2026 at 12:07:36 AM

No, for example a tool call calling an API. So the llm does not have access to the API keys, the tool does. For example an API call that fetches some data remotely and return it to the llm. You don’t need a sandbox for it. It’s faster and more efficient to keep this out of the sandbox.

by shad42

5/2/2026 at 10:12:53 PM

Nah. Worse is better.

The reason agents work is because they have access to stuff by default. The whole world is context engineering at this point, and this proposal is to intermediate the context with a bespoke access layer. I put the bare minimum into getting my dev instance into a state where I can develop, because doing stuff (and these days: getting my agent to do stuff) is the goal.

This makes slightly more sense if you're building a SaaS and trying to get others to give you access to their code, their documents, and the rest so you can run agents against it. But the easiest, most powerful way is to just hook the agents up to the place that's already set up.

by trjordan

5/2/2026 at 10:45:32 PM

They are building exactly what you described and this is their architectural solution to ensuring their YOLO agents do not nuke their customers code/documents/databases by sandboxing everything in the workspace — the git checkout the agent is working on, plus whatever's needed to run commands against it (compilers, package managers, etc.).

by ossa-ma

5/3/2026 at 12:37:34 PM

To me the article isn’t convincing. There are some interesting points raised. Like “the llm sharing memories between a team of developers”.

I mostly use llms individually, so this is a real blind spot for me. But I might convinced to share a corpus of memories between developers if it ever becomes practical.

But for now aren’t context windows still sometimes smaller than the task at hand… so those “llm memories” take the form of literal documentation???

The closer llm memories get to native format, i.e: stored tokenized content, don’t we lose compatibility between setups anyhow. What if you’re using fp16, and I’m using nvfp4?

by schaefer

5/2/2026 at 10:00:35 PM

Sure, the experimental, agentically-developed code should be tested in a sandbox. This sandbox should contain the damage of the code execution when it goes wrong.

But shouldn't there really be another sandbox where the agentic tool calls execute? This is to contain the damage of the tool execution when it goes wrong.

And, the agent harness itself should either implement or be contained in a third sandbox, which should contain the damage of the agent. There should be a firewall layer to limit what tool requests the agent can even make. This is to contain the damage of the agent when it formulates inappropriate requests.

The agent also should not possess credentials, so it cannot leak them to the LLM and allow them to be transformed into other content that might leak out via covert channels.

by saltcured

5/2/2026 at 10:11:08 PM

Yes, it's also because the agent described in the post is doing some operations on the user code (fix CI pipelines, rerun tests, fix them, etc...). So another big reason to use the sandbox is to run things like bash on a user code. you don't want credentials or anything trusted inside that sandbox, including the LLM api key.

by shad42

5/2/2026 at 10:19:37 PM

Author here. Depending on how it’s designed, the harness itself doesn’t need any sandboxing.

At the end of the day, it’s a “simple” loop that calls an external API (LLM) and receives requests to execute stuff on its behalf.

It’s not the agent running bash commands: you (the harness author) are, and you’re in full control of where and how those commands get executed.

In the article’s case, bash commands are forwarded to a sandbox, nothing ever runs on the harness itself (it physically can’t, local execution is not even implemented in the harness).

by aluzzardi

5/2/2026 at 10:54:14 PM

They didn't make a clear argument in favor of that architecture and I'm not really convinced.

On exe.dev the agent (Shelley) runs in a Linux VM, which is the security boundary. All the conversations are saved to a sqlite database, and it knows how to read it, so you can refer to a previous conversation in the database. It's also handy for asking the AI to do random sysadmin stuff, since it can use sudo.

A downside is that there's nowhere in the VM where secrets are safe from possibly getting exfiltrated via an injection attack. But they have "integrations" where you can put secrets into an http proxy server instead of having them locally.

Also, you don't need to use AI at all. You can use the VM as a VM.

by skybrian

5/3/2026 at 10:09:08 AM

No matter how smart you think you get, I personally dont trust the models in an environment where they can read the secrets one way or another, in any high volume production environment.

by ramraj07

5/2/2026 at 11:04:34 PM

This is angling in the right direction, but I think it has two problems:

1) It's still assuming agents have CLIs. This is a very developer-centric concept of agents, and doesn't map well to either consumer or enterprise agents that aren't primarily working with files. Skills, plans, TODO lists, and memory are good, but don't have to be modeled as raw file access. Many harnesses have tools for them.

2) It's talking about a singular sandbox. That's not good enough for prompt injection prevention, secure credential management, and limiting the blast radius of attacks.

by spankalee

5/3/2026 at 2:08:41 PM

> It's still assuming agents have CLIs. This is a very developer-centric concept of agents, and doesn't map well to either consumer or enterprise agents that aren't primarily working with files. Skills, plans, TODO lists, and memory are good, but don't have to be modeled as raw file access. Many harnesses have tools for them.

Why can't it just be a simple CLI? Even small AI models are plenty smart enough to think "It's a *nix system, I know this!"

by zozbot234

5/3/2026 at 10:20:52 AM

For 1, the general thinking is that companies like these perform the job of abstracting the CLI complexity in their application while the harness presented to the llm can be independently as suave as needed for it.

by ramraj07

5/3/2026 at 12:00:02 AM

> Three engineers trigger the agent on the same incident, and they all see stale state until their sessions end. Conflict resolution, eventual consistency, cache invalidation.

Arguably this is a feature not a bug. Conflict resolution forces the need for a process to come to agreement on a common source of truth - one of the reasons why most Git repos don’t allow users to push to main directly. Writing directly to a shared memory database seems like it would result in chaos and a host of side effects once the number of users scales.

by vursekar

5/2/2026 at 11:39:24 PM

Two points:

-What remains unsolved is what should an Agent reasonably have access to in what context and for how long (etc).

Probabilistic code that can run far faster than human driven code, we don’t have a great model yet. We all should spend our energy there…

- Separating / putting controls on the FS resource is no different than putting the agent behind a firewall / allow-deny list.

It doesn’t invalidate running a sandbox in a sandbox to have better security.

by NJL3000

5/3/2026 at 8:27:24 PM

> Your credentials stay out of the sandbox.

This seems misguided - the credentials stay visible to the agent which is what matters for credential leaks.

You must also completely trust the agent to actually execute all commands in the sandbox, which is only possible if you control the harness and all the tools yourself. Not possible when using existing harnesses.

by ricardobeat

5/3/2026 at 3:36:43 PM

Many interesting ideas in there.

To move forward, I suggest thinking less about the implementation details and more about the concepts around your approach in this system.

For example a database storing files accessible remotely by multiple users is really a file server and can be implemented in multiple ways. And that's not the problem you're trying to solve here.

Depending on where your sandboxes live, a bind or network mount and a gitwatcher outside of the sandbox would accomplish something very similar with less customization.

You mentioned not having a solution for concurrency. So think about that first, without limiting yourself to a single implementation.

Maybe the storage should not be per file, but be a knowledge graph that is presented as a file to the LLM in the sandbox. Concurrent mutations in knowledge graphs may be easier to solve, especially with the help of LLMs.

Or perhaps it starts to work well already by simply showing the git merge conflicts to an LLM and having it reconcile the separate writes. Maybe even let it "post feedback" to the LLMs in the container, when a concurrent write has happened to a memory or skill to tell it "hey while you were working I also learned this potentially related update".

by jfoks

5/3/2026 at 12:29:49 AM

Hey aluzzardi, thanks for sharing this article!

I'm really intrigued by your point on read-memory vs a dedicated read interface, because it is a real insight about success rates in harness design.

How did you come to the conclusion you did? Could you speak a little to the evaluations you ran, or the data or anecdotes you collected to validate that decision?

I'm also curious about the overall framing of the question, which I'll challenge with, does the agent have to have a where?

An agent could be modeled by a set of states and transitions. I don't think that there's anything inherently necessary about the current "one process claude" approach for harnesses, other than convenience. Why hasn't a fully distributed harness, built on functions and tables, gained more mindshare?

by nvader

5/3/2026 at 5:29:26 AM

Agreed and it's a pattern that OpenAI suggested a few days ago, too [1]. I also built a cross platform process level sandboxing that uses parts of OpenAI Codex for the same purpose [2]

[1] https://openai.com/index/the-next-evolution-of-the-agents-sd...

[2] https://github.com/afshinm/zerobox

by afshinmeh

5/3/2026 at 12:47:42 PM

We did this from the earliest days for louie.ai, which is an adjacent space of Investigations. Sandboxing the LLM was secondary to the primary reason: the threat model for servers. I suspect most people building agentic products are in this bucket.

Sufficiently advanced desktop tools starts to want server capabilities like teleport, scheduled tasks, ci mode, shared sessions, etc. Web-based ones start here to begin with.

Pretty soon after you have a server, you also think about multitenancy isolation and task isolation. The article's sandboxing also matters for regular old non-LLM code escapes in a multitenant world. We have to assume malicious python by the attacker, whether AI or human, and cannot let one tenant's python have write access to trusted surface of another.

by lmeyerov

5/3/2026 at 12:34:51 AM

I had an idea that devs could build wasm modules that would define tools and instructions, and a harness could load them. Kind of like MCP but with certain assurances about the sandboxing. You could build a package manager around these behaviours.

I still kind of think it’s a decent idea but it’s too close to MCP with drawbacks that make it a harder sell than MCP. It’s hard to compete on functionality from a secure sandbox if users decide they don’t care about security.

by lwansbrough

5/3/2026 at 9:25:42 PM

We’ve taken a different approach.

Hermes is our harness, and we run it in the sandbox.

Session history is tracked in a Postgres db (small monkey patch to do this)

We built a lightweight skills hub to manage/track skills.

And the file system is backed up on S3 (using the new S3 FS).

But everything else is just running in a k8 pod.

We haven’t ran into any issues yet, but our strategy here is to have the least invasive changes so upstream harness changes don’t get in the way.

by maxchehab

5/3/2026 at 3:42:59 PM

Good comparison of the 2 approaches of agent harness. Vita AI[0] also picked the outside model. We used the E2B sandbox solution, and fortunately it supports volume, so we just create volumes for personal skills and organization workspace, and mount them to the sandbox. This elegantly solves the "file system" problem you mentioned.

[0]: https://www.vita-ai.net (AI agent for content creation & social media management)

by jdeng

5/3/2026 at 4:59:39 AM

The title is highly misleading, they mean the harness belongs outside the sandbox the agent is working in. Please run also the harness in a sandbox, i don't think any of them is safe for a host. This is also the only valuable info in this marketing noise article. The rest is full of hidden endorsements of VC buddies (why would anyone build on closed source sandbox abstractions and claim alternatives need 1s to boot) And signs they cannot reason logically. (eg. Need shared files across users > the only two options are building distr. filesystem or storing in a database. Later admitting even the database solution needs a last write wins resolution on top and completely ignoring they could just as well have delegated shared writes to a authoritative file server with retry on conflicts and git.)

by jFriedensreich

5/3/2026 at 5:16:49 AM

I’ve been working on a sandboxing tool that uses Incus. Originally it was only to run LLMs inside a sandbox, but recently I added MCP so that an agent could spin one up and do work that way.

It currently only exposes a rudimentary set of tools which I’d like to expand. The sandboxes created by MCP are generally ephemeral. The daemon will clean them up after an hour of no usage.

But it’s so cool that they get their own IP and you can ssh straight in. I can see that being very useful when you want to share with a colleague and then close your laptop (assuming it’s running on a remote instance).

https://github.com/deevus/pixels

by deevus

5/3/2026 at 8:43:25 AM

Agreed, this is exactly what we do.

There's no harm in a string, only in the execution.

I create Tools as Actors, which you preconfigured for the LLM context (in-house agent loop). The tools being preconfigured means you setup their environment before they can be executed. If it calls a bash tool for instance, the Tool Actor gets called and then it runs that command against an attached remote VM.

Or filesystem operations, are just read/writes inside a .zip file, which is overlayed onto the target project at build time.

This article is spot on, and I probably say that because it's self reinforcing.

by Weryj

5/3/2026 at 10:31:42 PM

I truly enjoyed reading this post. I was wondering if there are bottlenecks with regards to performance too. Can you elaborate on that?

by ani_k47

5/2/2026 at 10:25:59 PM

I am not sure anyone knows what a harness is at this point. I've heard 17 different definitions of it at this point. It's almost like a buzzword in search of a problem.

by blcknight

5/2/2026 at 10:32:12 PM

Author here. My definition is: you take an agent, remove the model and you’re left with the harness.

Tools, memories, sandboxing, steering, etc

by aluzzardi

5/2/2026 at 10:50:53 PM

Clean definition, stealing it. Way better than mine: "Now imagine Claude as Shinji and Claude Code as Eva..."

by ossa-ma

5/3/2026 at 12:38:42 AM

Huh. My definition - or rather, explanation - has always been, "The model is just a big bag of floats you multiply with some numbers to get some numbers out, plus a regular program that runs a loop which, at minimum, turns inputs (text, images) into a stream of numbers, pushes it through those multiplications against the bag of floats, and turns results back into text/images/whatnot. That regular program is called a harness[0]. Now, the trick to make LLMs into agents, is to add another loop in the harness that reads the output and decides whether to send it out to user, or do something else, like executing more code (that's what tools are), or feeding it back to input with some commentary (that's how you get "thinking"), or both (that's how you get the "agentic loop")".

Because there isn't really much more to it. And ever since we, i.e. those of us who played with ChatGPT API early on, bolted tools to it, some half a year before OpenAI woke up and officially named it "function calling" - ever since then, we knew that harness was the key. What kept changing was which logic (and how much of it) to put in explicitly, vs. pushing it back to the model on the "main thread", vs. pushing it to a model on a separate conversation track. But the basic insight remains the same.

[0] - Well, today - until recently you'd call it a "runner" or "runtime".

by TeMPOraL

5/3/2026 at 12:40:01 AM

So, client?

by Dotnaught

5/2/2026 at 10:51:58 PM

But what is an agent without tools?

by beepbooptheory

5/2/2026 at 11:03:26 PM

Code.

by tomrod

5/2/2026 at 11:09:41 PM

Like as in what its made out of, or what it makes? Neither really makes sense here? Lots of things are made out of code and not necessarily agents, but also (from my decidedly outside observer perspective) "agents" are not limited to being code producers either.

by beepbooptheory

5/2/2026 at 11:32:28 PM

If you use cloud models.. the harness is what runs in your computer

AI companies would love if everything ran in their cloud, but arguably there are latency reasons or other reasons to run at least some stuff in your own computer

by nextaccountic

5/2/2026 at 10:55:08 PM

the agent harness is the REPL. The evaluation + loop.

by brazukadev

5/2/2026 at 10:41:03 PM

I don’t even know what an agent means, let alone harness.

by irishcoffee

5/2/2026 at 10:52:00 PM

There is an LLM API. You send it a system prompt and the conversation history. If the last message is a user message the agent will send back a response. It can also send back a “thinking” message before it sends a response and it can also send back a structured message with one or more function calls for functions you defined in your API request (things like “ls(): list files”).

The harness is the part that makes the API calls, interacts with the user, makes the function calls, and keeps track of the conversation memory.

You can also use the LLM to summarize the conversation into a single shorter message so you get compaction. And instead of statically defining which functions are available to the LLM you can create an MCP server which allows the LLM to auto-discover functions it can call and what they do.

That’s the whole magic of something like Claude Code. The rest is details.

by IgorPartola

5/3/2026 at 12:48:57 AM

I'd say the core is that the harness/runtime/${whatever you call it} doesn't just unconditionally sends model output to the user, and user input to the model, +/- some post-processing, but instead runs a loop that feeds the output back to the model if some conditions are met. That gives you basic "thinking" and single "function calling" a-la early ChatGPT. However, if you allow it to loop arbitrary number of times and allow the output to decide whether to loop or to stop, you get a basic agent.

by TeMPOraL

5/2/2026 at 11:15:44 PM

Agent is currently defined as "what I want it to mean given whatever I am talking about".

Personally, for me it embodies a level of autonomy. I define that as, an AI model with potential to interact with something external to itself based on its output, where that includes its own future behavior.

by zmmmmm

5/2/2026 at 10:45:01 PM

Slightly related: I am looking for:

- Easy single command CLI agent spawning with templates

- Automatic context transfer (i. e. a bit like git worktrees)

- Fully containerised, but remote (a bit like pods)

- Central, mitm-proxy zero trust authn/authz management (no keys or credentials inside the agents), rather enrichment in the hypervisor/encapsulation

- Multi agent follow-up functionalities

- Fully self hosted/FOSS

Basically a very dev-friendly, secure, "kubernetes"-like solution for running remote agents.

Anyone has an idea of how to achieve this or potential technologies?

by Koffiepoeder

5/3/2026 at 12:49:24 AM

Yeah, have you tried `mngr` by Imbue? It seems to have a bunch of the features you're looking for.

https://github.com/imbue-ai/mngr

by nvader

5/3/2026 at 6:49:23 PM

Thanks for the recommendation. This is very close to what I am looking for, at least with regards to the CLI.

The networking part I can fix with a second docker container and network_mode I think.

The centralised key and permission management and agent dashboarding is severly lacking though. But that's for now my least worry I think.

by Koffiepoeder

5/3/2026 at 9:03:48 PM

In the spirit of composing small units of software together, `mngr` works with `latchkey` which is a key injection/replacement proxy.

Latchkey does support some form of permissions management too.

by nvader

5/2/2026 at 11:02:59 PM

Is secretly rerouting reads/writes/edits of skills and memory any easier than just dumping the actual skills and memory files on disk at sandbox startup?

Another benefit of moving the harness outside the sandbox is you get to avoid accidentally creating a massive distributed system and you therefore don't have to think so much about events/communication between your main API and your sandboxes.

by sudb

5/3/2026 at 1:45:01 AM

Interesting idea. Tangentially related I’ve been using my local agent to interact with remote shells via zmx, described here: https://bower.sh/zmx-ai-portal

The use case is different but this article strikes some vague similarities around an agent API to remotely execute commands.

by qudat

5/3/2026 at 10:38:22 AM

Everything you guys write is absolute fire. I could not agree more :)

by filipeisho

5/3/2026 at 2:37:13 AM

The agent harness needs different sandbox(es) with different privileges. Nothing here supports not containing its access. It's a mistake to think and talk about "the sandbox" in the way the article does.

by pamcake

5/3/2026 at 4:30:08 AM

At least for me one of the major reasons to run an agent in a sandbox is to save memory on my machine if i am running multiple agents in parallel. Wouldnt this not help for that?

by avipeltz

5/2/2026 at 10:26:28 PM

Why are two concurrent sessions updating the same memory key with different values? IMO it probably points to a fundamental flaw in how memory is being thought about and built.

by solidasparagus

5/2/2026 at 10:40:14 PM

Author here. Because of parallelism and non determinism.

This problem is quite common and not limited to memories. For instance, Claude Code will block write attempts and steer the agent to perform a read first (because the file might have been modified in the meantime by the user or another agent).

Same principle here: rather than trying to deterministically “merge” concurrent writes, you fail the last write and let the agent read again and try another write

by aluzzardi

5/2/2026 at 10:00:21 PM

It took me a while to grok why this made any sense, I think the context is that this is for hosting many agents as a service.

by Retr0id

5/2/2026 at 10:05:57 PM

Exactly, my understanding is also that they host agents as a service. The actual use case is mentioned in the end of the article, which makes it hard to reason about.

Anyway. General advice: treat harnesses as any other (third-party) software that you run on your server. Modern harnesses (the ones from big companies, you need to subscribe to) are black boxes. Would you run a random binary you fetched from the internet on your server? Claude code, codex etc. are exactly this.

by qezz

5/2/2026 at 10:18:02 PM

We don't host 3rd party agents (I don't know if this what you implied). We built an agent that monitors CI pipelines, tests failures, performance and auto opens PR to address issues we find. We host our agent loop on a backend (it's in go), and we call to the sandbox when we run operations involving the user code.

by shad42

5/3/2026 at 11:26:31 AM

You need supporting environment on both sides of the sandbox.

by zbyforgotp

5/3/2026 at 2:51:01 PM

Sandboxes are for cats.

by kordlessagain

5/3/2026 at 6:23:48 AM

The idea of physically separating the agent harness and the artifacts on which you want the harness to work seems to be the wrong answer to the right problem, a symptom of a worrying problem with all the agentic workflow work I see in the wild: that of being lazy about how you develop the harness and tools it has available. Specifically, there seems to be a desire to make agent tools in an incredibly naive and simplistic way that ignores access control.

My company is not a "sophisticated" software development organization. We're 3000 people where 2000 do nothing with AI, 900 are naively jumping on every AI bandwagon that comes along (or rather, parroting whatever they read on the Internet to complain about how hard their job is despite it not having materially changed in the last 15 years), 90 are capable of doing anything towards implementing anything involving AI beyond "just use Claude," and 10 have the experience to really scrutinize what is going on. And our work is of a nature that scrutinizing the exact process of how results occurred, what data we came to it by, is essential. There are regulations and compliance issues that could land people in prison if we don't and the results are eventually proved to involve inappropriate data. What does that mean? I'll just say we primarily work for DoD.

I have very long experience with managers asking for the moon and not listening when the engineering staff raises red flags. They ask us, "why can't we just do X?" Where X is whatever they've recently read about in whatever MBA-targeted publication that was bought and paid for by the service provider profitting off of X, with no skin in the game regarding the nuance, because the relevant regulations are written to scapegoat the person in the chair bashing the keys and not the person making the decisions and hanging the former person's jobs over their heads. The "why can't we just do X" is not an in-good-faith question, it's a statement that you need to shut up about your concerns and "just do X."

But out of desperation/malicious compliance, I've started developing an agentic harness that can "just get AI to do it" for the data sources on which we work. And I've noticed two things: A) agent harnesses are not that hard to write (honestly, anyone with basic programming competency should be able to do it), and B) they can only ever work on what you give them. I suppose the last point should be obvious, but I've had enough conversations with folks who expect magic that it is clear that it is not actually obvious.

And that's where I get into "extant agent work is lazy." The agent harness I've developed is incapable of accessing data its operator should not have. If you are cleared to only see a subset of the universe of data, then running this harness cannot possibly give you access to more than your clearance. I'm not trying to brag here, because this was not a difficult guarantee to make. In developing the harness and giving it tools to do work, I just developed the same access controls I would have done for a human user accessing an API to the same data. The only thing that is different about my approach is that I didn't use an off-the-shelf harness with tools developed by others. I just wasn't lazy about my job.

My key stakeholder was skeptical that I was able to do this, mostly because he has subconsciously intuited that our organization is not very sophisticated in developing software. He doesn't understand that employing AI isn't magic, and I think that is the case for a lot of the people who use AI the most here. They see products like Claude go to work and think there is some kind of special sauce there that requires the development by a "frontier" AI firm to actualize. But the truth is, the more you develop agentic AI capability, the less AI you are actually employing. The AI eventually becomes just an orchestrator of tools that perform work by not-AI means. If you are lazy, you try to lean on naive tool implementations that let the AI do whatever "it wants." And that's where you get into trouble. But if you show up to your job and be diligent about implementing those tools, there is no possible way the AI can screw you over, because you never gave it the unrestricted access to curl or `rm -rf`.

This is why, even if AI does become a permanent fixture of software development (still not convinced, even after all this experience), you're still not getting rid of us software engineers, despite how much you hate us. You still need us to protect your data, and nothing about AI has changed the equation that ends in "data is king." If anything, it's more important than ever.

Edit: I'm specifically developing a multi-user agent, accessed via a Web application over a shared database. Row-level access control is baked into every tool and I can do this with little effort because dependency injection Is A Thing. Thus, the parameters of access control never even reach the AI.

by moron4hire

5/3/2026 at 4:03:58 PM

The article misses the point that you can have a third option - harness inside and outside both.

The use cases will evolve so much more in coming weeks and months that the boundaries will blur. You can still contain the agent outside in a sandbox though.

by mkagenius

5/2/2026 at 10:16:44 PM

we are running a harness outside the sandbox, inside a sandobx.

by 8thcross

5/3/2026 at 12:42:59 PM

Yeah, I also get that vibe from the article.

by schaefer

5/3/2026 at 3:01:33 PM

They should both be in sandboxes. Just different “sized” boxes.

by lowbloodsugar

5/2/2026 at 10:07:37 PM

[flagged]

by thinkneo_ai

5/3/2026 at 10:49:59 AM

[dead]

by jimmypk

5/3/2026 at 9:03:48 AM

[dead]

by nutfarm

5/3/2026 at 6:36:52 AM

[dead]

by ArielTM

5/3/2026 at 5:10:13 PM

[flagged]

by PEGHIN

5/3/2026 at 2:44:28 AM

[dead]

by steffs

5/3/2026 at 8:12:59 AM

[flagged]

by ibrahimhossain

5/3/2026 at 12:44:35 AM

[dead]

by eddyaipt

5/3/2026 at 3:44:27 AM

[dead]

by 0xkvyb

5/3/2026 at 3:46:48 AM

[dead]

by 0xkvyb

5/2/2026 at 10:18:36 PM

[dead]

by kweiza

5/3/2026 at 3:02:16 PM

[flagged]

by kordlessagain