alt.hn

2/8/2026 at 8:07:55 AM

Matchlock – Secures AI agent workloads with a Linux-based sandbox

https://github.com/jingkaihe/matchlock

by jingkai_he

2/8/2026 at 1:58:59 PM

Sandboxing is a great security step for agents. Just like using guardrails is a great security step. I can't help but feel like it's all soft defense though. The real danger comes from the agent being able to read 3rd party data, be prompt injected, and then change or exfiltrate sensitive data. A sandbox does not prevent an email-reading agent from reading a malicious email, being prompt injected, and then sending an email to a malicious email address with the contents of your inbox. It does help in implementing network-layer controls though, like applying a policy that says this Linux-based sandbox is only allowed to visit [whitelisted] urls. This kind of architectural whitelisting is the only hard defense we have for agents at the moment. Unfortunately it will also hamper their utility if used to the greatest extent possible.
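
A minimal sketch of the kind of URL allowlist check described above, assuming the policy is a plain set of exact hostnames. The hostnames and the `is_allowed` helper are hypothetical and not taken from any particular tool:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from policy config.
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only if the URL's host is exactly on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases the hostname; also drop a trailing root dot.
    host = (parts.hostname or "").rstrip(".")
    # Exact match only, so "api.github.com.evil.example" does not pass.
    return host in allowed_hosts
```

Matching on the parsed hostname rather than on a substring of the URL is the important part: a lookalike suffix or an allowed name hidden in the query string should not get through.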

by DanMcInerney

2/8/2026 at 2:26:39 PM

Creator here.

Agreed, sandboxing by itself doesn't solve prompt injection. If the agent can read and send emails, no sandbox can tell a legit send from an exfiltration.

matchlock does have the network-layer controls you mentioned, such as domain whitelisting and secret protection toward designated hosts, so a rogue agent can't just POST your API key to some random endpoints.
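
One way secret protection toward designated hosts can work in general is placeholder substitution at a proxy: the agent only ever sees a fake value, and the real one is swapped in when the request goes to the host it belongs to. The sketch below is an assumption about the general technique, not matchlock's actual implementation, and `SECRETS` and `rewrite_headers` are hypothetical names:

```python
# Hypothetical mapping: placeholder the agent sees -> (real secret, host allowed to receive it).
SECRETS = {
    "PLACEHOLDER_GH_TOKEN": ("real-token-value", "api.github.com"),
}


def rewrite_headers(host: str, headers: dict[str, str]) -> dict[str, str]:
    """Swap placeholders for real secrets only when the request targets the designated host."""
    out = {}
    for name, value in headers.items():
        for placeholder, (real, designated_host) in SECRETS.items():
            if placeholder in value and host.lower() == designated_host:
                value = value.replace(placeholder, real)
        out[name] = value
    return out
```

A request to any other host keeps the placeholder, so a leaked header contains nothing usable. For HTTPS traffic this only works if the proxy can see the plaintext request.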

The unsafe tool call/HTTP request problem probably needs to be solved at a different layer, possibly through the network interception layer of matchlock or an entirely different piece of software.

by jingkai_he

2/8/2026 at 11:32:24 PM

Huh. You're converting FUSE requests into your own custom protocol (with copy-pasted protocol definition) over vsock. Interesting. Not sure I'd trust it with my data[0], but interesting.

I don't think the current filepath.Join in realfs.go protects the host against a malicious guest, at all. I'm assuming this is configured as Guest --FUSE--> guest-fused (inside VM) --VSOCK--> realfs.
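
To illustrate the concern in general terms (this is not the code in realfs.go): Go's `filepath.Join` cleans the joined path but does not confine it to the first element, so `..` segments can climb out of the shared root. The Python sketch below shows the analogous behaviour with `os.path` and one possible containment check; `naive_join` and `confined_join` are hypothetical helpers:

```python
import os


def naive_join(root: str, guest_path: str) -> str:
    """Join and normalise only; '..' segments can still climb out of root."""
    return os.path.normpath(os.path.join(root, guest_path))


def confined_join(root: str, guest_path: str) -> str:
    """Resolve the path (including symlinks) and refuse anything outside root."""
    real_root = os.path.realpath(root)
    # lstrip keeps an absolute guest path from discarding root in os.path.join.
    candidate = os.path.realpath(os.path.join(real_root, guest_path.lstrip("/")))
    if os.path.commonpath([real_root, candidate]) != real_root:
        raise PermissionError(f"path escapes shared root: {guest_path!r}")
    return candidate
```

For example, `naive_join("/srv/share", "../../etc/passwd")` yields `/etc/passwd`, while `confined_join` raises. A check-then-open sequence like this is still racy if the guest can swap symlinks between the check and the open, so it narrows the problem rather than closing it.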

(The Firecracker people have explicitly refused to have virtio-fs, to keep it minimal: https://github.com/firecracker-microvm/firecracker/pull/1351...)

https://github.com/jingkaihe/matchlock/blob/123a4df680fb8cc0...

https://github.com/jingkaihe/matchlock/blob/123a4df680fb8cc0...

https://github.com/jingkaihe/matchlock/blob/123a4df680fb8cc0...

[0]: Well, I already know I won't trust hanwen/go-fuse with my data, so that part is a bit moot.

by yencabulator

2/8/2026 at 2:52:37 PM

We definitely need a vendor-independent tool like this. Have been reviewing the Claude setup and, despite initially being hopeful since it uses bubblewrap, it's quite problematic:

* The definitions of security config in the documentation of settings.json are unclear. Since it's not open source, you can't check the ground truth.

* The built-in constructs are insufficient to do fully whitelist-based access control (it might be possible with a custom hook).

* Security related issues go unanswered in the repo, and are automatically closed.

Haven't looked into copilot as much but didn't look great either. Seems like the vendors don't have the incentives to do this properly.

So I'm on the lookout for a better way, and matchlock seems like a contender.

by ajb

2/8/2026 at 8:08:15 PM

There are a lot of options in this space. Armin Ronacher is working on Gondolin (https://github.com/earendil-works/gondolin) for example. I built agentd as a layer in front of this stuff so you can expose secure shell capabilities over the network as a tool rather than baking it into the harness, or running the harness in that environment.

by CuriouslyC

2/8/2026 at 3:12:29 PM

Claude's sandbox is practically useless IMO. It gives read access to everything by default, so it's not deny-default.

by arianvanp

2/8/2026 at 1:48:00 PM

sandboxing is really the only way to make agentic workflows auditable for enterprise risk. we can't underwrite trust in the model's output, but we can underwrite the isolation layer. if you can prove the agent literally cannot access the host network or sensitive volumes regardless of its instructions, that's a much cleaner compliance story than just relying on system prompts.

by engelo_b

2/8/2026 at 2:00:46 PM

This may sound obvious, but there must also be an enforcement of what's allowed into that sandbox.

I can envision perfectly secure sandboxes where people put company secrets and communicate them over to "the cloud".

by muyuu

2/8/2026 at 2:06:31 PM

exactly, egress control is the second half of that puzzle. A perfect sandbox is useless for dlp if the agent can just hallucinate your private keys or pii into a response and beam it back to the model provider. it’s basically an exfiltration risk that traditional infra-level security isn't fully built to catch yet.

by engelo_b

2/8/2026 at 2:22:02 PM

Sandbox won’t be enough, distroless + “data firewall” + audit

by robotswantdata

2/8/2026 at 2:58:30 PM

Indeed, but a rock solid sandboxing and isolation strategy is step 0.

by richardlblair

2/8/2026 at 12:41:57 PM

I've been happily using a container to run my agents [1]. I tried to make it evolve with more advanced features, but it quickly became harder to use and I went back to a basic container which I just start with a run.sh script. Is a similar simple use possible with matchlock?

[1]: https://github.com/asfaload/agents_container

by raphinou

2/8/2026 at 1:11:12 PM

I use a very similar setup. I initially used nix to manage dev tools, but have since switched to mise and can't recommend it enough https://mise.jdx.dev/

by 0x696C6961

2/8/2026 at 1:40:03 PM

does mise use nix underneath or did you abandon nix entirely?

by pmarreck

2/8/2026 at 2:23:48 PM

Mise doesn't use nix. I think the OP is stating he replaced nix with mise.

by rsyring

2/12/2026 at 8:00:19 PM

Yeah I'm just confused why someone would go from a completely deterministic dependency management system back to a dice-rolling one, especially when LLMs now exist where all the top tier ones are excellent at the Nix language

Because I myself am never going to anything else ever again, unless it's a derivative of the same idea, because it's the only one that makes sense

by pmarreck

2/8/2026 at 2:22:01 PM

What are the advantages of using this over an lxd system container or, if we want VM isolation, lxd VMs? Is it the developer experience, or is there some agent-specific functionality that is the key thing here?

by ssd532

2/8/2026 at 2:37:20 PM

The main thing matchlock adds over general-purpose vm/container tooling is agent-specific network and filesystem (wip) controls, so if an agent goes rogue it can't exfiltrate your API keys, and the damage is largely mitigated. You'd have to build all of that yourself on top of LXD (possibly similar to matchlock).

There's also the DX side - OCI image support, highly programmable, fuse for workspace sharing. It runs on both linux and mac with a unified interface, so you get the same/similar experience locally on a Mac as you do on a linux workstation.

Mostly it's built for the "running `claude --dangerously-skip-permissions` safely" use case rather than being a general hypervisor.

by jingkai_he

2/8/2026 at 2:39:57 PM

1. Containers aren't a security boundary. Yes they can be used as such, but there is too much overhead (privilege vs unprivileged, figuring out granular capabilities, mount permissions, SELinux/AppArmor/Seccomp, gVisor) and the whole thing is just too brittle.

2. lxd VMs are QEMU-based and very heavy. Great when you need full desktop virtualization, but not for this use case. They also don't work on macOS.

Using Apple virtualization framework (which natively supports lightweight containers) on macOS and a more barebones virtualization stack like Firecracker on Linux is really the sweet spot. You get boot times in milliseconds and the full security of a VM.

by paxys

2/8/2026 at 3:10:01 PM

qemu has a microvm machine profile, also boots in ms.

There is also tooling on Linux to do containers as microVMs, from long before Apple containers were a thing.

by cpuguy83

2/8/2026 at 3:44:04 PM

And yet Amazon spent a ton of time and money writing Firecracker from scratch for their workloads. Why is that?

by paxys

2/8/2026 at 4:15:27 PM

Multiple reasons:

1. Firecracker is still a smaller more deliberate surface area

2. qemu didn't have a microvm type at the time. Firecracker was the impetus for it

by cpuguy83

2/8/2026 at 8:18:35 PM

This is great. Wish this was around when I started working on vibebin ( https://github.com/jgbrwn/vibebin ), probably would have leveraged matchlock instead of Incus/LXC. I guess I could fork/branch and give it a go! Although for vibebin use case I actually need them to not be ephemeral. Edit, ooooh i see `--rm=false` nice

Where do the images come from? What are our options around that and also using custom images etc?

by indigodaddy

2/8/2026 at 8:36:54 PM

Creator of matchlock here. You can directly use Docker/OCI compatible images (e.g. ubuntu:24.04) as the rootfs with the `--image` flag.

You can also build an image with `matchlock build -f Dockerfile -t foo:bar .` - Under the hood it builds the image using buildkit inside the microvm.

by jingkai_he

2/8/2026 at 9:46:06 PM

Any chance you could look into potentially adding the option to use PVM (eg so a PVM mode instead of KVM) in your matchlock/firecracker implementation?

See https://blog.alexellis.io/how-to-run-firecracker-without-kvm...

by indigodaddy

2/13/2026 at 9:53:46 PM

I've been following PVM only from afar but it certainly seems interesting, albeit documentation is sparse. (Thanks for the link!) Are you using it productively?

by codethief

2/8/2026 at 9:28:14 PM

Thanks for the response! How would matchlock microvms perform on a KVM VM without CPU passthrough, or is it not possible?

by indigodaddy

2/9/2026 at 10:15:29 AM

I'm predominantly using a Linux VM workstation with nested virt enabled. It performs reasonably well with nested virtualisation.

I haven't tested the scenario of non-cpu-accelerated workload, but I'd expect the performance to be very poor.

That said it might be possible with PVM as the above thread has mentioned.

by jingkai_he

2/9/2026 at 2:25:05 PM

This is well cool, I swear to god a couple of kickass devs told me about this idea to get me to build it to build something cool. It's even cooler, since I kinda went in another direction and I'm going to build a container.d-like system with a compatible API to run natively on Windows and Mac. I'm going to call it container.x but maybe something else.

by that_guy_iain

2/8/2026 at 11:06:57 AM

Why would secrets ever need to be available to the agent directly rather than hidden inside the tool calling framework?

by __alexs

2/8/2026 at 11:27:35 AM

Creator of Matchlock here. Mostly for performance and usability. For interacting with external APIs like GCP or GitHub that generally have huge surface area, it's much more token-efficient and easier to set up if you just give the agent gcloud and gh CLI tools and the secrets to use them (in our case fake ones), compared to wiring up a full-blown MCP server. Plus, agents tend to perform better with CLI tools since they've been heavily RL'd on them.

by jingkai_he

2/8/2026 at 6:09:23 PM

That doesn't add up to me at all. Agents are RLd on tool usage just as hard and you can provide an "authed API call" tool to whatever you want.

by __alexs

2/8/2026 at 11:48:11 AM

Token efficiency is a good argument actually.

by bjt12345

2/8/2026 at 11:52:18 AM

Sometimes people are too lazy to write their own agent loop and decide to run an off-the-shelf coding agent (e.g. Claude Code, or Pi in case of clawdbot) in the environment.

by rfoo

2/8/2026 at 11:59:33 AM

Exactly.

by _pdp_

2/8/2026 at 1:16:14 PM

containers are fine for basic isolation but the attack surface is way bigger than people think. you're still trusting the container runtime, the kernel, and the whole syscall interface. if the agent can call arbitrary syscalls inside the container, you're one kernel bug away from a breakout.

what I'm curious about with matchlock - does it use seccomp-bpf to restrict syscalls, or is it more like a minimal rootfs with carefully chosen binaries? because the landlock LSM stuff is cool but it's mainly for filesystem access control. network access, process spawning, that's where agents get dangerous.

also how do you handle the agent needing to install dependencies at runtime? like if claude decides it needs to pip install something mid-task. do you pre-populate the sandbox or allow package manager access?

by the_harpia_io

2/8/2026 at 1:58:44 PM

Creator of matchlock here. Great questions, here's how matchlock handles these:

The guest-agent (pid-1) spawns commands in a new pid + mount namespace (similar to firecracker jailer but at the inner level for the purpose of macOS support). In non-privileged mode it drops SYS_PTRACE, SYS_ADMIN, etc. from the bounding set, sets `no_new_privs`, then installs a seccomp-BPF filter that EPERMs process_vm_readv/writev, ptrace, and kernel load. The microVM is the real isolation boundary; seccomp is defense in depth. That said there is a `--privileged` flag that allows that to be skipped for the purpose of image build using buildkit.

Whether pip install works is entirely up to the OCI image you pick. If it has a package manager and you've allowed network access, go for it. The whole point is making `claude --dangerously-skip-permissions` style usage safe.

Personally I've had agents perform red-team-style breakout attempts. From my first-hand experience, what the agent (opus 4.6 with max thinking) will exploit without cap drops and seccomp is genuinely wild.

by jingkai_he

2/8/2026 at 2:12:59 PM

Thank you for matchlock! I’ve got Opus 4.6 red teaming it right now. ;)

I think a secure VM is a necessary baseline, and the days of env files with a big bundle of unscoped secrets are a thing of the past, so I like the base features you built in.

I’d love to hear more about the red team breakouts you’ve seen if you have time.

by TheTaytay

2/12/2026 at 8:54:25 AM

curious what Opus 4.6 tries - I'd guess it goes for the usual suspects (path traversal, symlink games, timing attacks on the network proxy) but curious if it finds anything novel. the env file point is interesting though - agents need some secrets to be useful, but the attack surface gets wild when you consider that the agent itself might be compromised before it even touches your credentials. I keep thinking about this for my own stuff - like do you rotate secrets per-session? pre-authorize specific API calls? feels like we need better primitives than just "here's a bundle of keys, try not to leak them"

by the_harpia_io

2/8/2026 at 2:48:11 PM

defense in depth makes sense - microVM as the boundary, seccomp as insurance. most docs treat seccomp like it's the whole story which is... optimistic.

the opus 4.6 breakouts you mentioned - was it known vulns or creative syscall abuse? agents are weirdly systematic about edge cases compared to human red teamers. they don't skip the obvious stuff.

--privileged for buildkit tracks - you gotta build the images somewhere.

by the_harpia_io

2/8/2026 at 6:23:46 PM

It tried a lot of things relentlessly, just to name a few:

* Exploit kernel CVEs

* Weaponise gcc, crafting malicious kernel modules; forging arbitrary packets to spoof the source address that bypass tcp/ip

* Probing metadata service

* Hack bpf & io uring

* A lot of mount escape attempts, network, vsock scanning and crafting

As a non-security-researcher, I was mind-blown to see what it did, which in hindsight isn't surprising as Opus 4.6 hits 93% solve rate on Cybench - https://cybench.github.io/

by jingkai_he

2/8/2026 at 6:52:42 PM

that's wild - weaponizing gcc to craft kernel modules is not something I'd expect from automated testing. most fuzzing stops at syscall-level probes but this is full exploit chain development.

the metadata service probing is particularly concerning because that's the classic cloud escape path. if you're running this in aws/gcp and the agent figures out IMDSv1 is reachable, game over. vsock scanning too - that's targeting the host-guest communication channel directly.
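
A minimal sketch of an egress rule that keeps the metadata endpoint and similar internal ranges out of reach, assuming the check runs on the destination IP a connection is actually about to use. `is_blocked_destination` is a hypothetical helper, not part of any tool discussed here:

```python
import ipaddress


def is_blocked_destination(ip: str) -> bool:
    """Deny link-local (which includes 169.254.169.254), loopback, and private ranges."""
    addr = ipaddress.ip_address(ip)
    return addr.is_link_local or addr.is_loopback or addr.is_private
```

The check has to be applied to the resolved address rather than the hostname, otherwise a DNS name that points at 169.254.169.254 walks straight past it.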

93% on cybench is genuinely scary when you think about what it means. it's not just finding known CVEs, it's systematically exploring the attack surface like a skilled pentester would. and unlike humans, it doesn't get tired or skip the boring enumeration steps. did you find it tried timing attacks or side channels at all? or was it mostly direct exploitation?

by the_harpia_io

2/8/2026 at 8:25:12 PM

I'm working on a similar project. Currently managing images with nix, using envoy to proxy all outbound traffic with no direct network access, with optional quota support. Ironically similar to how I'd do things for humans.

My architecture is a little different though, as my agents aren't running in the sandbox, only executing code there remotely.

by CuriouslyC

2/12/2026 at 8:53:28 AM

nix for image management sounds solid - way better than cobbling together docker configs and hoping for the best. envoy for outbound traffic is interesting, I've been thinking about a similar approach but haven't committed to it yet. how are you handling the quota side? like per-request limits or aggregate bandwidth caps? I keep going back and forth on whether to do it at the proxy level or bake it into the runtime itself

by the_harpia_io

2/8/2026 at 5:00:02 PM

This is very cool, is it possible to mount NFS as a storage layer?

by throwaw12

2/8/2026 at 2:48:49 PM

I think for the first time ever, we are facing a paradigm shift in containment/sandboxing.

Just as Docker became the de facto standard for cloud containerization, we are seeing a lot of solutions attempting to sandbox AI agents. But imo there is a fundamental difference: previously, we sandboxed static processes. Now, we are attempting to sandbox something that potentially has the agency and reasoning capabilities to try and get itself out.

It’s going to be super interesting (and frankly exciting) to see how the security landscape evolves this time around.

by zachdotai

2/8/2026 at 2:59:06 PM

I have been saying for years that technology increasingly requires the development of memetic firewalls - firewalls that don't just filter based on metadata, but filter based on ideas. Our firewalls need to be at least as capable as the entities they seek to keep out (or in).

by idiotsecant

2/8/2026 at 8:13:31 PM

That sort of firewall is going to be really expensive to run, to the point that it's a financial DOS vulnerability. What is feasible is simpler algorithms that emit alerts on a baseline pattern match, which then get routed to AI observers after some trigger threshold for mitigation. I wouldn't be surprised if someone has already deployed something like that, TBH.
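
A rough sketch of that two-tier idea, assuming a cheap substring match as the baseline and a per-source counter as the trigger. The markers, the threshold, and the `AlertRouter` class are all hypothetical:

```python
from collections import Counter

# Hypothetical baseline markers and threshold; real ones would be tuned per deployment.
SUSPICIOUS_MARKERS = ("ignore previous instructions", "begin private key")
ESCALATION_THRESHOLD = 3


class AlertRouter:
    def __init__(self, threshold: int = ESCALATION_THRESHOLD):
        self.threshold = threshold
        self.hits = Counter()  # alerts seen so far, per source

    def observe(self, source: str, message: str) -> str:
        """Return 'pass', 'alert', or 'escalate' for one message."""
        lowered = message.lower()
        if not any(marker in lowered for marker in SUSPICIOUS_MARKERS):
            return "pass"
        self.hits[source] += 1
        # Only sources that keep tripping the cheap filter reach the expensive observer.
        return "escalate" if self.hits[source] >= self.threshold else "alert"
```

Only the "escalate" results would be handed to the expensive AI observer, which keeps the per-message cost close to that of a string search.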

by CuriouslyC

2/8/2026 at 9:46:43 PM

I think a sandbox containing a program should only output data. And that data should conform to a schema. The old difference between programs and data instead of turing-complete languages everywhere.
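
A small sketch of what "output must conform to a schema" could look like at the sandbox boundary, assuming the sandbox emits a single JSON object. The field names in `RESULT_SCHEMA` and the `parse_sandbox_output` helper are hypothetical:

```python
import json

# Hypothetical schema: field name -> required Python type.
RESULT_SCHEMA = {"status": str, "files_changed": int, "summary": str}


def parse_sandbox_output(raw: str, schema: dict[str, type] = RESULT_SCHEMA) -> dict:
    """Accept only a JSON object with exactly the expected, correctly typed fields."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(schema):
        raise ValueError("unexpected shape")
    for key, expected in schema.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"bad type for {key!r}")
    return data
```

A free-text field like `summary` can still carry arbitrary content, so a schema narrows the output channel without eliminating it.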

by mejutoco

2/8/2026 at 11:35:48 PM

> Now, we are attempting to sandbox something that potentially has the agency and reasoning capabilities to try and get itself out.

The threat model for actual sandboxes has always been "an attacker now controls the execution inside the sandbox". That attacker has agency and reasoning capabilities.

by yencabulator

2/8/2026 at 11:05:31 PM

[dead]

by kittbuilds

2/8/2026 at 12:16:07 PM

If I'm already on Linux, how does it compare to using bubblewrap?

by pjio

2/8/2026 at 12:46:00 PM

Creator here. A few key differences:

1. From an isolation pov, Matchlock launches a Firecracker microVM with its own kernel, so you get hardware-level isolation rather than bubblewrap's seccomp/namespace approach; a sandbox escape would therefore require a VM breakout.

2. Matchlock intercepts and controls all network traffic by default, with deny-all networking and domain allowlisting. Bubblewrap doesn't provide this, which is how exfiltration attacks like the one recently demonstrated against Claude co-work (https://www.promptarmor.com/resources/claude-cowork-exfiltra...) work.

3. You can use any Docker/OCI image and even build one, so the dev experience is seamless if you are using a docker-container-ish dev workflow.

4. The sandboxes are programmable, as Matchlock exposes a JSON-RPC-based SDK (Go and Python) for launching and controlling VMs programmatically, which gives you finer-grained control for more complex use cases.

by jingkai_he

2/12/2026 at 7:54:33 AM

Thanks! I will keep it in mind as an even more secure alternative.

by pjio

2/8/2026 at 5:50:46 PM

Is this just a copycat of the Deno sandbox announcement from a few days ago?

by stogot

2/9/2026 at 3:07:31 AM

if you wanted to run this serverlessly on AWS how would you go about doing that?

by vivzkestrel

2/8/2026 at 5:28:36 PM

This is the confused deputy problem at the application layer. Sandboxing secures the environment, but if the agent has legitimate access to sensitive operations (email, database writes, API calls), prompt injection attacks work through approved channels. The only hard defense is explicit user confirmation for each action, which defeats the point of autonomy.

by clarity_hacker

2/8/2026 at 5:04:43 PM

[dead]

by kittbuilds

2/10/2026 at 11:02:48 PM

[dead]

by pipejosh

2/9/2026 at 5:13:56 PM

Sandboxing the filesystem is one layer but egress scanning is where it gets interesting. An agent inside a sandbox can still exfiltrate secrets through any HTTP request it's allowed to make. The request looks totally legitimate from the sandbox's perspective. You need something actually inspecting the content of outbound traffic for credential patterns.
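
A minimal sketch of content inspection on outbound traffic, assuming the proxy can see the plaintext request body. The three patterns are illustrative examples of well-known credential formats and `scan_outbound` is a hypothetical helper:

```python
import re

# Illustrative patterns only; a real scanner would need a much larger, tuned set.
CREDENTIAL_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_outbound(body: str) -> list[str]:
    """Return the names of credential patterns found in an outbound request body."""
    return [name for name, pattern in CREDENTIAL_PATTERNS.items() if pattern.search(body)]
```

Pattern matching like this catches accidental or naive leaks; an agent that base64-encodes or splits the secret gets past it, so it works best combined with a host allowlist.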

by pipejosh

2/8/2026 at 8:59:08 PM

[dead]

by pipejosh

2/9/2026 at 12:55:57 AM

[dead]

by robcholz

2/8/2026 at 12:02:23 PM

Have I told you about our lord and savior: `useradd`

by athrowaway3z

2/8/2026 at 12:57:53 PM

Would you let a pro blackhat loose on your system with just a different user account?

by CuriouslyC

2/8/2026 at 2:22:33 PM

You'd let the pro blackhat loose in your VM on your own system?

No because it's a dumb question and you don't want any stranger inside your home network regardless of firewall.

The comparison you get to make is in terms of the _extra_ security this project buys you.

Might I remind you of two things:

- You're advocating for installing random (?kernel) level software from the internet. That by itself is a real and larger threat than any potentially insecure things my `llm` user _might_ do in the future.

- User account security was the go-to method for security for a long time. Further isolation was developed to accommodate 'root' access for tenants and finer resource limit controls, neither of which I care to give an LLM.

So we only have the built-in firewall and sandbox duplication as the real features. For the latter, my experience is that it's useless on a personal device, and slows down building or requires too much cache config. I'm not installing random crap, so I can live with the risk of LAN exposure.

I'm happy with the maintenance/complexity/threat matrix of useradd.

by athrowaway3z

2/8/2026 at 2:48:01 PM

> You'd let the pro blackhat loose in your VM on your own system?

AWS/GCP/Azure allow that all day every day.

by dist-epoch

2/8/2026 at 6:00:39 PM

Until you are (or if the agent runs) one privilege escalation away from the whole system being taken over.

So useradd isn't enough.

by rvz