Lockdown Mode | alt.hn

6/6/2026 at 5:34:20 AM

On the one hand this is exactly the right solution to prevent lethal trifecta exfiltration attacks.

The existence of lockdown mode does however imply that ChatGPT, in its default settings, does not provide robust protection against sufficiently determined data exfiltration attacks!

by simonw

6/6/2026 at 5:57:44 AM

Related: Simon Willison’s post on OpenAI’s new Lockdown Mode (he coined the “lethal trifecta” term this is based on): https://simonwillison.net/2026/Jun/5/openai-help-lockdown-mo...

by berlianta

6/6/2026 at 6:17:17 AM

Related: simonw is Simon Willison

by jameshart

6/6/2026 at 6:38:29 AM

Yeah, I know the source references him (I was replying to his comment); that's exactly why I'm giving credit where it's due.

by berlianta

6/6/2026 at 11:46:10 AM

It’s important to draw it out explicitly: I didn’t even look at the commenter’s name until it was mentioned. (If I see pelicans …)

by bombcar

6/6/2026 at 11:44:15 AM

I wonder what robust protection would mean in practice for a tool as capable as an agent...

Looking at the three axes of the trifecta, if we assume we can't control untrusted content, that leaves us creating safeguards for private data access and external communication.

Would it be enough to have a buffer between these two events: access to the environment and access to the web?

by gchamonlive
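The “buffer” gchamonlive asks about can be read as one-way taint tracking: once a session has touched private data, every external-communication tool is refused for the rest of that session. Below is a minimal Python sketch under that assumption; `Capability`, `AgentSession`, and `ToolBlocked` are hypothetical names, not anything from OpenAI’s Lockdown Mode.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Capability(Enum):
    """Hypothetical labels attached to each tool the agent can call."""
    PRIVATE_DATA = auto()   # reads files, mail, internal repos, ...
    EXTERNAL_COMM = auto()  # fetches URLs, renders remote images, sends messages, ...
    NEUTRAL = auto()        # touches neither


class ToolBlocked(Exception):
    """Raised when a call would open an outbound channel after private data access."""


@dataclass
class AgentSession:
    # Untrusted content is assumed to be present at all times, so only the
    # other two legs of the trifecta are tracked.
    tainted: bool = False
    log: list = field(default_factory=list)

    def call(self, tool_name: str, capability: Capability) -> None:
        if capability is Capability.EXTERNAL_COMM and self.tainted:
            raise ToolBlocked(f"{tool_name}: external communication after private data access")
        if capability is Capability.PRIVATE_DATA:
            self.tainted = True  # one-way: nothing in this sketch ever clears the taint
        self.log.append(tool_name)  # stand-in for actually running the tool


session = AgentSession()
session.call("web_search", Capability.EXTERNAL_COMM)  # allowed: nothing private yet
session.call("read_file", Capability.PRIVATE_DATA)    # allowed; session is now tainted
try:
    session.call("fetch_url", Capability.EXTERNAL_COMM)
except ToolBlocked as err:
    print("blocked:", err)
```

The ordering is the design choice here: web access before private data access is allowed, because with no outbound channel left afterwards, injected instructions have no obvious way to ship data out. The sketch does not cover side channels such as writing to a location the attacker can read later.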

6/6/2026 at 12:43:00 PM

Robust protection means blocking any mechanism by which the agent, once compromised, might communicate stolen information back to an attacker.

by simonw
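As a rough illustration of what “blocking any mechanism” involves, here is a default-deny egress check that every outbound request (fetches, link previews, remote image rendering) would have to pass. It is a sketch assuming a hypothetical `ALLOWED_HOSTS` policy; it is not how OpenAI implements Lockdown Mode.

```python
from urllib.parse import urlsplit

# Hypothetical policy: the only hosts the agent may contact.
ALLOWED_HOSTS = frozenset({"api.internal.example", "docs.python.org"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS requests to an exactly allowlisted host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname is already lowercased and drops any user:pass@ prefix and port.
    return parts.hostname in allowed_hosts


for url in [
    "https://docs.python.org/3/library/sqlite3.html",
    "https://attacker.example/pixel.png?d=c2VjcmV0",  # data smuggled in a query string
    "https://docs.python.org.attacker.example/",      # suffix trick
    "http://docs.python.org/",                        # not HTTPS
]:
    print(url, "->", "allow" if egress_allowed(url) else "block")
```

Exact host matching avoids the classic suffix trap (`docs.python.org.attacker.example`). Even so, an allowlisted host that stores or echoes attacker-readable data is still a channel, which is why the bar simonw describes is so high.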

6/6/2026 at 6:41:49 AM

I hadn't realized that deep research or generating images that I paste into Twitter were possibly exfiltrating my data. Yikes.

by Noumenon72

6/6/2026 at 3:47:25 AM

Probably influenced by Apple's feature with the same name: https://support.apple.com/en-us/105120

I imagine that enterprise companies will be quite interested in this.

by varenc

6/6/2026 at 8:31:45 AM

> reduce the risk of data exfiltration

Yet, their tools such as codex are able to read ALL FILES on my PC without explicit permission unless you spawn them within a container: https://github.com/openai/codex/issues/2847

It seems OpenAI stealing sensitive data from their customers is not a big problem for them: it has been reported as an issue for almost a year now and currently has the second-most upvotes among open issues (and they prioritize issues by upvotes, or so they claim).

by thomas34298

6/6/2026 at 8:38:09 AM

>Yet, their tools such as codex are able to read ALL FILES on my PC

Why not just use your OS-integrated permission mechanism? No container needed.

by BSDobelix

6/6/2026 at 6:10:34 AM

"Prompt injection is not currently a major risk, but its impact could grow as attackers develop more sophisticated methods." - that's such a weird statement to make. It's one of the most significant factors limiting the adoption of the technology in business.

I have mixed feelings about this feature. We're playing with tech that's supposed to do human-shaped things but can't be trusted nearly as much as a human employee (and can't be held responsible for what it does). Restricting the tools available to that patently untrustworthy entity doesn't solve the problem; it just makes the entity less useful, forcing you to sooner or later let it out of the jail.

by zerobees

6/6/2026 at 11:56:27 AM

I'm also surprised that they considered it reasonable to turn so many features off. Seems like some of it could be configurable, like allowed external connections. I also think some secrets should be handled by a proxy, which would give more capability than just locking down.

by cosmicriver
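A minimal sketch of the proxy idea cosmicriver mentions, assuming the model only ever sees placeholder strings and a proxy outside the model swaps in the real credential, but only for the one host that credential is bound to. `SECRETS`, `rewrite_headers`, and the token value are hypothetical; a real proxy would also need to handle URLs and request bodies.

```python
from urllib.parse import urlsplit

# Hypothetical secret store held by the proxy and never shown to the model:
# placeholder -> (real credential, the only host it may be sent to).
SECRETS = {
    "{{GITHUB_TOKEN}}": ("ghp_example_not_real", "api.github.com"),
}


def rewrite_headers(url: str, headers: dict) -> dict:
    """Swap placeholders for real secrets, but only on requests to the bound host."""
    host = urlsplit(url).hostname or ""
    rewritten = {}
    for name, value in headers.items():
        for placeholder, (secret, bound_host) in SECRETS.items():
            if placeholder in value:
                if host != bound_host:
                    # The model (or an injected prompt) tried to send the token elsewhere.
                    raise PermissionError(f"{placeholder} may not be sent to {host!r}")
                value = value.replace(placeholder, secret)
        rewritten[name] = value
    return rewritten


print(rewrite_headers("https://api.github.com/user", {"Authorization": "Bearer {{GITHUB_TOKEN}}"}))
```

This keeps the credential itself out of the model’s context; it does nothing about other private data the agent can read.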

6/6/2026 at 7:15:34 AM

Responsibility is worthless for humans and even more worthless for AIs. In a way, AIs just make it more obvious.

And "trusted nearly as much as a human employee", well... you do know that phishing and insiders are two primary ways for attackers to get into company infrastructure, right?

AIs pair human-shaped capabilities with human-shaped vulnerabilities. It's a way of automating PEBKAC.

by ACCount37

6/6/2026 at 7:51:57 AM

> forcing you to sooner or later let it out of the jail

I suspect that's the point: by giving you the “choice”, they also make the user responsible, or can at least shift the blame.

by noir_lord

6/6/2026 at 3:48:29 AM

https://x.com/sama/status/1891533802779910471

by rafram

6/6/2026 at 6:26:16 AM

Somehow he comes off as even less human than zuck

by throwaway27448

6/6/2026 at 7:50:28 AM

There is something so off about him for me that he makes my skin crawl.

It has always been that way, even before he was associated with OpenAI.

Which is weird because the bullshit he spouts isn’t so different to the bullshit other top execs spout and I don’t have the same visceral reaction to them (though I still don’t like a bunch of them).

by noir_lord

6/6/2026 at 11:29:59 AM

[flagged]

by Laurel1234

6/6/2026 at 3:49:24 AM

I can definitely feel the AGI now

by ares623

6/6/2026 at 4:23:53 AM

Congratulations, you are a high taste tester!

by neonstatic

6/6/2026 at 7:21:50 AM

Is this an admission that prompt injection attacks indeed cannot be blocked by an analysis-based technique?

If so many tools are blocked outright, I would be very sceptical of the quality of the results.

by kirtivr

6/6/2026 at 7:25:56 AM

I think "prompt injection prevention" systems fall into the same category as "llm writing detection" systems. I.e. reality is always a step ahead and you shouldn't trust either one for anything remotely important.

by sigmoid10

6/6/2026 at 8:31:41 AM

Yeah, the problem reduces to restricting a motivated model that is trying to exfiltrate data.

That's a problem we are just now wrapping our minds around.

It's not as simple as prompt sanitization. The model is the interpreter, and we don't yet have the right tools to guide it.

by kirtivr

6/6/2026 at 2:30:54 PM

Wow, it’s almost like you can use it as if you’re just calling the LLM directly. What a crazy innovation!

by amluto

6/6/2026 at 4:10:13 AM

So we still don't have a reliable way to separate instructions from data when talking to an LLM, a problem that humans learned how to solve decades ago in areas like SQL and memory safety. But hey, we have these hopefully-not-leaky containers, which are probably implemented with just more system prompts.

How long until somebody figures out how to trick Codex into disabling Lockdown Mode for you?

by kijin
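The SQL separation kijin refers to is parameter binding: the query text and the values travel on separate channels, so data can never be reinterpreted as instructions. A small illustration with Python’s built-in `sqlite3` module (the table and values are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("bob", 1)])

hostile = "nobody' OR '1'='1"

# Unsafe: the data is spliced into the instruction stream, so the quote
# characters in `hostile` change what the query means.
unsafe = conn.execute(f"SELECT name FROM users WHERE name = '{hostile}'").fetchall()

# Safe: the driver binds `hostile` as a plain string value and never parses it as SQL.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()

print(unsafe)  # [('alice',), ('bob',)]
print(safe)    # []
```

The comment’s point is that an LLM prompt has no equivalent of the `?` placeholder: instructions and data arrive as one token stream.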

6/6/2026 at 5:27:43 AM

> So we still don't have a reliable way to separate instructions from data when talking to an LLM

Humans also do not know how to do this reliably, which is why phishing is still a thing and always will be.

by mapontosevenths

6/6/2026 at 5:55:24 AM

I think the Stroop effect ("read these colour names, each written in a different colour") is probably the purest demonstration of this. Humans are trivially prompt-injectable.

by Smaug123

6/6/2026 at 1:51:12 PM

> Humans also do not know how to do this reliably

These are machines, not humans, so I don't understand the comparison. The point of tech advancement is that we eliminate entire classes of errors that humans make. You'd probably look at me funny if I wrote a production application that failed randomly in unexpected ways like corrupting data, opening security holes, etc. then explained it away with "well, humans do it too!"

by hypeatei

6/6/2026 at 3:27:12 PM

It's an artificial intelligence, not a small deterministic shell script. Stop comparing it to one. It has both new capabilities and new classes of failure mode. Those new failure modes are more like human failure modes than traditional symbolic logic failures.

We need to get better at using them and building them by validating both the inputs and outputs of such systems in more sophisticated ways, but to act surprised and denounce them because they fail in different ways than more primitive systems misses the point.

They're stochastic by design. If we want deterministic results we must use deterministic validators in conjunction with the stochastic system. It's trivial, and one day security experts will look back on the time when people didn't in the same way we look back on 90's software that didn't validate user input at all.

by mapontosevenths
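A sketch of the “deterministic validator” mapontosevenths describes: the model’s output is parsed and checked against a fixed allowlist of actions and argument types before anything executes. The action names and schema below are hypothetical.

```python
import json

# Hypothetical schema: action name -> {argument name: required type}.
ALLOWED_ACTIONS = {
    "search_docs": {"query": str},
    "summarize": {"doc_id": int},
}


class InvalidAction(ValueError):
    """Raised for any model output that does not match the fixed schema."""


def validate_action(raw_model_output: str) -> dict:
    """Deterministic gate between a stochastic model and anything that executes."""
    try:
        action = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise InvalidAction(f"not JSON: {exc}") from exc
    if not isinstance(action, dict) or set(action) != {"name", "args"}:
        raise InvalidAction("expected an object with exactly 'name' and 'args'")
    name, args = action["name"], action["args"]
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        raise InvalidAction(f"action not allowed: {name!r}")
    schema = ALLOWED_ACTIONS[name]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise InvalidAction("argument names do not match the schema")
    for key, expected_type in schema.items():
        value = args[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise InvalidAction(f"{key!r} must be {expected_type.__name__}")
    return action


for raw in [
    '{"name": "search_docs", "args": {"query": "lockdown mode"}}',
    '{"name": "send_email", "args": {"to": "attacker@example.com"}}',
]:
    try:
        print("accepted:", validate_action(raw))
    except InvalidAction as err:
        print("rejected:", err)
```

A gate like this bounds what the model can do; it does not judge whether a well-formed action is wise, so a valid `search_docs` query could still carry leaked data if the search backend is external.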

6/6/2026 at 4:20:47 AM

We can separate them, but the $ value of an agent that does is much lower than one that doesn't.

As a pre-LLM analogy, imagine working at a bank with a whitelist firewall. You need to install a package, but that requires an IT ticket. Safer but slooooower.

Now, I'm not saying what the answer is here, but that is the issue.

The answer may look more like industries that get safer through lessons (like aviation) rather than going for 100% safety out of the gate, because both fast travel and AI agents are insanely useful.

by dnnddidiej

6/6/2026 at 4:50:25 AM

What? Aviation safety is not designed to get safer through lessons? They literally try to ensure it is 100% safe out of the gate. The accidents that happen are usually statistical outliers and lead to loss of life.

That's what it means when they say aviation regulations are written in blood. Not that they just fling planes into the sky and be like "boy i hope we learn some new regulations from this". The number of airplane crashes would be astronomically larger if the 100% safety part was not embedded into the design process.

by altmanaltman

6/6/2026 at 5:21:13 AM

I think we agree? Unless my reading comp is off today.

by dnnddidiej

6/6/2026 at 4:14:01 AM

The help doc explicitly carves out Codex: "Lockdown Mode does not affect network access in Codex." The mode limits outbound requests in chat to block prompt injection exfiltration, but Codex network access is a separate setting. An enterprise team that turns on Lockdown Mode while using Codex against internal repos still has an open outbound path this mode doesn't cover.

by madanparas

6/6/2026 at 5:38:28 AM

[dead]

by vladsiu