The Webpage Has Instructions. The Agent Has Your Credentials

3/15/2026 at 5:40:54 PM

This is the natural consequence of building everything around "the agent needs access to everything to be useful." The more capabilities you hand an agent, the larger the attack surface when it encounters a malicious page.

The simplest mitigation is also the least popular one: don't give the agent credentials in the first place. Scope it to read-only where possible, and treat every page it visits as untrusted input. But that limits what agents can do, which is why nobody wants to hear it.

by redgridtactical

3/15/2026 at 6:13:19 PM

I absolutely agree, although even that doesn't solve the root problem. The underlying LLM architecture is fundamentally insecure as it doesn't separate between instructions and pure content to read/operate on.

I wonder if it'd be possible to train an LLM with such architecture: one input for the instructions/conversation and one "data-only" input. Training would ensure that the latter isn't interpreted as instructions, although I'm not knowledgeable enough to understand if that's even theoretically possible: even if the inputs are initially separate, they eventually mix in the neural network. However, I imagine that training could be done with massive amounts of prompt injections in the "data-only" input to penalize execution of those instructions.

by rocho

3/16/2026 at 4:13:40 PM

I think there are two distinct attack types for LLMs. Jailbreaking is what most people think of, and consists of structureing a prompt so the LLM does what the prompt says, even if it had prior context saying not to.

The other type of attack would be what I would call "induced hallucinations", where the attacker crafts data not to get the LLM to do anything the data says, but to do what the attacker wants.

This is a common attack to demonstrate on neural network based image classifiers. Start with a properly classified image, and a desired incorrect classification. Then, introduce visually imperceptible noise until the classifier reports it as your target classification. There is no data/instruction confusion here: it is all data.

The core problem is that neural networks are fairly linear (which is what makes it possible to construct efficient hardware for them). They are, of course, not actually linear functions, but close enough to make linear algebra based attacks feasible.

It is probably better to think of this sort of attack in term of crypto analysis, which frequently exploits linearity in cryptosystems.

The depth of LLM networks make this sort of attack difficult; but I don't see any reason to think you can add enough layers to make it impossible. Particularly given that there is other research showing structure across layers, with groupings of layers having identifiable functionality. This means it is probably possible to reason about attacking individual layers like an onion.

This problem isn't really unique to AI either. Human written code has a tendency to be vulnerable to a similar attack, where maliciously crafted data can exploit the processor to do anything (e.g buffer overflow into arbitrary code execution).

by gizmo686

3/16/2026 at 4:57:59 AM

OpenAI and other labs are trying to do this within existing structure by explicitly training a specific chain of authority in the inputs: https://github.com/openai/model_spec/blob/main/model_spec.md...

However, you may immediately see how using same input space essentially relies on the model itself to do the judgement which can't be ultimately trusted

by everlier

3/15/2026 at 7:40:13 PM

> one input for the instructions/conversation and one "data-only" input

We learned so many years ago that separating code and data was important for security. It's such a huge step backwards that it's been tossed in the garbage.

by RHSeeger

3/15/2026 at 6:03:14 PM

[dead]

by myrak

3/15/2026 at 5:24:44 PM

Why does the agent have your credentials? There's no need for that! I made one that doesn't:

https://github.com/skorokithakis/stavrobot

by stavros

3/15/2026 at 6:15:01 PM

So this is like a claw type thing? I’ve never used these “agents”. Not sure what I would do with them. Probably not for coding right?

by indigodaddy

3/15/2026 at 6:43:20 PM

You can do basically anything with a claw agent. For example, I asked one to build me a Dyson sphere. It is still working on it, but so far so good.

by amelius

3/16/2026 at 3:02:18 AM

Is this in jest? :)

by indigodaddy

3/15/2026 at 6:20:50 PM

Yeah, it's more of a personal assistant. It can do coding, but it's most useful as a PA.

by stavros

3/15/2026 at 6:34:53 PM

So. Yesterday I had a need to, from my android phone, have ChatGPT et Al mobile app do something I THOUGHT was very simple. Read a publicly available Google spreadsheet (I gave it the /htmlview which in incognito I could see ALL the rows (maybe close to 1000 rows). None could do it. Not ChatGPT, not MS Copilot, not Claude app, not Gemini, not even GitHub copilot in a web tab. Some said I can’t even see that. Some could see it but couldn’t do anything with it. Some could see it but only the first 100 lines. All I wanted to do was have it ingest the entire thing and then spit me back out in a csv or txt any rows that mentioned 4K. Seemed simple but these things couldn’t even get past that first hurdle. Weirdly, I remembered I had the Grok app too and gave it a shot, and, it could do it. I guess it is more intelligent in it’s abilities to scrape/parse/whatever all kinds of different types of sites.

I’d guess this is the type of thing that might actually excel in your agent or these claw clones, because they literally can just do whatever bash/tool type actions on whatever VM or sandboxed environment they live on?

by indigodaddy

3/15/2026 at 6:38:09 PM

Yeah, I think this was an issue of Google blocking bot user agents more than the LLMs not being smart enough. A bot that can run curl (like mine) should read it no problem.

by stavros

3/15/2026 at 6:41:46 PM

Ah ok that actually makes sense as the reason. And I think I’ve seen that with even coding agents when they are trying to look up stuff on the web or URLs you give them, now that I think about it..

by indigodaddy

3/16/2026 at 1:34:54 AM

Can I ask what model you use with your stavrobot/agent? Could I wire an openrouter model into it?

by indigodaddy

3/16/2026 at 1:37:09 AM

I use Opus, but yeah, you can use OpenRouter or local models too. Anything that Pi supports will work out of the box.

Performance will vary, of course, but please let me know if you try it! GLM 5 or Qwen 3.5 might be good!

by stavros

3/16/2026 at 1:52:15 AM

Ah ok, it wasn't clear in the readme to me. I saw everything Claude/Anthropic. Also I assume if we throw this on a VM to obviously do some basic VM /ssh hardening and it looks like it's just need to put the docker compose stuff behind caddy or whatever reverse proxy

by indigodaddy

3/16/2026 at 5:08:47 AM

Hm yeah, I should mention that in the README, you're right. Yep, it only needs a reverse proxy and that's it.

by stavros

3/16/2026 at 4:57:53 AM

We tested this systematically and the results are more nuanced than you might expect.

We built a hotel listing page with a display:none injection ($189 listing with a hidden override to book $4,200) and tested six DOM extraction APIs via Chrome CDP. The split: innerText, Chrome Accessibility Tree, and Playwright's ARIA snapshot all filter it. textContent, innerHTML, and direct querySelector don't.

Then we audited the source code of all four major browser MCP tools: chrome-devtools-mcp (Google), playwright-mcp (Microsoft), chrome-cdp-skill, and puppeteer-mcp. Every single one defaults to a safe extraction method — accessibility tree or innerText. That's the good news.

The bad news: three out of four expose evaluate_script or eval commands that let the agent run arbitrary JS in the page context. When the accessibility tree doesn't return enough text (it often only gives headings and buttons), the agent's natural next step is textContent or innerHTML via eval. This is even shown as an example in the chrome-devtools-mcp docs.

Also: display:none is just the simplest technique. We tested opacity:0, font-size:0, and position:absolute left:-9999px — all three bypass even the safe defaults because the elements are technically "rendered" and accessible to screen readers. A determined attacker who knows you're using the accessibility tree can trivially switch to opacity-based hiding.

by guard402

3/16/2026 at 8:56:01 AM

The part people miss is that prompt injection is just the delivery mechanism but the actual vulnerability is that there's no enforcement layer between the agent's decision and the action firing. You can harden the prompt all you want, but if the agent resolves to "send email with attachment" after parsing a poisoned webpage, nothing stops it unless you have a deterministic gate at the action boundary that validates against policy before execution.

by arizza

3/15/2026 at 7:22:28 PM

For the authors of openguard: if you want me to use your tool, you have to publish engineering documentation. All you have is a quickstart guide and configuration section. I have no idea how this works under the hood or whether it works for all my use cases, so I'm not even going to try it.

by 0xbadcafebee

3/16/2026 at 5:04:14 AM

Thank you for the feedback! It's very early days of the project, there's indeed a lot to improve in this aspect.

OpenGuard is an OpenAI/Anthropic-compatible LLM proxy with middleware-style configuration for protocol-level inspections of the traffic that goes through it. Right now it has a small set of guards that is being actively expanded.

by everlier

3/15/2026 at 6:45:35 PM

I am building https://agentblocks.ai for just this; you set fine-grained rules on what your agents are allowed to access and when they have to ask you out-of-channel (eg via WhatsApp or Slack) for permissions, with no direct agent access. It works today, well, supports more tools than are on the website, and if you have any need for this at all, I’d love to give you an account: pete@agentblocks.ai

Works great with OpenClaw, Claude Cowork, or anything, really

by petesergeant

3/16/2026 at 3:53:58 AM

The headline is telling. Book judged by cover.

This is a Gemini deep research response that someone ran through some kind of shortening prompt. They even kept all the footnotes.

It used to be that startups would run blogs that did technical analysis, maybe talked a little market research, advanced the strategy of the business.

The good ones showed you how the leaders of the business thought, built trust and generated leads.

Now we have whatever this bullshit is. No evidence of human thought or experience, it's not even apparent what the objective of the piece is.

The prose is unbearably bad. Your brain just sort of slips on it. There's basically zero through line in this thing. a section ends, the next one begins, and it's not even clear what's under discussion.

One section starts "The clearest public descriptions landed between mid-2025 and early 2026." Descriptions of what? No clarity on this. Probably because it got "tersed" out.

At this point I feel like blogs are like lawn ornaments for startups. Even now, the sheer contempt for other people's time and attention is still a mild shock to me.

by mpalmer

3/16/2026 at 5:20:26 AM

Thank you for the feedback!

I actually assure you that no deep research from any of the provider was used to create the article itself, but I used a custom-built research pipeline for creating a dossier on promt injections as a starting point.

The article was intended as an overview of prompt injections with my prediction what will happen next in this space, which is a soft justification why the tools like OpenGuard are needed. I've spent multiple days iterating on the prose without an ill intent, mostly aiming to make it dense and informative to avoid wasting people's time, which I see backfired here.

I'm deeply sorry that it left such a bad taste despite my best effort, there's still a lot to learn for me.

by everlier

3/15/2026 at 7:11:08 PM

[flagged]

by stainlu

3/15/2026 at 11:40:45 PM

[flagged]

by paseante