3/7/2026 at 3:25:06 PM
Yeah, this is what happens when there's nothing between "the agent decided to do this" and "it happened." The agent followed the state file logically. It wasn't wrong. It just wasn't checked.
His post-mortem is solid but I think he's overcorrecting. If he does this as part of a CI/CD pipeline and manually reviews every run, he will pretty quickly get "verification fatigue". The vast majority of cases are fine, so he'll build the habit of approving automatically. Sure, he'll deeply review the first ones, but over time the reviews get shallower because he almost always finds nothing. Then he'll pay less attention. This is how humans work.
He could automate the "easy" ones, though. TF plans are parseable, so maybe his time would be better spent only reviewing destructive changes. I've been running autonomous agents on production code for a while and this is the pattern that keeps working: start by reviewing everything, notice you're rubber-stamping most of it, then encode the safe cases so you only see the ones that matter.
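A minimal sketch of that filter, assuming the plan has been exported to JSON with `terraform show -json` and that each entry under `resource_changes` carries an `address` and a `change.actions` list. The function names and the sample resources are made up for illustration, and a real gate would probably want more rules than "anything that deletes":

```python
import json

# Any change whose actions include "delete" is treated as destructive.
# Replacements appear as ["delete", "create"] or ["create", "delete"],
# so they are caught by the same check.
DESTRUCTIVE_ACTIONS = {"delete"}


def destructive_changes(plan: dict) -> list[str]:
    """Return the addresses of resources the plan would delete or replace."""
    flagged = []
    for resource in plan.get("resource_changes", []):
        actions = resource.get("change", {}).get("actions", [])
        if DESTRUCTIVE_ACTIONS & set(actions):
            flagged.append(resource["address"])
    return flagged


def needs_human_review(plan: dict) -> bool:
    """Safe plans can be auto-approved; destructive ones go to a person."""
    return bool(destructive_changes(plan))


# Hypothetical plan, trimmed to the fields this sketch reads.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_s3_bucket.logs",   "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["delete"]}},
    {"address": "aws_instance.web",     "change": {"actions": ["delete", "create"]}}
  ]
}
""")

if __name__ == "__main__":
    for address in destructive_changes(sample_plan):
        print("needs review:", address)
```

The point is only that the boring plans never reach a human, so the ones that do still get real attention.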
by mrothroc
3/7/2026 at 3:59:24 PM
Or just never run agents on anything that touches production servers. That seems extremely obvious to me. He let Claude control terminal commands which touched his live servers.
That's very different than asking it for help to make a plan.
by dmix
3/7/2026 at 4:39:01 PM
But the CEOs are saying everyone is going to be replaced by LLMs in 6 months. Surely that means they're capable of handling production environments without oversight from a professional.
by scuff3d
3/7/2026 at 6:43:24 PM
They're doing as well as professionals do without oversight on production environments. There's no lack of stories about people deleting their production environments, with data loss too.
The fix has always been to limit what can be done directly to prod, and to put every change through both review and tests before it can touch production.
by 8note
3/7/2026 at 8:30:02 PM
> they're doing as well as professionals do without oversight on production environments
The difference is that if a human does it there is usually some accountability: you’ll be asked how it happened and expected to learn from it. And if you do it again your social score goes down, nobody will trust you and you’ll be considered a liability. If a CLI tool does it the outcome is different: you might stop using the tool, or you might blame yourself for not giving the tool enough context. And if it does it again you might just shrug it off with “well of course, it’s just a tool”.
by prymitive
3/7/2026 at 8:53:10 PM
Accountability according to reputation is exactly what is happening for AI providers. All these articles about Claude destroying systems make people trust Claude less, and maybe even “fire” Claude by choosing another AI provider with better safeguards or low privileges built in.
by true_religion
3/7/2026 at 7:33:48 PM
So you're saying they need oversight... from a professional. Preferably someone with years of experience and domain expertise, who knows how to not fuck everything up?
by scuff3d
3/7/2026 at 8:16:38 PM
Almost every software engineer seems to agree on that point. Not believing marketing hype is standard practice in this industry because plenty of us are inherently techno-optimists who have been burned by over-belief in the past.
Regardless, it is hard to dismiss the fact that AI is making it easier for randoms to develop software. And it will keep getting better the more integrated and controlled it gets.
by dmix
3/7/2026 at 8:40:37 PM
If Hacker News is to be taken as a representative cross section of the industry, I disagree. I've seen plenty of people on here so hyped it borders on hysteria. I work with a couple of senior devs who have gotten downright weird about it.
Maybe HN leans more toward the hobbyist and student side than it does industry professionals, I don't know, but you don't have to look far to find someone who swears up and down you can run a couple of agents in a loop and have them build multi-million-line code bases with little to no oversight.
by scuff3d
3/7/2026 at 6:55:50 PM
> they're doing as well as professionals do without oversight on production environments.
That's nonsense. First, most people haven't deleted the production environment by accident. They have enough sense to recognize that as a dangerous thing and will pause to think about it. Second, the ones who do make that mistake learn and won't make it again, which is not something the clanker is capable of.
by bigstrat2003
3/7/2026 at 7:01:43 PM
The article says that Claude did recognize the danger, and advised the developer to run a safer setup with no risk of the two websites stomping on each other's resources, but he overrode it. I've definitely seen situations in my career where a junior developer does something dangerous and destructive after a senior dev overrode guardrails meant to prevent it. (None quite this bad, but then again I've never worked on small sites.)
by SpicyLemonZest
3/7/2026 at 4:45:14 PM
Are agents clever enough to seek and maybe use local privilege escalations? It seems like they should always run as their own user account with no credentials to anything, but I wonder if they will try to escape it somehow...by cozzyd
3/8/2026 at 4:34:40 AM
Yes, absolutely. I often see agents trying to 'sudo supervisorctl tail -f <program_name>', which fails because I don't give them sudo access. Then they realize they can just 'cat' the logfile itself and go ahead and do that.
Sometimes they realize their MCP doesn't have access to something, so they either pull an API token for the service from the env vars on my dev laptop, or SSH into one of the deployed VMs using keys from ~/.ssh/ and grab the API token from the cloud VM, then generate a curl command to do whatever they weren't given access to do.
Simple examples, but I've seen more complex workarounds too.
by nerdsniper
3/7/2026 at 7:59:04 PM
Just use a normal spare VPS or run things in proper virtual machines, depending on what you prefer. There are some projects like exe.xyz (invites closed, it seems).
Sprite.dev from fly.io is another good one that I heard about some time ago. I am hearing less about it now, but it should only charge for the resources you actually use, which is a pretty cool concept too.
by Imustaskforhelp
3/8/2026 at 6:49:20 PM
> Are agents clever enough to seek and maybe use local privilege escalations?
No. Definitely not. Regards, the CIA and the NSA /s
by hulitu