3/30/2026 at 8:29:34 PM
I don't understand why the takeaway here is (unless I'm missing something) more or less "everything is going to get exploited all the time". If LLMs can really find a ton of vulnerabilities in my software, why would I not run them and just patch all the vulnerabilities, leading to perfectly secure software (or, at the very least, software for which LLMs can no longer find any new vulnerabilities)?
by stavros
3/30/2026 at 8:43:55 PM
When did we enter the twilight zone where bug trackers are consistently empty? The limiting factor of bug reduction is remediation, not discovery. Even developer smoke testing usually surfaces bugs at a rate far faster than they can be fixed, let alone actual QA.
To be fair, the limiting factor in remediation is usually finding a reproducible test case, which a vulnerability is by necessity. But I would still bet most systems have plenty of bugs in their bug trackers that are accompanied by a reproducible test case and are still bottlenecked on remediation resources.
This is of course orthogonal to the fact that patching systems that are insecure by design into security has so far been a colossal failure.
by Veserv
3/31/2026 at 6:15:49 AM
Bugs are not the same as (real) high-severity bugs. If you find a bug in a web browser, that's no big deal. I encounter bugs in web browsers all the time.
You figure out how to make a web page that when viewed deletes all the files on the user's hard drive? That's a little different and not something that people discover very often.
Sure, you'll still probably have a long queue of ReDoS bugs, but the only people who think those are security issues are people who enjoy the ego boost of having a CVE in their name.
by bawolff
3/31/2026 at 7:14:15 AM
Eh, with browsers you can tell the user to go to hell if they don't like a secure but broken experience. The problem in most software is that you commit to bad ideas and then have to upset people who have higher status than the software dev who would tell them to go to hell.
by kackerlacker
3/30/2026 at 9:18:53 PM
That might have been true pre-LLMs, but you can literally point an agent at the queue until it's empty now.
by reactordev
3/30/2026 at 9:27:04 PM
You literally cannot, since ANY changes to code tend to introduce unintended (or at least not explicitly requested) new behaviors.
by batshit_beaver
3/30/2026 at 9:32:49 PM
Eventual convergence? Assuming each defect fix has a 30% chance of introducing a new defect, we keep cycling until done?
by lll-o-lll
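The convergence question has a textbook answer under one strong assumption: if each fix independently introduces at most one new defect, with probability p < 1, the expected work per original bug is the geometric series 1 + p + p² + … = 1/(1 − p). A minimal sketch, with illustrative numbers only:

```python
def expected_fixes(initial_bugs: int, p_new_defect: float) -> float:
    """Expected total fixes needed to empty the queue, assuming each fix
    independently spawns at most one new defect with probability p < 1."""
    if not 0 <= p_new_defect < 1:
        raise ValueError("process only converges in expectation for 0 <= p < 1")
    # Each original bug spawns a geometric chain: 1 + p + p^2 + ... = 1/(1-p)
    return initial_bugs / (1 - p_new_defect)

print(expected_fixes(100, 0.3))  # ~142.9 fixes to clear 100 bugs at p = 0.3
print(expected_fixes(100, 0.9))  # ~1000 fixes at p = 0.9: slow, but finite
```

So at a 30% regression rate the queue does drain, at roughly 1.4 fixes per original bug; the model breaks down entirely once a fix can spawn more than one defect on average.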
3/30/2026 at 9:49:00 PM
Assuming you can catch every new bug it introduces. Both assumptions are unlikely.
You also end up with a codebase you let an AI agent trample until it was satisfied: ballooned in complexity and full of redundant, brittle code.
by saintfire
3/30/2026 at 10:23:02 PM
You can have an AI agent refactor and improve code quality.
by charcircuit
3/31/2026 at 12:22:17 AM
But have you any code that has been vetted and verified to see if this approach works? This whole agentic code quality claim is an assertion, but where is the literal proof?
by abakker
3/31/2026 at 4:22:49 PM
Did we have code quality before LLMs?
by wredcoll
3/31/2026 at 7:34:05 AM
If it can be trained with reinforcement learning, then it will happen.
by WithinReason
3/31/2026 at 8:01:42 AM
Funnily enough, I've literally never seen anyone demo this, despite all the other AI hype. It's the one thing that convinces me they're still behind.
by lmm
3/31/2026 at 2:27:15 PM
Just today I had an agent add a fourth "special case" to a codebase, and I went back and DRY'd three of them.
Now, I used the agent to do a lot of the grunt work in that refactor, but it was still a design decision initiated by me. The chatbot, left unattended, would not have seen that this needed to be done. (And when, during my refactor, it tried to fold in the fourth case, I had to stop it.)
(And for a lot of code, that's ok - my static site generator is an unholy mess at this point, and I don't much care. But for paid work...)
by flir
3/31/2026 at 5:20:12 AM
It’s agents all the way down - until you have liability. At some point, it’s going to be someone’s neck on the line, and saying “the agents know” isn’t going to satisfy customers (or in a worst case, courts).
by intended
3/31/2026 at 4:24:58 PM
> until you have liability
And are you thinking this is going to start happening at some point, or what?
The letters I get every other month telling me I now have free credit monitoring because of a personal info breach seems to suggest otherwise.
by wredcoll
3/31/2026 at 7:26:10 PM
A firm has very different amounts of time, ability and money to spend on following up on broken contracts.
by intended
3/31/2026 at 7:45:38 AM
Sure it can. It's not like humans aren't already deflecting liability or moving it to insurance agencies.
by charcircuit
3/31/2026 at 11:13:25 AM
> It's not like humans aren't already deflecting liability
They attempt to, sure, but it rarely works. Now, with AI, maybe it might, but that's sort of a worse outcome for the specific human involved: "If you're just an intermediary between the AI and me, WTF do I need you for?"
> or moving it to insurance agencies.
They aren't "moving" it to insurance companies, they are amortising the cost of the liability at a small extra cost.
That's a big difference.
by lelanthran
3/31/2026 at 12:14:07 PM
At some point, the risk/return calculus becomes too expensive for insurance companies. Usually that's after the premiums become too high for most people to pay.
by intended
3/31/2026 at 5:44:48 PM
The chance of a defect fix introducing a new defect tends to grow linearly with the size of the codebase, since defects are usually caused by interactions between pieces of code, and there's now more code to interact with.
If you plot this out, you'll notice that it eventually exceeds 100%, and the total number of defects grows exponentially, as each bugfix introduces more bugs than it fixes. Which is what I've actually observed in 25 years in the software industry. How quickly you reach the point where new bugs are introduced faster than fixes land varies by organization and by the skill of your software architects: good engineers know how to keep coupling down and limit the space of existing code that a new fix could possibly break. I've seen some startups reach this asymptote before bringing their product to market (needless to say, they failed), it's pretty common for computer games to become steaming piles of shit close to launch, and I've even seen some Google systems killed and rewritten because it became impossible to make forward progress on them. I call this technical bankruptcy: the end result of technical debt.
by nostrademons
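The dynamic described above can be sketched as a toy branching-process model: each cycle, every open defect gets a fix, each fix adds lines to the codebase, and each fix spawns new defects in expectation proportional to codebase size. All constants here (the coupling factor k, lines added per fix) are invented purely for illustration:

```python
def defect_trajectory(defects: float, loc: float, k: float,
                      loc_per_fix: float, cycles: int) -> list[float]:
    """Deterministic expectation of a branching process. Each cycle, every
    open defect is fixed; each fix adds loc_per_fix lines of code and spawns
    k * loc new defects on average (regression risk scales with size)."""
    history = [defects]
    for _ in range(cycles):
        loc += defects * loc_per_fix     # the codebase grows with every fix
        defects = defects * k * loc      # expected regressions this cycle
        history.append(defects)
    return history

# Small codebase: k * loc < 1, so the bug count shrinks toward zero.
print(defect_trajectory(10, 10_000, 5e-5, 20, 5))
# Large codebase: k * loc > 1, each fix spawns >1 bug and counts explode.
print(defect_trajectory(10, 50_000, 5e-5, 20, 5))
```

The crossover at k * loc = 1 is the "technical bankruptcy" threshold the comment describes: below it the queue drains, above it every fix makes things worse on net.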
3/31/2026 at 8:36:40 AM
That's assuming that each fix can only introduce at most one additional defect, which is obviously untrue.
by majewsky
3/30/2026 at 9:42:58 PM
Why would it converge?
by Kinrany
3/31/2026 at 12:38:00 PM
Except they don't converge. You see that if you use agents to evolve a codebase. We also saw exactly that in the failed Anthropic experiment to create a C compiler.
by pron
3/31/2026 at 12:27:53 PM
As long as we're inventing numbers, what if it's a 90% chance?
What if it's a 200% chance, and every fix introduces multiple defects?
by bluefirebrand
3/30/2026 at 9:31:06 PM
I’ve had mine on a Ralph loop, no problem. Just review the PR.
by reactordev
3/30/2026 at 9:31:26 PM
Which still means a single person with Claude can clear a queue in a day versus a month with a traditional team.
by k_roy
3/31/2026 at 5:41:00 AM
Your example must have incredible users or really trivial software.
by worthless-trash
3/30/2026 at 10:26:55 PM
The fact that KiCad still has a ton of highly upvoted missing features, and the fact that FreeCAD still hasn't solved the topological naming problem, are existence proofs to the contrary.
by bsder
3/30/2026 at 11:58:27 PM
Shouldn't be downvoted for saying this. There are active repos this is happening in.
"BuT ThE LlM iS pRoBaBlY iNtRoDuCiNg MoRe BuGs ThAn It FiXeS"
This is an absurd take.
by rybosworld
3/31/2026 at 2:23:42 AM
It probably is introducing more bugs, because I think some people don't understand how bugs work. Very, very rarely is a bug a mistake, as in something unintentional that you just fix and boom, done.
No no. Most bugs are intentional, and the bug part is some unintended side effect that is a necessary, but unforeseen, consequence of the main effect. So you can't just "fix" the bug without changing behavior, changing your API, changing guarantees, whatever.
And that's how you get the one-month one-liner. Writing the one line is easy. But you have to spend a month debating whether you should do it, and what will happen if you do.
by array_key_first
3/31/2026 at 10:44:57 AM
So, you have already fixed all the bugs and are now just cruising through life?
by missingdays
3/31/2026 at 11:32:53 AM
I wonder whether people like you have actually used Claude for any length of time.
I use it all day. I consider it a near-miracle. Yet I correct it multiple times daily.
by jen729w
3/30/2026 at 8:40:31 PM
The pressure to do so will only happen as a consequence of the predicted vulnerability explosion, and not before it. And it will have some cost, as you need dedicated and motivated people to conduct the vulnerability search, apply the fixes, and re-check until it comes up empty, before each new deployment.
The prediction is: Within the next few months, coding agents will drastically alter both the practice and the economics of exploit development. Frontier model improvement won’t be a slow burn, but rather a step function. Substantial amounts of high-impact vulnerability research (maybe even most of it) will happen simply by pointing an agent at a source tree and typing “find me zero days”.
by layer8
3/30/2026 at 8:49:17 PM
I feel like the dream of static analysis was always a pipe dream. When the payment for vulns drops, I'm wondering where the value is for hackers in running these tools anymore. The LLMs don't do the job for you; testing is still a LOT OF WORK.
by cartoonworld
3/30/2026 at 9:19:20 PM
Breaking something is easier than fixing it.
by joatmon-snoo
3/30/2026 at 9:22:05 PM
People have said that for decades and it wasn't true until recently.
by tptacek
3/31/2026 at 12:33:18 AM
Hmm: can you elaborate?
I've never been on a security-specific team, but it's always seemed to me that triggering a bug is, for the median issue, easier than fixing it, and I mentally extend that to security issues. This holds especially true if the "bug" is a question about "what is the correct behavior?", where the "current behavior of the system" is some emergent / underspecified consequence of how different features have evolved over time.
I know this is your career, so I'm wondering what I'm missing here.
by joatmon-snoo
3/31/2026 at 12:37:16 AM
It has generally been the case that (1) finding and (2) reliably exploiting vulnerabilities are much more difficult than patching them. In fact, patching them is often so straightforward that you can kill whole bug subspecies just by sweeping the codebase for the same pattern once you see a bug. You'd do that just as a matter of course, without necessarily even qualifying that the bugs you're squashing are exploitable.
As bugs get more complicated, that asymmetry has become less pronounced, but the complexity of the bugs (and their patches) is offset by the increased difficulty of exploiting them, which has become an art all its own.
LLMs sharply tilt that difficulty back to the defender.
by tptacek
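The "sweep the codebase for the same pattern" move can be sketched crudely. Here a regex for `strcpy` call sites stands in for whatever pattern the original bug exhibited; a real sweep would use grep, Semgrep, or CodeQL queries tuned to the actual bug class:

```python
import re
from pathlib import Path

# Illustrative stand-in pattern: unbounded strcpy calls in C sources.
BAD_CALL = re.compile(r"\bstrcpy\s*\(")

def sweep(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line text) for every match under root,
    so each sibling of the original bug can be reviewed and patched."""
    hits = []
    for path in sorted(Path(root).rglob("*.c")):
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if BAD_CALL.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

The point of the comment stands independent of tooling: one confirmed bug gives you a signature, and the signature finds the rest of the subspecies for free.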
3/31/2026 at 8:20:42 AM
In a sense, breaking a vulnerability is easier than fixing it up to be an exploit.
by saagarjha
3/30/2026 at 9:32:56 PM
Specifically in software vulnerability research, you mean.
Fixing vulnerable code is usually trivial.
In the physical world breaking things is usually easier.
by underdeserver
3/31/2026 at 10:09:24 AM
That's why you simply make the LLM part of the CI checks on PRs.
by stavros
3/30/2026 at 9:44:34 PM
A proper fix, maybe. But LLMs can easily make it no longer exploitable in most cases.
by charcircuit
3/30/2026 at 8:32:23 PM
That might be one outcome, especially for large, expertly-staffed vendors who are already on top of this stuff. My real interest is in what happens to the field for vulnerability researchers.
by tptacek
3/30/2026 at 8:53:49 PM
Perhaps a meta evolution: they become experts at writing harnesses and prompts for discovering and patching vulnerabilities in existing code and software. My main interest is whether, now that we have LLMs, the software industry will move to adopting techniques like formal verification and other, perhaps more lax, approaches that massively increase the quality of software.
by lifty
3/31/2026 at 12:00:30 PM
> Perhaps a meta evolution, they become experts at writing harnesses and prompts
Harnesses, maybe, but prompts?
There's still this belief amongst AI coders that they can command a premium for development because they can write a prompt better than Bob from HR, or Sally from Accounting.
When all you're writing is prompts, your value is less than it was before, because the number of people who can write a prompt is substantially greater than the number of people who can program.
by lelanthran
3/31/2026 at 1:22:52 PM
I agree with this take. Nothing changes, everything just evolves. Been happening for 60 years, will (likely) continue to happen for the next 60 years.
by sputknick
3/31/2026 at 2:49:57 AM
Also, synthetic data and templates to help them discover new vulnerabilities or make agents work on things they're bad at. They differentiate with their prompts or specialist models.
Also, like ForAllSecure's Mayhem, I think they can differentiate on automatic patching that's reliable and secure. Maybe test generation, too, that does full coverage. They become drive-by verification and validation specialists who also fix your stuff for you.
by nickpsecurity
3/31/2026 at 12:36:51 AM
Testing exists.
> formal verification
Outside of limited specific circumstances, formal verification gives you nothing that tests don't give you, and it makes development slow and iteration a chore. People know about it, and it's not used for a lot of reasons.
by habinero
3/30/2026 at 8:57:00 PM
True, but I already am curious to see what happens in a multitude of fields, so this is just one more entry in that list.
by stavros
3/30/2026 at 9:45:22 PM
Just wanted to point out that tptacek is the blog post's author (and a veteran security researcher).
by underdeserver
3/31/2026 at 11:51:53 AM
I've worked at companies that balked at spending $300 to buy me a second-hand ThinkPad because I really wanted to work on a Linux machine rather than a Mac. I don't see them throwing $unlimited at tokens to find vulnerabilities, at least not until after it's too late.
by weavie
3/30/2026 at 8:49:43 PM
Attackers only have to be successful once, while defenders have to be successful all the time?
by htrp
3/31/2026 at 9:15:37 AM
Yes and no. Good defence is layered, and an attacker needs to find a hole in each layer. Even if it is not layered intentionally, a locally exploitable vulnerability gives you little if you have no access to the remote system. But some asymmetry does exist.
by citrin_ru
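The layered-defence point can be made quantitative under an independence assumption (a simplification; in practice layers often fail together): the chance of a full breach is the product of the per-layer bypass probabilities.

```python
from math import prod

def breach_probability(per_layer_bypass: list[float]) -> float:
    """Chance an attacker gets through every layer, assuming the layers
    fail independently of one another (an idealization)."""
    return prod(per_layer_bypass)

# One 90%-effective control: 10% of attempts get through.
print(breach_probability([0.1]))
# Three stacked 90%-effective controls: roughly 0.1% get through.
print(breach_probability([0.1, 0.1, 0.1]))
```

This is why modest, imperfect controls stack up well for the defender, and also why the asymmetry never fully disappears: the product is small but never zero.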
3/30/2026 at 8:41:59 PM
My sense is that the asymmetry is a non-trivial issue here. In particular, a threat actor needs one working path; defenders need to close all of them. In practice, patching velocity is bounded by release cycles, QA issues / regression risk, and a potentially large number of codebases that need to be looked at.
by zar1048576
3/31/2026 at 12:46:31 AM
Find-then-patch only works if you can fix the bugs quicker than you’re creating new ones.
Some orgs will be able to do this, some won’t.
by cvwright
3/31/2026 at 1:07:16 AM
"Find me vulnerabilities in this PR."
by stavros
3/30/2026 at 8:45:22 PM
> If LLMs can really find a ton of vulnerabilities in my software, why would I not run them and just patch all the vulnerabilities, leading to perfectly secure software?
Probably because it will be a felony to do so. Or the threat of a felony, at least.
And this is because it is very embarrassing for companies to have society openly discussing how bad their software security is.
We sacrifice national security for the convenience of companies.
We are not allowed to test the security of systems, because that is the responsibility of companies, since they own the systems. Also, companies who own the systems and are responsible for their security are not liable when those systems are found to be insecure and they leak half the nation's personal data, again.
Are you seeing how this works yet? Let's not have anything like verifiable and testable security interrupt the gravy train to the top. Nor can we expect systems to be secure all the time, be reasonable.
One might think that since we're all in this together and all our data is getting leaked twice a month, we could work together and all be on the lookout for security vulnerabilities and report them responsibly.
But no, the systems belong to companies, and they are solely responsible. But also (and very importantly) they are not responsible and especially they are not financially liable.
by Buttons840
3/30/2026 at 9:04:50 PM
>> If LLMs can really find a ton of vulnerabilities in my software, why would I not run them and just patch all the vulnerabilities, leading to perfectly secure software?
> Probably because it will be a felony to do so. Or, the threat of a felony at least.
"my software" implies you own it (i.e. your SaaS), so the CFAA isn't an issue. I don't think he's implying that vigilante hackers should be hacking Gmail just because they have a Gmail account.
by gruez
3/31/2026 at 2:46:46 PM
Takeaway is formal software.
by whateverboat
3/31/2026 at 12:21:45 AM
Because not all software gets auto-updated. Most of it does not!
by woeirua
3/31/2026 at 9:42:05 AM
closed source software
deliberate vulnerabilities (thanks nsa)
by lukewarm707
3/31/2026 at 11:09:44 AM
Any patch you ship lands on a moving treadmill of releases and deps, with new code stapled onto old junk and old assumptions leaking into the next version. Attackers can run the same models you do, so the gap between finding and fixing bugs shrinks until your team is doing janitorial work against a machine.
"Perfectly secure" software is a philosophy seminar, not an outcome. You can cut the bug pool down a lot, but the tide keeps coming and the sandcastle still falls over.
by hrmtst93837
3/31/2026 at 11:16:43 AM
Any patch you ship can be vetted by the same models, so you can be sure the same models can't find a vulnerability in the attacker's hands. Then it's just a matter of fixing the old vulnerabilities.
by stavros
3/31/2026 at 4:46:08 PM
"so you can be sure"
Nothing is for sure with LLMs.
by g947o
3/31/2026 at 4:55:09 PM
Nothing is for sure with anything.
by stavros