5/7/2026 at 9:16:17 PM
Again, and this is important: A bug is a bug. A “potential vulnerability” is a bug. A vulnerability is verifiable as having security implications, with a proof of concept or other substantial evidence.
Words matter. Bugs matter. It’s important to fix large numbers of bugs, just as it always has been, and as has always been done. Let that be impressive on its own, because it IS impressive.
Mythos didn’t write 271 PoCs for vulnerabilities and demonstrate code-path reachability with security implications. Mythos found 271 valid bugs. Let that be enough.
by jerrythegerbil
5/7/2026 at 9:58:18 PM
I was a bit confused by your definitions, but here's how Mozilla broke out [1] the 271, um, things:
> As additional context, we apply security severity ratings from critical to low to indicate the urgency of a bug:
> * sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior, like browsing to a web page. We make no technical difference between these, but sec-critical bugs are reserved for issues that are publicly disclosed or known to be exploited in the wild.
> * sec-moderate is assigned to vulnerabilities that would otherwise be rated sec-high but require unusual and complex steps from the victim.
> * sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).
> Of the 271 bugs we announced for Firefox 150: 180 were sec-high, 80 were sec-moderate, and 11 were sec-low.
Mozilla uses the term "vulnerability" even for sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit. And on their definitions page, they classify even sec-low as "vulnerabilities" [2].
Words are tools that get their utility from collective meaning. I'd be interested to know where you received your semantics from, and whether they match up with or diverge from Mozilla's.
[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...
[2] https://wiki.mozilla.org/Security_Severity_Ratings/Client
by epistasis
5/7/2026 at 10:47:49 PM
I work at Mozilla; I fixed a bunch of these bugs.
In general, I would say that our use of "vulnerability" lines up with what jerrythegerbil calls "potential vulnerability". (In cases with a PoC, we would likely use the word "exploit".) Our goal is to keep Firefox secure. Once it's clear that a particular bug might be exploitable, it's usually not worth a lot of engineering effort to investigate further; we just fix it. We spend a little while eyeballing things for the purpose of sorting into sec-high, sec-moderate, etc., and to help triage incoming bugs, but if there's any real question, we assume the worst and move on.
So were all 271 bugs exploitable? Absolutely not. But they were all security bugs according to the normal standards that we've been applying for years.
(Partial exception: there were some bugs that might normally have been opened up, but were kept hidden because Mythos wasn't public information yet. But those bugs would have been marked sec-other, and not included in the count.)
So if you think we're guilty of inflating the number of "real" vulnerabilities found by Mythos, bear in mind that we've also been consistently inflating the baseline. The spike in the Firefox Security Fixes by Month graph is very, very real: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...
by IainIreland
5/7/2026 at 10:52:23 PM
What types of vulnerabilities was it finding? Cross-site scripting, privilege escalation, etc.? Mostly memory corruption, or any JavaScript logic bugs?
by paulvnickerson
5/7/2026 at 11:12:56 PM
I work on SpiderMonkey, so I mostly looked at the JS bugs. It was a smorgasbord of various things. Broadly speaking, I'd say the most impressive bugs were TOCTOU issues, where we checked something and later acted on it, and the testcase found a clever way to invalidate the result of the check in between.
If you look closely at, say, this patch, you might get a sense of what I mean (although the real cleverness is in the testcase, which we have not made public): https://hg-edge.mozilla.org/integration/autoland/rev/c29515d...
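To illustrate the shape of bug being described, here is a hypothetical sketch (invented names, not the actual SpiderMonkey code): a bounds check happens at one time, attacker-controllable code runs, and then the stale check result is used.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical TOCTOU sketch: the bounds check happens at one time,
// the access at another, with a re-entry point (standing in for
// arbitrary script execution) in between that can invalidate the check.
int buggyRead(std::vector<int>& buf, size_t i,
              const std::function<void()>& reenter) {
    if (i >= buf.size()) return -1;  // time of check
    reenter();                       // script can run here and shrink buf
    return buf[i];                   // time of use: the check may be stale
}

// The fix re-validates after any point where the buffer can change.
int safeRead(std::vector<int>& buf, size_t i,
             const std::function<void()>& reenter) {
    reenter();
    if (i >= buf.size()) return -1;  // check *after* possible invalidation
    return buf[i];
}
```

A testcase in this style would pass a `reenter` callback that shrinks or frees the buffer, so `buggyRead` touches memory out of bounds while `safeRead` bails out.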
by IainIreland
5/7/2026 at 11:17:47 PM
> although the real cleverness is in the testcase, which we have not made public
What is the point of keeping it private? I'd bet feeding this patch to Opus and asking it to look for the specific TOCTOU issue fixed by the patch will make it come up with a testcase sooner or later.
by reisse
5/7/2026 at 11:57:11 PM
The same is also true of a good security researcher, and has been for a long time. The question is mostly whether it takes long enough to come up with a testcase that we've managed to ship the fix to all affected releases, and given people some time to update. (And maybe LLMs do change the calculus there! We'll have to wait and see.)
by IainIreland
5/7/2026 at 11:31:35 PM
Possibly! One of the many areas that might need rethinking in the age of AI (that started in February of this year) is how long security bugs should be hidden. We live in interesting times.
by mccr8
5/8/2026 at 10:19:11 AM
Given that the commit is 4 weeks old, will it eventually get comments?
The code before the patch does not look obviously wrong. Some more lines were added, but would you say the code now looks less obviously wrong, or more obviously correct?
It seems that the invariants needed here live either in some people's heads, or in some document that is not referenced.
Reading the code for the first time, the immediate question is: "What other lines might be missing? How can I tell?"
If the "obviously correct" level of the code does not increase for a human reviewer, how is it ensured that a similar problem will not arise in the future? Or do we need more LLM to tell us which other lines need to be added?
by nh2
5/8/2026 at 3:10:11 PM
Yeah, the test with the patch also adds comments. The human reviewer had extra context available.
I did get Opus to do an audit for similar problems elsewhere, to supplement the investigations that we were already doing by hand. It initially thought it found something, but when asked to produce a testcase, it thought for 20 minutes and admitted defeat. I suspect that the difference between Opus and Mythos is in small edges like this: if Mythos is smart enough to spot why Opus's discovery didn't work a little bit faster, and it can waste less time chasing down red herrings, then it's more likely to find a real bug within the limits of a context window. It's not that Opus completely lacks some capability; it's that it has trouble chaining all the pieces together consistently.
by IainIreland
5/9/2026 at 5:41:46 PM
Can't remember when I last heard the term TOCTOU being used. Nice :)
by seebeen
5/7/2026 at 11:18:30 PM
Very cool, thank you.
by paulvnickerson
5/7/2026 at 11:29:32 PM
I'd say it leans towards memory-corruption kinds of issues, as those are the easiest to get past the validator, thanks to AddressSanitizer. I think there's a lot of potential for making the validator more sophisticated. Like maybe you add a JS function that will only crash when run in the parent process, and have a validator that checks for that specific crash, as a way for the LLM to "prove" that it managed to run arbitrary JS in the parent. Would that turn up subtler issues? Maybe.
by mccr8
5/8/2026 at 1:24:27 PM
You may not be able to comment, but do you feel like Mythos is accomplishing anything that couldn't have already been done with Opus and the right prompting?
I've assumed I could send an agent using a publicly available model bug hunting in a codebase like this and get tons of results, assuming I wanted to burn the tokens, so it's really unclear to me whether the Mythos hype is justified or if it's just an easy button (and subsidized tokens?) to do what is already possible.
by JeremyNT
5/8/2026 at 3:38:33 PM
I never got direct access to Mythos, so all I know is what I've seen from the quality of the bugs being produced. I also haven't been involved at the prompting end.
So the best answer I can give is: I dunno, maybe it's possible to find bugs like this using Opus, but if so, where are they? Did nobody think to try "please find the bug in this code" pre-Mythos? I've done enough auditing with Opus to be convinced that it can be a good assistant to somebody who already knows what they're doing, but in practice the big wave of AI-discovered bugs started with Mythos.
I'm sure lots of people have assumed they could send a publicly available model bug hunting and find things. I have not noticed a huge amount of success. We've had some very nice correctness bugs reported, but skimming through the list of security bugs I've fixed recently, the AI-related ones all seem to be Mythos.
My best guess is that Mythos is just enough better along just enough axes that its hit rate on finding potential bugs and separating the real ones from the hallucinations is good enough to matter. Like, there's no obvious qualitative difference between 3.6 kg of uranium-235 and 3.8 kg of uranium-235, just a small quantitative increase. But if you form both of them into spheres, only one of them has reached critical mass. Can you do something clever to reach critical mass with 3.6 kg of uranium? Maybe! But needing to do something clever is a non-trivial barrier in itself.
by IainIreland
5/8/2026 at 4:47:21 PM
I did some experiments, and Opus seemed pretty able to wire up a harness to find bugs and write a PoC + patch for each. It's still a lot of work to get fixes upstreamed from outside, so I think even if outsiders have better tools (Mythos etc.) it won't change the report rate much; people may find more bugs, but they won't report them. I suspect that's part of the calculation behind the phased rollout for Mythos: finding bugs is already not the bottleneck.
by hedgehog
5/8/2026 at 5:52:21 PM
> It's still a lot of work to get fixes upstreamed from outside
I'm going to disagree in the specific case of Firefox. First, although it has diverged a long way from its roots, Mozilla still has the community-project ideal in its DNA. Enough, at least, that I stumbled while reading the clause "from outside" -- if you're finding and reporting actual relevant security bugs, you're already on the inside. SpiderMonkey in particular still has a good amount of code being written and even maintained by non-employees. (Examples: Temporal and LoongArch64 JIT support.)
Second, the bug bounty program still exists[0] and is being used. If someone were sitting on a pile of AI-discovered exploits, it has monetary value that rapidly drains away the longer the exploits go unreported.[1] That's an incentive to put in the work to report them properly.
Third, I agree that finding bugs is likely not the bottleneck. Validating them is. With previous models, the false positive rate was too high so they required too much work to whittle down to the valid ones. A PoC is a very strong signal that a bug is valid, and that's where I just don't believe you: without a really good harness, I don't think Opus was good enough to find very many bugs with PoCs. It could find some, just not very many.[2]
[0] For now. It remains to be seen how it will adapt to the AI age. For the moment, it hasn't been severely nerfed like Google's.
[1] One could make the argument that people who are inexpert enough to only be able to poke an AI to find bugs are also the people more likely to sell them on the black market rather than disclosing them. It seems plausible. Still, some people would be disclosing, and not many were filing quality bugs pre-Mythos. Some were, but it was a trickle compared to post-Mythos.
[2] Also note that I personally, as a SpiderMonkey developer, don't find a huge amount of value in the AI-generated patches that accompany these bug reports. Sometimes they're useful to better illustrate the problem, especially since the AI's problem analysis is usually subtly wrong in important ways. They can be a decent starting point for a real patch. But I'll still need to go through my own process of figuring out what the right fix is, even in the handful of cases where I end up with the same thing the AI did.
by sfink
5/10/2026 at 8:55:36 AM
Hi! First of all, thanks for your incredibly thoughtful and enlightening answers, and most of all for helping keep Firefox alive.
You said:
> Still, some people would still be disclosing, and not many were filing quality bugs pre-Mythos. Some were, but it was a trickle compared to post-Mythos.
How much of this could be just due to focus? i.e. prior to the partnership with Anthropic to test Mythos Preview, has there ever been a similarly focused project, specifically trying to find security bugs in Firefox?
by cassianoleal
5/11/2026 at 7:36:39 PM
That's a fair point, given the restrictions on Mythos and now Opus 4.7. I'm kind of comparing apples and oranges.
There are two things mixed together here. There is targeted scanning that was done by both Anthropic and Mozilla employees, using first Opus and then Mythos. Then there are other non-employee security researchers using AI to find and file bugs, motivated mostly by bug bounties.
The researchers were filing a steady trickle of bugs presumably using Opus 4.6. (Or rather, I saw a steady trickle after other people triaged them; I imagine the incoming stream was a lot busier.) My impression is that those have mostly dried up now. That could be the bias in my sample (I only see a slice of incoming bugs, so my anecdata aren't that strong), or a result of the restrictions added to the generally available models, or a result of there being less to find now that we've fixed so many of the issues found by company-backed bughunts. Or a combination of all three.
I guess my opinion is mostly driven by the difference in the quality and magnitude of bugs coming in from the company-backed scans pre- and post-Mythos. With Opus, there was an initial rush, but then it mostly died down. (For our group. For other groups, it was a series of waves that they never quite made it over before the next one came crashing in.) With Mythos, it was a larger wave and the quality of the bugs was higher. Two quantitative differences that ended up feeling like a qualitative change. So it's my underinformed personal opinion, but to me it feels like: yes, you could continue to find more bugs using a roughly Opus 4.6-strength model, but not that many and not cheaply, and the success rate is going to depend a lot on the harness. In comparison, I don't think we've seen the end of the Mythos wave, and my sense is that Mythos requires much less in the way of a harness.
It feels like the bitter lesson is playing itself out again, which I kinda hate because I want human ingenuity and cleverness to make an important difference, even after the next model has seen what the humans are coming up with.
by sfink
5/11/2026 at 8:04:52 PM
That makes sense, thanks for taking the time to write this up!
by cassianoleal
5/8/2026 at 5:54:15 PM
We have a bounty program. If you can find security bugs in Firefox, please let us pay you for them. You don't need to provide a fix; a testcase that crashes in an interesting way is often enough to qualify.
by IainIreland
5/8/2026 at 5:18:35 PM
> I suspect that's part of the calculation of the phased rollout for Mythos, finding bugs is already not the bottleneck.
I was wondering this too. By working directly with tech companies and (one assumes) subsidizing tokens, they're empowering the people on the inside who absolutely want to have the bugs fixed.
Who outside of Mozilla is going to pay and spend the effort to find Firefox bugs? Sure, some hobbyists and contributors might, but they don't have the institutional knowledge of the codebase that can help guide an agent's prompts, nor do they have strong incentives to try and report them, nor do they necessarily have the time to craft good bug reports that stand out from the slop reports.
My assumption would be that most people working to discover bugs this way in Firefox are interested in using them rather than getting them fixed, so maintainers wouldn't necessarily even know the degree to which it was already happening.
by JeremyNT
5/8/2026 at 7:51:14 PM
The incentive is that Mozilla will pay you thousands of dollars if you find a security bug: https://www.mozilla.org/en-US/security/client-bug-bounty/
We have many outside contributors who have successfully submitted security bugs and received payments.
by mccr8
5/7/2026 at 10:51:59 PM
I'm not a security dev or researcher or anything, but as an outsider my understanding matches how Mozilla uses the terms. Though words used by specialists and the general public can often differ...
by epistasis
5/7/2026 at 11:34:52 PM
Can you elaborate on why those bugs weren't found by e.g. fuzzing in the past?
I'm genuinely curious what "types" of implementation mistakes these were, like whether it was e.g. library-usage bugs, state-management bugs, control-flow bugs, etc.
Would love to see a writeup about these findings. Maybe Mythos has hinted that better fuzzing tools are needed?
by cookiengineer
5/8/2026 at 12:21:49 AM
If I had to guess, I'd say that AI is better at finding TOCTOU bugs than fuzzing because it starts by looking at the code and trying to find problems with it, which naturally leads it to experiment with questions like "is there any way to make this assumption false?", whereas fuzzing is more brute force. Fuzzing can explore way more possible states, but AI is better at picking good ones.
In this particular sense, AI tends to find bugs that are closer to what we'd see from a human researcher reading the code. Fuzz bugs are often more "here's a seemingly innocuous sequence of statements that randomly happen to collide three corner cases in an unexpected way".
Outside of SpiderMonkey, my understanding is that many of the best vulnerabilities were in code that is difficult to fuzz effectively for whatever reason.
by IainIreland
5/8/2026 at 12:12:48 AM
Fuzzing isn't good at things like dealing with code behind a CRC check, whereas the audit-based approach using an LLM can see the sketchy code, then calculate the CRC itself to come up with a test case. I think you end up having to write custom fuzzing harnesses to get at the vulnerable parts of the code. (This is an example from a talk by somebody at Anthropic.)
That being said, I think there's a lot of potential for synergy here: if LLMs make writing code easier, that includes fuzzers, so maybe fuzzers will also end up finding a lot more bugs. I saw somebody on Twitter say they used an LLM to write a fuzzer for Chrome and found a number of security bugs that they reported.
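A hypothetical sketch of that point (invented names; a simple byte sum stands in for a real CRC): random mutation almost never preserves the checksum, so a fuzzer rarely gets past the guard, while an auditor who has read the code can compute the checksum for a crafted payload.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for a real CRC: any mutation to the body changes the sum.
uint8_t checksum(const std::string& body) {
    uint8_t sum = 0;
    for (unsigned char c : body) sum += c;
    return sum;
}

// The interesting (possibly buggy) parsing code is only reachable when
// the trailing checksum matches; random fuzz inputs die at the guard.
bool parse(const std::string& body, uint8_t crc, bool* reachedGuardedCode) {
    if (checksum(body) != crc) return false;  // fuzzed inputs stop here
    *reachedGuardedCode = true;               // code an auditor can target
    return true;
}
```

An LLM-style audit would notice the guard, call `checksum` on its crafted body, and pass the result in as `crc`, reaching the guarded code on the first try.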
by mccr8
5/8/2026 at 3:31:33 AM
How about this: a "vulnerability" is a "vulnerability", but after it has been identified and verified to cause problems, that's when it should be called a "bug", because it could make the software do unwanted things.
by nirui
5/8/2026 at 5:36:32 AM
At Mozilla, everything is called a bug. It's what other systems call an "issue". So it's too late for your terminology at Mozilla. (Example: I have a bug to improve the HTML output of my static analysis tool. There is nothing incorrect or flawed about the current output.)
At Mozilla, but not everywhere: exploits are a subset of vulnerabilities, which are a subset of bugs.
by sfink
5/8/2026 at 10:59:13 AM
FWIW I think this is right. A bug is anything that doesn't do what you want it to do, and nobody should want a vulnerability in their software.
by freedomben
5/8/2026 at 4:10:15 PM
That's why they created Bugzilla :) (https://www.bugzilla.org/blog/2023/08/26/bugzilla-celebrates...)
by darkwater
5/8/2026 at 9:20:06 AM
When I worked at Mozilla, _everything_ was called a bug, whether it was a software issue, a problem in the office or some paperwork missing.Much as GitHub calls everything an "issue" and GitLab a "work item".
by Yoric
5/10/2026 at 9:00:21 AM
How is a vulnerability not a bug? Surely you didn’t design the software to enable exploits on your users, right?
Vulnerabilities are a special class of bugs. One that’s generally so important to fix that they get a special name and more attention.
That doesn’t make them less of a bug. It makes them more of one.
by cassianoleal
5/7/2026 at 10:31:21 PM
Presumably there are (implicit?) "sec-none" things, like [a] from the recently released 150.0.2 [b], which makes no mention of "Security Impact" or "Severity" in the bug report, unlike [c], which is listed in the Mozilla weblog post [2].
Security things are mentioned in the Release Notes [b], pointing to a completely different document [d].
Perhaps sometimes a bug is 'just' a bug, and not a vulnerability.
[a] https://bugzilla.mozilla.org/show_bug.cgi?id=2034980 ; "Can't highlight image scans in Firefox 150+"
[b] https://www.firefox.com/en-CA/firefox/150.0.2/releasenotes/
[c] https://bugzilla.mozilla.org/show_bug.cgi?id=2024918
[d] https://www.mozilla.org/en-US/security/advisories/mfsa2026-4...
by throw0101c
5/7/2026 at 10:30:02 PM
> Mozilla uses the term "vulnerability" for even sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit.
That’s not evident in what you pasted at all.
What you pasted says
> sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior […] We make no technical difference between these […] sec-critical bugs are reserved for issues that are publicly disclosed or known to be exploited in the wild.
> sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).
From this one infers that the 180 sec-high bugs found are actually exploitable (though not known to be exploited in the wild), and are NOT mere annoying bugs.
The difference between 180 and 271 does nothing to deflate the significance, or lack thereof, of the implication re: Mythos.
by Gregaros
5/7/2026 at 10:44:18 PM
Yes, it is not in what I pasted; as I said, "even though they say right below". If you don't believe me, then click on either of the links.
by epistasis
5/7/2026 at 11:19:40 PM
Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc.).
For us this is substantial enough evidence to consider it a security vulnerability at that point, unless shown otherwise, and it has always been this way (also for fuzzing bugs).
by mozdeco
5/8/2026 at 4:13:17 AM
> Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc).
But the report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that the majority of these (which include the 271 bugs from Mythos) don't have evidence at all. Am I misunderstanding something?
> For us this is substantial enough evidence to consider it a security vulnerability at that point
Mythos is supposed to be pretty good at writing actual exploits, so (as I understand it) there shouldn't be any serious problem with checking whether a bug is a vulnerability or not.
[1] https://www.mozilla.org/en-US/security/advisories/mfsa2026-3...
by ZrArm
5/8/2026 at 12:36:47 PM
> But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that majority of these (which includes 271 bugs from Mythos) don't have evidence at all. Do I not understand something?
This is just the standard sentence we've been using for years. It has nothing to do with Mythos, and for Mythos almost all bugs show evidence of memory corruption (we do have a handful of bugs in JS IPC / JS Actors; one is in the blog post).
> Mythos is supposed to be pretty good at writing actual exploits, so (as I understand) there shouldn't be any serious problems with checking if bug is vulnerability or not.
Yes, but if we have a choice between writing exploits and scanning more source, potentially finding more bugs, then of course we prioritize the latter.
by mozdeco
5/8/2026 at 5:56:00 AM
> But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that majority of these (which includes 271 bugs from Mythos) don't have evidence at all. Do I not understand something?
I'm guessing a bit, but for example: out-of-bounds reads are not memory corruption. Assertion failures in debug builds are also usually not memory corruption, and I'd guess that many of these bugs were found through assertions. (Some parts of Firefox, like the SpiderMonkey JS engine, make heavy use of assertions, and that's the biggest signal used for defect validation. An assertion firing is almost always treated as a real and serious problem. Though with our harness, Opus and Mythos try to come up with an exploit PoC anyway.)
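A minimal sketch of the distinction being drawn (hypothetical code, not Firefox's): a debug assertion fires before the bad access happens, so the resulting "crash" is a safe early signal rather than memory corruption itself.

```cpp
#include <cassert>
#include <cstddef>

// Assertion-heavy accessor in the style described above. In a debug
// build, a bad index aborts safely at the assert, before any
// out-of-bounds access; in a release build the assert compiles away,
// and an OOB *read* here would leak data without corrupting memory.
int get(const int* buf, size_t len, size_t i) {
    assert(i < len && "index out of range");  // debug-build tripwire
    return buf[i];
}
```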
by sfink
5/8/2026 at 11:27:32 AM
It makes sense, thanks, even though that wording is still somewhat confusing.
by ZrArm
5/8/2026 at 3:14:27 AM
Is that number of crashing bugs with PoCs available/written down anywhere?
by jerrythegerbil
5/8/2026 at 8:36:44 PM
As described in our blog posts, our harness/pipeline only looks for crashes, so all of the bugs resulting from that do have PoCs. There is a smaller number of bugs found by manual auditing that didn't have PoCs, but I'd say easily more than 90% of all of the bugs we are talking about had a PoC.
by mozdeco
5/8/2026 at 10:04:02 AM
It may be worth noting that Claude can and will (if it believes you own the code, at least) produce PoC exploits for exploitable bugs that it finds.
My only source for this is personal experience, and no, I can't share any evidence of it.
by jeffparsons
5/8/2026 at 11:02:26 AM
Are you certified for high-risk cyber uses? If so, then you're correct. If not, then it does not match my experience.
by freedomben
5/8/2026 at 12:59:09 PM
The word “exploit” may be doing a lot of work here. In my experience, Opus 4.6 is perfectly happy to provide test cases that trigger ASAN, even without the super-secret-squirrel security access.
But if you ask it to get you a shell, it’ll probably tell you to get lost.
by cvwright
5/10/2026 at 5:17:17 AM
I don't have any special certification or arrangement with Anthropic; this is vanilla Opus 4.x via Claude Code.
by jeffparsons
5/8/2026 at 4:28:22 PM
I'm not, and I'm doing it right now.
by staticassertion
5/8/2026 at 12:29:33 AM
This isn’t true anywhere people have to make decisions about what to work on first.
by browningstreet
5/7/2026 at 11:42:39 PM
> Mythos didn’t write 271 PoC for vulnerabilities
I think the word you're looking for is exploit?
by dataflow