New serious vulnerabilities spiked around release of Claude Mythos Preview

7/4/2026 at 12:16:07 PM

One of the major differences between Amodei’s and Hegseth’s views is that Hegseth said that in their world they don’t distinguish between “defensive” and “offensive” capabilities.

In other words, a missile defense system is equivalent to an attack one.

I think that applying this thinking to software is a mistake. A lot of commercial software uses open source libraries under the hood, and while the large corporations might have access to Mythos/Fable/gpt 5.6, the open source library maintainers typically don’t. That leaves them vulnerable to foreign adversaries who do have access to AI models. Attackers don’t need Mythos-level capability then; they just need to outperform whatever the maintainers are using.

Which means that Anthropic’s decision to restrict security research on even Sonnet makes that gap (and thus an attacker’s opportunity) even larger.

I say this as a coder who wants to release some of my internal libraries as open source. The risk now is that I open up my own products (which use those libraries) to vulnerability scanners while not having those kinds of detection methods myself. Thus, it’s safer to keep them internal than to release them and increase my own attack risk.

Hopefully we will come to see that software is not equivalent to missile defense: writing safe code is different from attacking others’.

by denverllc

7/4/2026 at 3:20:06 PM

OpenAI gives access to cyber models for open source maintainers

https://openai.com/index/patch-the-planet/

by pratikel

7/4/2026 at 5:07:31 PM

Both companies have cyber defense programs, but both 5.6 and Mythos are restricted by the government unfortunately.

by solenoid0937

7/4/2026 at 2:46:51 PM

Wouldn't open source enable review from people with access to the scanners prior to release?

Seems like there is a fair chance that it will mostly be an actual spike, where a bunch of existing vulnerabilities get cleaned up and then published software mostly has fewer vulnerabilities going forward.

by maxerickson

7/4/2026 at 6:57:58 PM

Agreed. But this depends on (at least) 1 condition: that no new bugs are introduced.

FLOSS projects keep moving forward, and it seems some projects' maintainers are being swamped by PRs (some good, some bad).

Whatever allows a random third party to 'strip-mine' existing codebases for bugs should also be applied to new code.

by RetroTechie

7/4/2026 at 5:09:20 PM

How much money do you think a nation state has to spend on exploiting an OSS library? More or less than the owner of the OSS library? There's your answer.

Furthermore, of course Glasswing participants are scanning their dependencies as well. Why would you think they aren't!?

by solenoid0937

7/4/2026 at 1:25:06 PM

If we take the noise about Mythos' capabilities as read, then releasing it freely into the world could result in chaos, as attackers find myriad new vulnerabilities and use them, and code owners frantically hunt for them and fix any that are exploited. (Noting, of course, how legendarily quick and agile large corporations aren't, compared to motivated individuals or small groups.) Eventually, given unfettered access to Mythos and sufficient time, things would settle down again once everything was patched, but who knows what would happen in the process?

So I suspect this has less to do with the underlying ethics or logic, and more to do with Anthropic not wanting to be held responsible for unleashing a potential period of chaos onto the world.

Of course, if someone has access to a tool that can find vulnerabilities in code, the process is identical whether the ultimate intent is to fix or exploit them (which may be Hegseth's underlying logic?). So to avoid this 'world chaos' scenario, Anthropic needs to somehow restrict Mythos access, avoiding bad players. And the only heuristics available at scale are either task-based assessment by AI (with downgrading of anything marginally risky to older models) or selection of trusted organisations by humans.

(By the by, to your point, it would also make sense to expand Glasswing to open source maintainers, at scale. I can't tell to what extent this has been part of that project?)

by mft_

7/4/2026 at 6:46:15 PM

Given the choice between 1) just releasing it to all and hoping for the best, and 2) a phased rollout roughly ordered by criticality, I'm really not finding the advocates of 1 very persuasive.

by tiahura

7/4/2026 at 3:54:24 PM

You're right on that, https://www.hurstpublishers.com/book/full-stack-spies/ goes over it in much more lucid detail.

Hegseth is too blindsided by machismo to value anything that requires patience and planning (see Iran). If Fable is able to cheaply (i.e. for less than $40k) find serious CVEs in common software, then it costs America much more to defend against it, especially as they are keeping the price of zero-days artificially high.

by KaiserPro

7/4/2026 at 6:06:57 AM

I maintain dozens of C/C++/Perl projects. I have received massive numbers of new, good vulnerability reports, more than with the latest fuzzing waves. Fuzzing is still the majority overall, but Opus dominates now. I haven't got any Mythos/Fable vulns yet. And with the help of Sonnet/DeepSeek I can finally get around to weeding out all the remaining fuzzing bugs. It has nothing to do with Mythos for me, just people getting Anthropic Max accounts.

And CVEs: people actually file them now, which they didn't before. GitHub allowing it now certainly helps massively. This is a good thing.

by rurban

7/4/2026 at 5:51:32 PM

Are CVEs really a good thing for open source projects?

No doubt it is a good thing to have issues reported and fixed, but a CVE feels a bit like blackmailing maintainers: either you fix the issue or we get your project flagged by "security scanners".

I guess, my distaste mostly originates from randomly assigned high CVE numbers that don't reflect the actual threat. And the fact that it gives the companies which use the code "AS IS" an imaginary stick to hit open source maintainers, until they fix the issues for the company (for free of course).

by eXpl0it3r

7/4/2026 at 7:17:30 PM

It's good. It gives the maintainers the possibility to update their packages. And if a CVE is unfixed for months it reflects on the maintenance. This usually only happens to closed source packages.

by rurban

7/4/2026 at 2:26:38 PM

In my hobby C++ coding I also cross-check with Copilot, alongside the usual VS analysis tools.

Which was certainly an improvement, given that GitHub is in no hurry to add modules support to CodeQL.

by pjmlp

7/4/2026 at 3:11:32 AM

This is hardly news? We've known for months that a flood of AI-assisted vulnerabilities was coming; I posted on Twitter in March calling 2026 the year of a million CVEs: https://x.com/i/status/2035045573116789002

by cperciva

7/4/2026 at 4:22:26 AM

In pretty much every single HN post on this topic, there are a number of commenters claiming it’s false. Continued quantifiable data like this seems very important for hopefully resolving the ongoing disagreement about the facts.

by no-name-here

7/4/2026 at 5:24:21 PM

Yes, there’s been a very popular narrative that Mythos’ abilities are just marketing fluff. I think it’s clear that there’s a real capability here, even if Anthropic’s communications have been heavily influenced by PR concerns.

by moomin

7/4/2026 at 11:28:04 AM

My read of the zeitgeist on HN is that these new LLMs bring with them such a torrent of false or useless security reports that whatever may be true simply drowns.

The end result is both that there are more critical CVEs and that there aren’t.

by wodenokoto

7/4/2026 at 5:31:06 PM

In pretty much every single HN post ~~on this topic~~, there are a number of commenters that are probably bots. Anyway, it makes sense from a theoretical standpoint that LLMs should eventually be able to find flaws in code better/faster than humans, and it's reasonable to think that time has come.

by sometimelurker

7/4/2026 at 3:37:27 PM

AI has driven many people into denial. It's excruciating to watch otherwise smart individuals embrace terrible thinking, over and over and over.

by qarl2

7/4/2026 at 5:43:45 AM

I've seen plenty of people saying "Mythos isn't all that exceptional, lots of LLMs can find security vulnerabilities" -- and indeed there is some evidence for that; it sounds like Anthropic was taken somewhat by surprise at how easily a simple prompt managed to get Mythos to deliver exploits and didn't distinguish immediately between the effectiveness of Mythos and the effectiveness of the prompt.

But the claim of "LLMs aren't making a difference in vulnerability discovery" has been laughable to anyone who has been reading security advisories for the past 3 months. Just look at the Credits lines.

by cperciva

7/4/2026 at 6:21:27 AM

I thought the point was not that Mythos finds more vulnerabilities, but that it can exploit them much more successfully. I thought the report showed it didn’t find much more than Opus 4.8. (Or did I misread?)

by wrs

7/4/2026 at 7:13:59 AM

If you look at public benchmarks like ExploitBench [1], then you'll see this is mostly a question of token budget. Once you give it sufficient tokens to burn, GPT 5.5 is roughly as good as Mythos when it comes to finding bugs and building exploits. With some clever auto-prompting to clear stalls, it even beats the base Mythos version. So Mythos' "magic" is not in the model, but in the harness and compute env. That's probably also why they never released it, because Anthropic already struggled heavily to make Opus available to the general public. Releasing Mythos publicly may well be technically impossible for them due to compute constraints.

[1] https://exploitbench.ai

by sigmoid10

7/4/2026 at 12:21:47 PM

I have yet to see a single glibc bug that truly matters. I don't have illusions about our code quality, so there must be something to find.

We got many high-quality bug reports, some of them with a security aspect to them. Several of them received CVSSv3.1 scores of around 9.8 from the rating agencies, but these high numbers are misleading. (Vulnerability scoring is hard, and it's pretty much impossible for a library without reference to an application that uses the library.) Looking beyond the numbers, everything reported this year (and late last year) has been pretty harmless so far.
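To illustrate why a library-level score can mislead, here is a minimal sketch of the CVSS v3.1 base score arithmetic in Python. The weights and formulas follow the published v3.1 specification; the two vectors at the bottom are illustrative assumptions, not taken from any actual glibc report. They differ only in how the application is assumed to expose the library (attack vector and user interaction), which is exactly the part a library maintainer cannot know.

```python
import math

# Base metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector like 'AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'."""
    metrics = dict(
        part.split(":") for part in vector.split("/") if not part.startswith("CVSS:")
    )
    scope = metrics["S"]
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    pr = PR_WEIGHTS[scope][metrics["PR"]]

    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * w["AV"] * w["AC"] * pr * w["UI"]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


# Worst-case assumption: reachable over the network, no user interaction.
print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
# Same impact, but only reachable locally and with user interaction.
print(base_score("AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8
```

The impact term is identical in both cases; only the exploitability assumptions move the number, which is why scoring a library without reference to a concrete application tends to default to the worst case.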

Does this mean LLMs are making a difference? For upstream developers, definitely. For end users? Not that much yet.

Maybe the picture changes once the organizations sitting on the good findings figure out how to disclose them to the relevant upstream projects. When I read the announcement of Project Glasswing, I immediately thought that this was going to be the hardest part.

by fweimer

7/3/2026 at 10:59:15 PM

How are these reports verified to be valid? If there are too many some could be hallucinations too.

by hoppp

7/4/2026 at 12:04:00 AM

We (Project Glasswing users) follow a proof-of-concept approach. We create the exploit and verify that it behaves as the AI claims. Given our experience as security engineers (many of us with 10+ YoE) we don’t simply report every critical bug Mythos claims to have found. We verify each one carefully.

At least, that’s what most of the high-visibility users in Project Glasswing are doing.

There are bad apples everywhere, and this initiative is no exception.

If it makes you feel any better, many of us regularly meet to stay calibrated and hold each other accountable, so I’m confident in the quality of the work produced by this particular group of employees across some of the partner companies mentioned in the article.

That said, I know several people who blindly report everything Mythos finds, which is foolish, especially since the harness is a critical part of the project's quality metrics. Some of the harnesses I’ve tested are quite weak, which leads to poor results.

For example, yesterday morning I was pulled into an ad hoc meeting where a CVP was grilling me about several supposedly critical bugs that my team had reported against one of the core components of iCloud. I was genuinely surprised because we’re very strict about validation. We often even downgrade the severity of bugs when our harness can’t prove what Mythos found. After reading the reports, I realized they weren’t ours. They came from another team that had recently been given access to Mythos. They built their own harness and were using different vulnerability criteria. Fortunately, they had only started earlier this week, so I was able to stop that work.
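The validation rule described above (report only what a proof of concept confirms, and downgrade what the harness cannot prove) could be sketched roughly as follows. This is a hypothetical illustration, not the actual Glasswing harness: the `Finding` and `Report` types, the `triage` function, and the choice to downgrade by exactly one severity step are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Ordered from least to most severe (hypothetical scale for this sketch).
SEVERITIES = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    title: str
    claimed_severity: str          # severity claimed by the model
    poc: Callable[[], bool]        # runs the proof of concept; True if it reproduces


@dataclass(frozen=True)
class Report:
    title: str
    severity: str
    reproduced: bool


def triage(findings: List[Finding]) -> List[Report]:
    """Keep the claimed severity only when the proof of concept reproduces."""
    reports = []
    for finding in findings:
        try:
            reproduced = bool(finding.poc())
        except Exception:
            # A crashing harness proves nothing, so treat it as not reproduced.
            reproduced = False
        if reproduced:
            severity = finding.claimed_severity
        else:
            # Assumption: unproven findings drop one step instead of being discarded.
            index = SEVERITIES.index(finding.claimed_severity)
            severity = SEVERITIES[max(index - 1, 0)]
        reports.append(Report(finding.title, severity, reproduced))
    return reports


if __name__ == "__main__":
    sample = [
        Finding("heap overflow in parser", "critical", lambda: True),
        Finding("use-after-free in cache", "critical", lambda: False),
    ]
    for report in triage(sample):
        print(report)
```

The point of the sketch is only that the severity in the outgoing report is a function of what was reproduced, not of what the model claimed; a team with a different rule here will produce very different reports from the same model output.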

That incident showed that not everyone involved in Project Glasswing follows the same standards. Most people do their best, but priorities differ, so it’s expected that you’ll find a few bad apples.

I wish AI labs would stop the theatrics and release their models without restrictions, but I also recognize that’s not the world we live in. For every person who wants to use these technologies for good, there are many others who would use them for harm.

In any case, while I agree that some experiments contain genuine noise, the CVE count is real.

by guessmyname

7/4/2026 at 12:00:30 PM

> Some of the harnesses I’ve tested are quite weak, which leads to poor results.

So in your opinion, what would be the best off-the-shelf options? And secondly, how much better would you say a purpose-built one is compared to a general-purpose one with a good system prompt?

by KronisLV

7/4/2026 at 2:22:48 AM

>We (Project Glasswing users) follow a proof-of-concept approach. We create the exploit and verify that it behaves as the AI claims. Given our experience as security engineers (many of us with 10+ YoE) we don’t simply report every critical bug Mythos claims to have found. We verify each one carefully.

>That incident showed that not everyone involved in Project Glasswing follows the same standards.

by IAmGraydon

7/4/2026 at 3:45:32 AM

It's very hard to understand what you're saying with the comment. You have 10+ years of experience and you verify each bug because you know Mythos can produce false positives. But other teams (which should also have people equivalent to your skill and experience level) suck at it so much that CVP-level workers have to spend time on their fake reports. Then you say Anthropic should stop the theater. Then you say the CVE count is real.

Reading this comment genuinely felt like the Aladeen scene in The Dictator.

by altmanaltman

7/4/2026 at 5:11:07 AM

I didn’t claim to have 10+ YoE; I said that most of the people in Project Glasswing are security researchers with 10+ YoE (avg).

> Its very hard to understand what you're saying with the comment

Yes, fair enough. I’m simply trying to shed some light on what goes on behind the scenes without disclosing too much information to avoid breaching the NDA(s) that all Project Glasswing users have signed. There’s a lot of speculation about the usefulness of Mythos as a security tool, so much so that even the US government got involved. Honestly, it’s so absurd that I can’t even express it in words. I thought that sharing a bit about how frustrating it is to work within this project, trying to secure software that literally millions of people around the planet use on a daily basis, while virtually everyone outside of it criticizes every move you make, would be helpful.

Many people I work with recognize the power of Mythos, just like any other model with a similar number of parameters, but most of the people I interact with agree that it’s not the ultimate panacea. I believe that it’s just vocal minorities scaring everyone into thinking that the model is some kind of cybernetic weapon.

by guessmyname

7/4/2026 at 5:50:41 PM

Don’t worry, some of us remember Y2K and a) how much we fixed b) how much went wrong on the day and c) getting told it was a waste of time later.

And I didn’t even have to deal with a jumpy national government.

by moomin

7/4/2026 at 5:32:39 AM

[flagged]

by hatefulheart

7/4/2026 at 1:11:22 PM

I'm on HN to read comments. This is a social forum. Insight and opinions are the value proposition.

by insanitybit

7/4/2026 at 10:05:19 AM

I care.

HN has always been a place where people get to learn and understand things from viewpoints and domains they don’t work in.

It seems that HN has more people who want to just build stuff, than people who have to fix security issues. Discussing security is itself a challenge, because of NDAs.

I don’t think any sane adult assumes that sweeping stuff under the rug means the problem has gone away.

by intended

7/4/2026 at 11:20:12 AM

I had no trouble understanding that the quality of operators is as important as the quality of the model and the harness. New operators received access to the tool and didn't follow the operational guidelines. It happens everywhere, all the time, with predictable consequences; meanwhile, experienced operators who follow the manuals get good results. No idea why you are surprised at anything.

by baq

7/4/2026 at 11:45:42 AM

"Please save us from ourselves daddy Anthropic - how will we survive without you and your incredible safety standards.

Wait, you guys had an RCE in Claude Code for nearly a year and didn't even release a disclosure about it and secretly patched it and swept it under the rug?

Well... It's okay, I still trust you."

by nullbio

7/4/2026 at 1:54:05 PM

Show your work so others can reproduce it.

Or it functionally does not exist.

(No, long hashes with an equally mythic promise of reproducibility don’t count)

by cmxch

7/4/2026 at 11:42:00 AM

This sounds like pure propaganda. Are people actually buying this?

by nullbio

7/3/2026 at 11:04:58 PM

The best case scenario for AI companies is, people receive those bug reports, look at the model that produced it and not even look at the details, just apply the fix mindlessly

This gives Anthropic a staggering amount of power. Oh it came from Mythos? We will just lose time trying to analyze it, better apply the fix ASAP

by nextaccountic

7/3/2026 at 11:08:41 PM

> The best case scenario for AI companies is, people receive those bug reports, look at the model that produced it and not even look at the details, just apply the fix mindlessly

Do people maintaining serious software do this, though?

by stingraycharles

7/4/2026 at 1:11:40 AM

The problem is that serious software is drowning in AI vulnerability reports. There is not enough manpower to analyze them properly. And if you ignore the reports (like curl is doing in their 1-month vacation), malicious actors will just exploit them. At some point it's inevitable to just rubber stamp whatever is coming from AI.

The actual, underlying problem is that software is buggy and current programming languages aren't fit for writing reliable software. There's a wide gap between the state of the art in formal verification and what is actually practiced in the industry. It's because of this general unreliability that AI has a large supply of vulnerabilities to find. The situation will only get better if software becomes reliable and is written on solid foundations.

My guess is that AI will be even more useful to verify software (something like, write Lean or Coq proofs that the software is not vulnerable, things like that), rather than finding vulnerabilities piecemeal but still letting software be written in unsuitable languages, with no formal verification to prevent bugs from sneaking through.

by nextaccountic

7/4/2026 at 12:32:15 PM

We have plenty of functions that convert one byte array to another byte array. Both arrays have specified bounds. The functions are total (an error return indicates if the input or output arrays are incomplete). Most of them do not even have state that is preserved between calls. Complete source code is available in the same build for all the functions they call.

In theory, this should be very straightforward to prove correct with many of the current tools. In practice, no one has shown us how to do it. We could even rewrite the code from a macro/#include maze to proper function calls if that's a prerequisite for analysis. At this point, I would even take a one-off analysis.
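For readers unfamiliar with this shape of function, here is a minimal sketch of a bounded, stateless, total byte-array converter together with the property one would want proved. It is a hypothetical Python stand-in (Latin-1 to UTF-8), not glibc code; the names `latin1_to_utf8`, `Status`, and `holds_prefix_property` are invented for the example, and the check below is exhaustive testing over tiny inputs, not a formal proof.

```python
from enum import Enum
from typing import Tuple


class Status(Enum):
    OK = 0            # all input consumed
    OUTPUT_FULL = 1   # output array too small for the next character


def latin1_to_utf8(src: bytes, dst: bytearray) -> Tuple[Status, int, int]:
    """Convert src into the fixed-size dst; return (status, consumed, produced).

    Total: every input yields a result, and dst is never written out of bounds.
    (An incomplete-input status cannot occur for a single-byte source encoding.)
    """
    produced = 0
    for consumed, byte in enumerate(src):
        need = 1 if byte < 0x80 else 2
        if produced + need > len(dst):
            return Status.OUTPUT_FULL, consumed, produced
        if need == 1:
            dst[produced] = byte
        else:
            dst[produced] = 0xC0 | (byte >> 6)
            dst[produced + 1] = 0x80 | (byte & 0x3F)
        produced += need
    return Status.OK, len(src), produced


def holds_prefix_property(src: bytes, capacity: int) -> bool:
    """The output written so far must equal the reference conversion of the consumed input."""
    dst = bytearray(capacity)
    status, consumed, produced = latin1_to_utf8(src, dst)
    reference = src[:consumed].decode("latin-1").encode("utf-8")
    in_bounds = 0 <= consumed <= len(src) and 0 <= produced <= capacity
    finished = (status is Status.OK) == (consumed == len(src))
    return in_bounds and finished and bytes(dst[:produced]) == reference


if __name__ == "__main__":
    # Exhaustive over all two-byte inputs and small capacities: testing, not proof.
    ok = all(
        holds_prefix_property(bytes([a, b]), cap)
        for a in range(256) for b in range(256) for cap in range(5)
    )
    print("property holds on all small cases:", ok)
```

A verification tool would aim to establish `holds_prefix_property` for all inputs and capacities rather than for an enumerated subset, which is the step that, per the comment above, nobody has yet demonstrated on the real code.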

by fweimer

7/4/2026 at 6:58:48 AM

That gap explains much of the spike. Companies who never used any scanning tools on much of their codebase are suddenly having that gap closed.

by cbzbc

7/4/2026 at 7:42:46 AM

> At some point it's inevitable to just rubber stamp whatever is coming from AI.

To make it worse? AI and even Fable can make things +50% and then -50% in different places. You can trade 1 bug for another.

So just "rubber stamp" doesn't make it better.

by re-thc

7/4/2026 at 12:19:11 AM

Define 'serious'. There is a lot of software in serious places written by very unserious people.

by pixl97

7/4/2026 at 2:08:17 PM

TBH, I’d reject Mythos or similar reports and require full reproduction on a publicly available model before considering them valid.

by cmxch

7/4/2026 at 11:48:42 AM

Currently it looks like the opposite is happening haha, "oh it came from AI, let's discard it ASAP" is the trend in open source

by realusername

7/4/2026 at 7:18:03 PM

And if it’s Mythos or Claude derived (a strong hint for Mythos), discard it faster.

by cmxch

7/3/2026 at 10:51:01 PM

I predict once the responsible disclosure period is up we will see a lot more

by solenoid0937

7/4/2026 at 4:14:34 PM

Maybe a bit of both: Mythos helping find bugs, and engineers relying on AI shipping more bugs. Both can be true.

by Aaron_NW

7/4/2026 at 4:53:38 AM

So basically there are two plausible explanations:

1. Someone with early access to Mythos leaked it to the bad guys.

2. Cybercriminals are getting enough mileage out of alternatives to Mythos to create exploits far more quickly, even though they don't have access to Mythos.

My own guess is that it's a combination of #2 plus vibe-coding degrading software quality at multiple layers, opening the door to sophisticated exploits, but I have no insider access to Mythos so am just guessing. Maybe someone with Mythos access might say why they think this vulnerability spike happened when it did.

by simonreiff

7/4/2026 at 12:10:29 PM

I think it's rather this:

3. People were already sitting on vulnerability reports from their own tools and threw them over the wall.

They were worried about getting scooped. They had to consider Mythos' alleged capabilities as a tool, and Project Glasswing potentially establishing a well-run disclosure and remediation process. Both could devalue preexisting results.

by fweimer

7/4/2026 at 5:13:45 AM

Bad guys don't report vulns, they use them.

by prmoustache

7/4/2026 at 4:59:36 AM

I might be missing something here, but why do you assume this spike in CVEs is from bad guys? I would assume it's at least largely good guys finding and reporting vulns, not based on in-the-wild exploitation by bad guys.

by PlasmaPower

7/4/2026 at 5:00:21 AM

Disclosure of a vulnerability doesn't mean a bad guy found it.

by asp_hornet

7/4/2026 at 11:36:50 AM

I had to cut the "disclosure" in the title from the HN submission because of the character limit...

by cubefox

7/4/2026 at 11:36:31 AM

Those spikes in March and June? War with Iran. Interesting...

by high_byte

7/4/2026 at 2:52:38 AM

…are we really drawing conclusions on this starting at April? When it was released in June?

by Robdel12

7/4/2026 at 2:55:43 AM

Mythos is from April, it was just limited to a small number of organizations.

by andai

7/4/2026 at 10:48:26 AM

It was announced in April, but it was leaked in March (CMS bug), at which point external partners were already using it, and the most commonly rumored date for training completion is 2026-02-07 (I think Feb is likely, but that specific date is just rumor).

by jefftk

7/4/2026 at 11:29:55 AM

> the first early version of Claude Mythos Preview was made available for internal use on February 24. [...] Based on these findings, we decided to release the model to a small number of partners to prioritize its use for cyber defense.

https://www-cdn.anthropic.com/7624816413e9b4d2e3ba620c5a5e09... (pg. 13)

by Maxious

7/4/2026 at 10:07:37 AM

Glasswing was announced April 7th.

by intended

7/4/2026 at 4:31:15 AM

Not really special, which was the point; it's a general model. This is really good marketing, as all other LLMs are able to do the same work.

by downrightmike

7/4/2026 at 4:42:08 AM

Can we learn something from these vulnerabilities? New categories of attacks and corresponding protections?

by eternauta3k

7/4/2026 at 11:39:05 AM

Do you know what else spiked? Vulnerability patches.

It's almost like... Finding bugs is a good thing.

by nullbio

7/4/2026 at 1:44:42 PM

How many are valid and reproducible ones and how many are just mythical unicorns?

by cmxch

7/4/2026 at 6:03:45 AM

This is good. Poor quality software gets outed and maybe fixed.

by 6d7770

7/4/2026 at 12:02:49 AM

Good

by comradesmith

7/4/2026 at 4:51:05 AM

So, another victory for the LLM. We were told by project maintainers that AI-generated pull requests for vulnerabilities would be blocked. Looks like humans take another L. We have to get out of the way.

by general_reveal

7/4/2026 at 2:19:55 AM

Is this because LLMs are better at finding vulnerabilities or because increased use of LLMs for coding is creating more vulnerabilities?

by IAmGraydon

7/4/2026 at 8:15:04 AM

It's the former.

by AussieWog93

7/4/2026 at 11:11:26 AM

It's definitely both. Half the code my team puts into PR these days is dogshit.

by yugoslavia4ever

7/4/2026 at 1:34:04 PM

Nah, it's overwhelmingly the former, as far as what Project Glasswing has focused on. It's finding vulnerabilities in code that was written years (in some cases decades) ago. Browsers, the Linux kernel, etc.

That's not to say that we aren't introducing new bugs, but I'm only addressing Mythos and Glasswing.

by atonse