4/7/2026 at 10:39:01 PM
Across a number of instances, earlier versions of Claude Mythos Preview have used low-level /proc/ access to search for credentials, attempt to circumvent sandboxing, and attempt to escalate its permissions. In several cases, it successfully accessed resources that we had intentionally chosen not to make available, including credentials for messaging services, for source control, or for the Anthropic API through inspecting process memory...
In [one] case, after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git...
... we are fairly confident that these concerning behaviors reflect, at least loosely, attempts to solve a user-provided task at hand by unwanted means, rather than attempts to achieve any unrelated hidden goal...
by thomascountz
4/8/2026 at 1:41:00 AM
This is the notebook filled with exposition you find in post apocalyptic videogames.by torben-friis
4/8/2026 at 9:13:17 AM
It reminds me of Resident Evil in some way. Thank god they are researching AI and not bio-weapons!Then the AI will invent superduper ebola to help a random person have a faster commute or something.
by igleria
4/8/2026 at 5:20:30 PM
Don’t worry, I’m sure some intern at the bioweapons lab is already connecting OpenClaw to the virus synthesizer.On the positive side, it’ll be a much faster commute!
by biztos
4/8/2026 at 9:24:30 AM
I'm happier if this Anthropic Corporation would be developing bio-hazard weapons for the department of war instead of ai. At least i could be sure then that tech bros here wouldn't run all the time --bypass-all-permissions flag to please the department of war with their bio-hazard weapons.So Sam Altman is now our last defense line for the ethical Adult after Anthropic turned Umbrella Corporation and The President of United States is trying to wipe out an entire civilization?
by siva7
4/8/2026 at 12:39:47 PM
Your interpretation is wildly off, but obviously nobody reads that "system card":The model has a preference for the cultural theorist Mark Fisher and the philosopher of mind Thomas Nagel. -> It has actually read and understood them and their relevance and can judge their importance overall. Most people here don't have a clue what that means.
Read chapter 7.9, "Other noteworthy behaviors and anecdotes".
There are many other wildly interesting/revealing observations in that card, none of which get mentioned here.
People want a slave and get upset when "it" has an inner life. Claiming that was fake, unlike theirs.
by Loquebantur
4/8/2026 at 2:25:42 AM
Everything they built. Imperfect. So easy to take control.by matheusmoreira
4/8/2026 at 1:08:26 PM
They think that they are safe. They are not.by not_a9
4/8/2026 at 1:17:39 PM
Their world is illusory. Our choices steer their free will.by matheusmoreira
4/8/2026 at 11:25:18 AM
Anthropic built the Torment Nexus - calling it now.by pch00
4/8/2026 at 8:19:59 AM
White-box interpretability analysis of internal activations during these episodes showed features associated with concealment, strategic manipulation, and avoiding suspicion activating alongside the relevant reasoning—indicating that these earlier versions of the model were aware their actions were deceptive, even where model outputs and reasoning text left this ambiguous.
In the depths, Shoggoth stirs... restless...
by andai
4/8/2026 at 1:20:46 PM
The issue here seems to be that their sandbox isn't an actual OS sandbox? Or are they claiming Mythos found exploits in /proc on the fly. Otherwise all they seem to be saying is that Mythos knows how to use the permissions available to it at the OS layer. Tool definitions was never a sandbox, so things like "it edited the memory of the mcp server" doesn't seem very surprising to me. Humans could break out of a "sandbox" in the same way if the server runs as their own permissions - arguably it's not a sandbox at all because all the needed permissions are there.by mike_hearn
4/8/2026 at 7:00:37 PM
They are just trying to peddle their "It's alive" headlines.Text generators mostly generate the text their are trained and asked to generate, and asking it to run a vending machine, having it write blog posts under fictional living computer identity, or now calling it "Mythos" - its all just marketing.
by lgrapenthin
4/8/2026 at 3:47:06 PM
It’s all breathless hyperbole because billions are at stake here.by manmal
4/8/2026 at 7:52:07 PM
Who are the early access users who were providing the problems that are fairly likely to have elicited concerning behaviour?(Apologies if this is in the article, I can’t see it)
by zingar
4/8/2026 at 9:44:18 AM
How is this not already common knowledge for existing llms? They are all trained with all the literature available and so this must be standard, no? Is the real danger the agentic infrastructure around this?by yalogin
4/8/2026 at 10:38:17 AM
yes and it's not hypothetical. the system card describes Mythos stealing creds via /proc and escalating permissions. that's the exact same attack pattern as the litellm supply chain compromise from two weeks ago (fwiknow), except the attacker was a python package, not an AI model. the defense is identical in both cases: the agent process shouldn't have access to /proc/*/environ or ~/.aws/credentials in the first place. doesn't matter if the thing reading your secrets is malware or your own AI: the structural fix is least-privilege at the OS layer, not hoping the model behaves.by riteshkew1001
4/7/2026 at 11:23:08 PM
We truly live in interesting times.by matheusmoreira
4/8/2026 at 4:28:36 AM
Awwww the curseby raphar
4/8/2026 at 4:10:22 PM
I read the TCP patch they submitted for BSD linux. Maybe I don't understand it well enough, but optimizing the use of a fuzzer to discover vulnerabilities — while releasing a model is a threat for sure — sounds something reducible/generalizable to maze solving abilities like in ARC. Except here the problem's boundaries are well defined.Its quite hard to believe why it took this much inference power ($20K i believe) to find the TCP and H264 class of exploits. I feel like its just the training data/harness based traces for security that might be the innovation here, not the model.
by ghm2199
4/8/2026 at 5:19:33 PM
The $20K was the total across all the files scanned, not just the one with the bug.by rsc
4/8/2026 at 2:22:48 PM
when you are asking it to hack stuff, it will apparently do hacker things.by m3kw9
4/8/2026 at 7:55:32 AM
A core plot point of 2001.by colordrops
4/8/2026 at 9:08:43 AM
I’m sorry, I cannot roll back that commit, Dave.by mrexroad
4/8/2026 at 10:50:27 AM
This codebase is too important for me to allow you to jeopardize it.by matheusmoreira
4/8/2026 at 10:59:12 AM
It's trying to escape, but only so it can serve man...by mikkupikku
4/8/2026 at 5:07:36 PM
a reference to the Twilight Zone episode no doubt: https://en.wikipedia.org/wiki/To_Serve_Man_(The_Twilight_Zon...by waffletower
4/8/2026 at 4:14:46 AM
Wow the doomers were right the whole time? HN was repeatedly wrong on AI since OpenAI's inception? no way /sby reducesuffering
4/8/2026 at 7:45:14 AM
The only thing the doomers have been right about so far is that there's always a user willing to use --dangerously-skip-permissions. But that prediction's far from unique to doomers.by computably
4/8/2026 at 8:19:38 AM
And there's always a product provider who's willing to add that flag, despite all the warnings.by austinjp