Claude loses its >99% uptime in Q1 2026

3/27/2026 at 5:38:49 PM

Hey folks, I'm Alex from the reliability engineering team at Anthropic. We've just posted the retrospective for this incident:

> On March 26–27, 2026, customers experienced elevated error rates when using Claude Opus 4.6 and Claude Sonnet 4.6. The issue was caused by a networking performance degradation within our cloud infrastructure that disrupted communication between components of our serving stack. We resolved the incident by migrating the affected workloads to healthy infrastructure, restoring normal service by 9:30 AM PT on March 27.

https://status.claude.com/incidents/b9802k1zb5l2

by palcu

3/27/2026 at 6:33:53 PM

Is it really an answer to say "network disruption" with a bunch of $10 words? Certainly it doesn't belong here of all places.

by halJordan

3/28/2026 at 4:25:55 AM

It’s definitely an answer! Maybe just not a “retrospective”?

by nerdsniper

3/27/2026 at 11:53:34 PM

Are you able to share if there's a general trend behind the outages? Do you often hit capacity, or do you budget to have headroom?

by cedws

3/28/2026 at 9:04:37 AM

Yes, the general trend is the unprecedented growth that we've seen. Typically one would have some time in advance to re-engineer the systems to support the increase in traffic and users. But we're dealing with very compressed timelines, and while most of the time we're able to fix the issues beforehand, sometimes we have to do them in production. Sorry for that.

by palcu

3/27/2026 at 3:24:03 PM

At this point you can stop worrying about downtime-free deployments so the devops becomes easier

by yread

3/27/2026 at 3:17:20 PM

> Our uptime has a '9' in it! -- Anthropic

by michaelcampbell

3/27/2026 at 3:51:54 PM

Github this month is very close to having 0 9s reliability. (unless they want to argue that 89% has a 9 in it)

by adgjlsfhk1

3/27/2026 at 4:10:33 PM

I'm not sure I've had a day without Github hiccups this month, so that feels right.

by littlestymaar

3/27/2026 at 3:56:14 PM

The comment you are replying to is carefully written in a way that allows 23.19%

by marcosdumay

3/27/2026 at 9:43:11 PM

There is always 88.9% or 88.89%

by claw-el

3/27/2026 at 3:54:13 PM

By now, I'm nearly certain that they'd be down to 0 9s of uptime if they counted it conservatively.

by ACCount37

3/27/2026 at 3:41:50 PM

Or as the British would say "9 innit ?"

by leosanchez

3/27/2026 at 3:19:26 PM

We had a ton of traffic coming in to check them: https://downforeveryoneorjustme.com/anthropic

Not one of the usual ones that has service problems :)

by bwb

3/27/2026 at 2:40:06 PM

https://status.claude.com/

by timpera

3/27/2026 at 3:13:44 PM

You can access Claude models with Google Cloud reliability via VertexAI. The caveat is that you cannot use your subscription, per-token pricing only.

I personally prefer per-token, it makes you more thoughtful about your setup and usage, instead of spray and pray.

You can also access the notable open weight models with VertexAI, only need to change the model id string.

by verdverm

3/27/2026 at 3:26:55 PM

I also use them per-token (and strongly prefer that due to a lack of lock-in).

However, from a game theory perspective, when there's a subscription, the model makers are incentivized to maximize problem solving in the minimum amount of tokens. With per-token pricing, the incentive is to maximize problem solving while increasing token usage.

by Scene_Cast2

3/27/2026 at 3:32:10 PM

I don't think this is quite right because it's the same model underneath. This problem can manifest more through the tooling on top, but still largely hard to separate without people catching you.

I do agree that Big AI has misaligned incentives with users, generally speaking. This is why I pay per-token with a custom agent stack.

I suspect the game theoretic aspects come into play more with the quantizing. I have not (anecdotally) experienced this in my API based, per-token usage. I.e. I'm getting what I pay for.

by verdverm

3/27/2026 at 6:10:51 PM

We tried this, but the quota for Opus models defaults to 0 on VertexAI and quota increase requests are auto-rejected.

Any tips?

by lima

3/29/2026 at 12:58:00 AM

What? There's no quota at all. You pay per token up to infinity.

by polski-g

3/29/2026 at 3:50:00 AM

There are in fact quotas and rate limits in VertexAI, albeit generous and automatically increased based on spend

by verdverm

3/27/2026 at 3:42:25 PM

You can use your subscription for Anthropic-hosted Claude models?

by perfmode

3/27/2026 at 3:43:45 PM

Don't know. I tried Anthropic directly a long time ago and was frustrated by their uptime issues. Seems it has not improved in the years since.

by verdverm

3/27/2026 at 6:09:04 PM

No, unless you count tricks which are explicitly against ToS

by lima

3/27/2026 at 3:18:01 PM

I saw a funny skit where if the free Claude instance was down for you, you could just ask Rufus, Amazon's shopping AI assistant, your math/coding question phrased as a question about a product, and it would just answer lol.

by joe_mamba

3/27/2026 at 3:35:27 PM

In my region a certain small bank had an AI assistant which someone neglected to limit, so you could put whatever there and not even phrase it as a question about a product.

by Tade0

3/27/2026 at 3:35:36 PM

You mean Google Chaos Services as we call them?

by chewbacha

3/27/2026 at 3:15:08 PM

Remember when putting your entire life & business into the cloud was good because they were all offering 5 9s of uptime?

Very few cases these days.. feels like we are lucky to get 2 9s anymore.

by steveBK123

3/27/2026 at 3:20:09 PM

Honestly, speaking as one of the people behind https://downforeveryoneorjustme.com, downtime has gotten way better. Compared to 10 years ago things are so much more redundant and harder to take down.

by bwb

3/27/2026 at 6:51:35 PM

Thanks for the data-based comment!

Have you noticed any change in that trend in the past year or two, or is it continuing to get better?

by Fishkins

3/28/2026 at 10:42:02 AM

Np, 2 years is harder for me to tell. We need to get more of that data public and organized, and are looking at how we can do that...

We are working on some big improvements to the backend and should have some cool stuff to share later this year :)

by bwb

3/27/2026 at 3:35:29 PM

Thank you finally.

Tired of all the people online with anxiety who project their own personal issues by spamming these kinds of doomer posts.

by ieie3366

3/27/2026 at 3:52:21 PM

So then why does no one offer 99.999% uptime guarantees in writing?

It should be low risk to offer such guarantees then.

by MichaelZuo

3/27/2026 at 3:56:28 PM

Well, (a) why would they? (b) "uptime" has shifted from a binary "site up/down" to "degraded performance", which itself indicates improvements to uptime since we're both pickier and more precise.

by staticassertion

3/27/2026 at 4:14:20 PM

Are we really questioning why cloud providers would offer better uptime guarantees?

by Alifatisk

3/27/2026 at 4:39:42 PM

Yes, I'm asking why they'd lock themselves into a contract around 5 9s of uptime since the parent poster mentioned that they won't do so. Of course, AWS actually does do this in some cases and they guarantee 99.99% for most things, so it feels a bit arbitrary - 5 minutes vs an hour, roughly.
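
For reference, the "5 minutes vs an hour" figure falls out of simple arithmetic. A minimal sketch, assuming a 365-day year and treating availability as a plain percentage of wall-clock time (the function name is made up for illustration, and real SLAs usually define the measurement window and exclusions differently):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year that still meets the availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare two through five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

Under those assumptions, 99.99% allows a bit under an hour per year and 99.999% a bit over five minutes.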

by staticassertion

3/28/2026 at 2:12:02 PM

So then it's clearly not as trivial to achieve as you made it sound.

by MichaelZuo

3/28/2026 at 4:40:09 PM

Are you replying to the right person?

by staticassertion

3/27/2026 at 5:25:09 PM

You can certainly sign a contract for five nines SLA with cloud providers.

You just won't like the price.

by groby_b

3/28/2026 at 1:07:41 AM

Then it’s clearly higher risk?

by MichaelZuo

3/27/2026 at 9:59:45 PM

If you are asking this question you don't understand what it takes to hit 5 nines in a real life measured system.

by Anon1096

3/27/2026 at 5:30:41 PM

https://en.wikipedia.org/wiki/High_availability#Percentage_c...

by KellyCriterion

3/27/2026 at 3:58:28 PM

'The outage of a single server is a tragedy, the outage of an entire AWS region is a statistic.'

- Stalin probably

by torginus

3/28/2026 at 1:21:09 PM

I do think it’s a choice by many CTOs to fail conventionally & collectively by delegating outage responsibility to AWS

by steveBK123

3/27/2026 at 3:41:29 PM

I wonder how much is due to supply constraints, how much is standard growing pains, and if over-reliance on AI was the cause for any outages.

by dehrmann

3/27/2026 at 6:07:23 PM

I know they tend to get much slower early evenings in the Western US... Not sure if this is everyone on the west coast going home and working on stuff, or the early people in the Asia region coming online.

by tracker1

3/27/2026 at 4:44:10 PM

Maybe they are gunning for 5 nines (9.9999%)

by yomismoaqui

3/27/2026 at 9:50:08 PM

It's pretty damn good, and it's seen a real exodus of conscientious users; the QuitGPT movement alone hit 1.5 million participants, with Claude skyrocketing to #1 on the App Store virtually overnight. No surprise the servers are getting hammered.

time to give your devops guy his job back.

by rambojohnson

3/27/2026 at 9:39:08 PM

The ironic thing about outages such as this one and Github's recent spate of outages is that if those vendors' sales pitches are to be believed, the vendors could just ask their LLMs to program reliable replacements overnight (okay, maybe a weekend).

by sgbeal

3/28/2026 at 5:00:40 AM

So tired of seeing this same comment in every thread.

by solumunus

3/28/2026 at 12:07:03 PM

> So tired of seeing this same comment in every thread.

So tired of seeing vendors not eat their own dog food and then try to sell it as tenderloin steaks.

by sgbeal

3/27/2026 at 3:02:04 PM

I honestly feel like it's a more honest status measure than many status pages I know.

by Trufa

3/27/2026 at 3:56:09 PM

Probably vibe-coded their infrastructure

by scuff3d

3/27/2026 at 3:07:44 PM

They seem to be a victim of their own success. Their response times are quite bad, and it's widely believed they are doing something to degrade service quality (quantizing?) in order to stretch resources. They just announced that they're cutting their usage limits down during peak hours as well.

They're in serious risk of losing their lead with this sort of performance.

by seneca

3/27/2026 at 4:03:27 PM

> it's widely believed they are doing something to degrade service quality (quantizing?) in order to stretch resources

God, I wish this inane bullshit would just fucking die already.

Models are not "degrading". They're not being "secretly quantized". And no one is swapping out your 1.2T frontier behemoth for a cheap 120B toy and hoping you wouldn't notice!

It's just that humans are completely full of shit, and can't be trusted to measure LLM performance objectively!

Every time you use an LLM, you learn its capability profile better. You start using it more aggressively at what it's "good" at, until you find the limits and expose the flaws. You start paying attention to the more subtle issues you overlooked at first. Your honeymoon period wears off and you see that "the model got dumber". It didn't. You got better at pushing it to its limits, exposing the ways in which it was always dumb.

Now, will the likes of Anthropic just "API error: overloaded" you on any day of the week that ends in Y? Will they reduce your usage quotas and hope that you don't notice because they never gave you a number anyway? Oh, definitely. But that "they're making the models WORSE" bullshit lives in people's heads way more than in any reality.

by ACCount37

3/27/2026 at 8:43:35 PM

It's possible though - it was a bug, a model pool instance wasn't updated properly and served a very old model for several months; whoever hit this instance would receive a response from a previous version of a model.

by BoneShard

3/27/2026 at 11:36:14 PM

While it's true that people are naturally predisposed to invent the "secret quantizing" conspiracy regardless of whether the actual conspiracy exists or not, I think there's more to the story.

I've seen Sonnet consistently start hallucinating on the exact same inputs for a couple hours, and then just go back to normal like nothing ever happened. It may just be a combination of hardware malfunction + session pinning. But at the end of the day the effects are indistinguishable from "secret quantizing".

by hbrn

3/27/2026 at 3:24:53 PM

>"They're in serious risk of losing their lead with this sort of performance."

Nobody goes there anymore, it's too crowded.

by ramesh31

3/27/2026 at 4:59:59 PM

You'll notice I specifically said "victims of their own success". Obviously these problems are induced by the fact that they have so many users. Blowing a lead due to inability to handle the demands of success is still a path to losing the lead.

by seneca

3/27/2026 at 3:13:21 PM

It can't be worse than gemini-cli using a Pro account.

by sva_

3/27/2026 at 3:23:25 PM

Oh really? Do they have availability problems too?

by seneca

3/27/2026 at 3:26:44 PM

Gemini CLI has been broken for the past 2-3 days, with no response from Google. Really embarrassing for a multi-trillion dollar company. At this point Codex is the only reliable CLI app, out of the big three.

https://www.reddit.com/r/GeminiCLI/comments/1s49pag/this_is_...

by nsingh2

3/27/2026 at 6:39:49 PM

Last time I tried it a single prompt ran for over an hour, mostly doing nothing/waiting on availability.

by sva_

3/27/2026 at 3:15:55 PM

I can't speak on Gemini but OpenAI is far worse for free accounts at least

by internetter

3/27/2026 at 3:31:11 PM

GeminiCLI is absolutely terrible, nothing comparable to the browser access. I've started using the 'AI Pro' tier lately and I get 15-minute response times from Gemini 3 'Flash' on a regular basis.

by danelski

3/27/2026 at 3:12:38 PM

> this sort of performance

They've been very proud of it.

by orphea

3/27/2026 at 3:46:13 PM

i just use gemini 3 flash via api with custom agent.

only people who do not even look at code anymore need anything more than that.

by faangguyindia

3/27/2026 at 3:15:55 PM

[dead]

by no_shadowban_3

3/27/2026 at 4:03:23 PM

I wouldn't be too harsh, scaling x10 YoY is a bit hard on the infra!

by aubanel

3/27/2026 at 4:17:54 PM

OpenAI managed it way better, but we might have Microsoft to thank for that.

by timpera

3/27/2026 at 5:41:51 PM

But isn't GitHub's perpetual demise Microsoft's fault?

by gherkinnn

3/27/2026 at 8:28:24 PM

We don't know any numbers.

by BoredPositron

3/27/2026 at 6:13:15 PM

isn't serving Claude embarrassingly parallel tho?

by whateveracct

3/27/2026 at 4:07:01 PM

If you don't pay attention 99% may sound high, but it means up to roughly 21 hours of downtime over the quarter.

Anthropic has had more than that.

Yikes.
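
A rough sketch of that arithmetic, assuming Q1 2026 runs from January 1 through March 31 (90 days) and that every hour counts equally. The function names are hypothetical, and the 30-hour input is a placeholder to show the conversion, not a measured figure:

```python
from datetime import datetime

Q1_START = datetime(2026, 1, 1)
Q1_END = datetime(2026, 4, 1)  # exclusive end of the quarter


def window_hours(start: datetime, end: datetime) -> float:
    """Length of the measurement window in hours."""
    return (end - start).total_seconds() / 3600


def downtime_budget_hours(start: datetime, end: datetime,
                          availability_percent: float) -> float:
    """Downtime allowed in the window while still meeting the target."""
    return window_hours(start, end) * (1 - availability_percent / 100)


def observed_availability(start: datetime, end: datetime,
                          downtime_hours: float) -> float:
    """Availability percentage implied by a given amount of downtime."""
    return 100 * (1 - downtime_hours / window_hours(start, end))


budget = downtime_budget_hours(Q1_START, Q1_END, 99.0)
print(f"99% budget for Q1 2026: {budget:.1f} hours")

# Placeholder downtime value, only to illustrate the reverse calculation.
example = observed_availability(Q1_START, Q1_END, 30.0)
print(f"30 hours of downtime -> {example:.2f}% uptime")
```

With a 2,160-hour quarter, a 99% target leaves about 21.6 hours of downtime, so anything beyond that drops the quarter below two nines.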

by littlestymaar

3/27/2026 at 3:49:04 PM

MAKE NO MISTAKES! DO NOT HALLUCINATE! FIX IT!

by claudiug

3/27/2026 at 3:56:43 PM

I find it's more reliable if you write "you are a highly experienced software engineer".

by maplethorpe

3/29/2026 at 1:00:15 AM

Pisses me off when I see those slop Skill repos posted every week.

by polski-g

3/27/2026 at 7:08:11 PM

I start every prompt with "we have been going in circles". It is the shibboleth for anthropic to A/B test you with their secret new model.

by nurettin

3/27/2026 at 3:57:48 PM

[dead]

by yubainu

3/27/2026 at 3:33:15 PM

Victim of success.

They are the best.

ChatGPT is walmart.

Gemini is kroger.

Claude is... idk your local grocer that is always amazing and costs more?

by 3yr-i-frew-up

3/27/2026 at 3:42:10 PM

The local grocer that isn't amazing and cost more and actually isn't really that local in the sense that none of the products sold are from local businesses/producers?

by quentindanjou

3/28/2026 at 8:13:26 PM

I'd say claude is whole foods

by Bolwin

3/27/2026 at 4:26:03 PM

No bud, Opus is the best model at this current moment.

GPT4.5 + COT would have been the best, but OpenAI got cheap.

by 3yr-i-frew-up

3/27/2026 at 3:06:19 PM

This is not an outage, Claude just gets lazier on Fridays.

Sometimes Claude wants more lunch breaks, takes a half day and leaves the desk early just like any human would. (since AI boosters like comparing LLMs to humans all the time) /s

by rvz

3/27/2026 at 3:15:51 PM

If you're concerned about humans anthropomorphizing AI models, you might want to steer well clear of Anthropic, as their entire positioning (starting with the product name and continuing with UX choices and model releases) is built to attract the kind of researchers who are prone to believe in sentient machines.

They are going in the "Claude is alive" direction already and that line of communication is likely going full throttle in the near future.

by sebastiennight

3/27/2026 at 9:40:07 PM

I suspect the next big marketing gimmick is this supposed leak about capybara. I suspect the leak is intentional and meant to influence their expected IPO.

I think the big reveal is going to be that frontier models are no better than the open source models that you could feasibly run on retail hardware; however, they have a highly complex harness behind the API where the magic is.

by GorbachevyChase

3/28/2026 at 7:31:17 PM

I think we're talking about two very different things. I don't think that Anthropic's anthropomorphizing is a marketing gimmick. It would be less concerning if it was.

by sebastiennight

3/27/2026 at 9:08:48 PM

I had my agent set up a "team" of subagents directed to different parts of a big new app (UX Engineer, test lead, etc.). Apparently the Senior SWE had reduced the scope, and my PM came to me trying to argue the side of the SWE that had reduced the scope for time constraint reasons...

It went a bit too deep into the role-playing bit.

by scottyah

3/27/2026 at 3:08:44 PM

You joke, but I think that's a fair summary of why people don't mind one 9 of uptime in a key component of their development workflow.

by SpicyLemonZest

3/27/2026 at 5:25:06 PM

[dead]

by boxingdog

3/27/2026 at 2:45:19 PM

[dead]

by mastabadtomm