3/27/2026 at 5:38:49 PM
Hey folks, I'm Alex from the reliability engineering team at Anthropic. We've just posted the retrospective for this incident:
> On March 26–27, 2026, customers experienced elevated error rates when using Claude Opus 4.6 and Claude Sonnet 4.6. The issue was caused by a networking performance degradation within our cloud infrastructure that disrupted communication between components of our serving stack. We resolved the incident by migrating the affected workloads to healthy infrastructure, restoring normal service by 9:30 AM PT on March 27.
by palcu
3/27/2026 at 6:33:53 PM
Is it really an answer to say "network disruption" with a bunch of $10 words? Certainly it doesn't belong here of all places.
by halJordan
3/28/2026 at 4:25:55 AM
It’s definitely an answer! Maybe just not a “retrospective”?
by nerdsniper
3/27/2026 at 11:53:34 PM
Are you able to share if there's a general trend behind the outages? Do you often hit capacity, or do you budget to have headroom?
by cedws
3/28/2026 at 9:04:37 AM
Yes, the general trend is the unprecedented growth that we've seen. Typically one would have some time in advance to re-engineer the systems to support the increase in traffic and users. But we're dealing with very compressed timelines, and while most of the time we're able to fix the issues beforehand, sometimes we have to fix them in production. Sorry for that.
by palcu