4/12/2026 at 3:02:18 PM
Hey all, Boris from the Claude Code team here. We've been investigating these reports, and a few of the top issues we've found are:
1. Prompt cache misses when using the 1M token context window are expensive. Since Claude Code uses a 1-hour prompt cache window for the main agent, if you leave your computer for over an hour and then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (e.g. nudging you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude.
2. People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins. This was the case for a surprisingly large number of users, and we are actively working on (a) improving the UX to make these cases more visible to users and (b) more intelligently truncating, pruning, and scheduling non-main tasks to avoid surprise token usage.
In the process, we ruled out a large number of hypotheses: adaptive thinking, other kinds of harness regressions, model and inference regressions.
We are continuing to investigate and prioritize this. The most actionable thing for people running into this is to run /feedback, and optionally post the feedback IDs either here or in the GitHub issue. That makes it possible for us to debug specific reports.
by bcherny
4/13/2026 at 9:33:35 AM
So Anthropic is trying to save money on infrastructure, we all get it. However, it's not OK to degrade the performance your users have paid for. Last week the issue was that you reduced the default "effort" level; now the prompt cache is shortened. Several users have experienced far more restrictive usage limits lately. There is only so much you can do through "UX improvements" or some smart routing on the backend. Your flagship product is actively getting worse, and if users need to fiddle with hidden settings and keep track of GitHub issues every week, they will start voting with their money.
by pu_pe
4/12/2026 at 3:36:12 PM
Boris, you're seeing a ton of anecdotes here, and Claude has done something that has affected a bunch of its most fervent users. Jeff Bezos famously said that if the anecdotes are contradicting the metrics, then the metrics are measuring the wrong things. I suggest you take the anecdotes here seriously and figure out where/why the metrics are wrong.
by reenorap
4/12/2026 at 3:52:34 PM
On the subject of metrics, better user-facing metrics to understand and debug usage patterns would be a great addition. I'd love an easier way to understand the average cost incurred by a specific skill, for example. (If I'm missing something obvious, let me know.) Baking deeper analytics into CC would be helpful... similar to ccusage perhaps: https://github.com/ryoppippi/ccusage
by toddmorey
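In the meantime, the kind of per-session accounting ccusage does can be roughly approximated by summing the usage fields in Claude Code's local JSONL transcripts. A minimal sketch; the `message.usage` field names here are assumptions modeled on what tools like ccusage read, and may differ across Claude Code versions:

```python
import json

# Sum token usage across Claude Code transcript lines (JSONL).
# Field names ("message" -> "usage" -> "input_tokens" etc.) are
# assumptions; verify against your own transcript files.
def total_usage(jsonl_lines):
    totals = {"input_tokens": 0, "output_tokens": 0,
              "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0}
    for line in jsonl_lines:
        try:
            usage = json.loads(line).get("message", {}).get("usage", {})
        except json.JSONDecodeError:
            continue  # skip malformed lines
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

# Tiny inline sample standing in for a real transcript file:
sample = [
    json.dumps({"message": {"usage": {"input_tokens": 1200, "output_tokens": 300}}}),
    json.dumps({"message": {"usage": {"input_tokens": 80, "cache_read_input_tokens": 45000}}}),
]
print(total_usage(sample))
```

Pointing the same loop at the files under your Claude Code data directory (commonly `~/.claude/projects/`, though that path is also an assumption) would give per-project totals.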
4/13/2026 at 4:18:23 AM
This is useful if you want to keep an eye on what Claude's actually doing behind the scenes: https://github.com/simple10/agents-observe
by simple10
4/12/2026 at 3:38:09 PM
We are taking it seriously, and are continuing to investigate. We are not trusting the metrics.
by bcherny
4/12/2026 at 6:23:32 PM
The quantitative UX research team at Google was created for exactly this problem: a service which became popular before the right metrics existed, meaning metrics need to be derived first, then optimized. We would observe users (IRL), read their logs, then generate experiments to improve the behavior as measured by logs, and return to see if the experiment improved IRL experiences. There were not many of us and we are around :)
by stevenae
4/12/2026 at 9:27:57 PM
I worked with Boris in the past and in my experience, Boris cares deeply about the customer. I'd vouch that Boris really cares about the issue people are running into.
by ajma
4/13/2026 at 3:30:26 AM
“Hello. My name is Mr. Sirob.” https://amphetamem.es/meme/?id=the-simpsons_04_12_89×ta...
by thejazzman
4/13/2026 at 7:24:00 AM
Yet he vibe slops the code that the customer has to use.
by dkersten
4/13/2026 at 9:16:54 AM
Anthropic can't win in this case. They don't use Claude Code, they get accused that they don't even trust it themselves.
They use Claude Code, they get accused the code is shit because it's slop.
I think dogfooding is known to be a legitimate approach here.
by stingraycharles
4/13/2026 at 10:17:18 AM
The idea is that Claude Code is surprisingly buggy and unrefined for something created by the very tool and processes that are supposed to be replacing us as we speak.
by Toutouxc
4/13/2026 at 10:09:44 AM
> Anthropic can't win in this case.

Sure they can. The solution is pretty simple and in your own post. Choose one:
* Make the product good to the point code is no longer slop and shit.
* Stop hyping the quality when it isn’t there.
* Do a hybrid approach. Use their own product but actually have competent humans in the loop to make the code good.
This is not hard. Be honest and humble and that criticism goes away. It’s no one’s fault but Anthropic’s that they hype up their product to more than it can do and use it carelessly to build itself. It’s not a no-win scenario if you’re the one avoidably causing your own problems.
by latexr
4/13/2026 at 12:17:35 AM
Google products' UX is widely acknowledged to be a steaming pile of shit though, so I am not sure you should follow their example. Many of the metrics they use are obviously actively user-hostile.
by Traubenfuchs
4/12/2026 at 4:10:53 PM
Thank you
by reenorap
4/12/2026 at 7:06:12 PM
Hopefully yourself, and not via your AI tools.
by blks
4/12/2026 at 3:41:49 PM
Cool, are you going to be transparent and explain the metrics and costs as a postmortem? And given the inability to actually audit what you produce, why should we trust Anthropic?
by Ucalegon
4/12/2026 at 6:33:38 PM
HN sometimes talks about pathological customers who will never be happy. Boris is probably the single best rep in the community, possibly ever. The way your tone and complaints come across reminds me of this. As a paying customer ($5k spend per month in my corporate job), I’d rather Anthropic keep doing what they’re doing — innovating and shipping useful stuff at blinding speed — and not index on your feedback. I think what those tradeoffs would cost far outweighs the benefits.
by edmundsauto
4/13/2026 at 10:21:41 AM
> Boris is probably the single best rep in the community, possibly ever.

When you say “the community”, what exactly are you referring to?
by latexr
4/12/2026 at 3:51:54 PM
Dang man, chill.by nickandbro
4/12/2026 at 3:55:05 PM
Man, expecting the minimum from companies who are supposed to deliver a pro... there is no SLA for any of this, so you are right. Also, why is there no SLA?
by Ucalegon
4/13/2026 at 8:46:42 AM
You’re not getting a worthwhile SLA on a subscription at this rate. What are you going to get? A few dollars? An SLA isn’t useful unless it actually bites for the provider and actually compensates the customer. And it costs money; how much are you willing to spend for this insurance?
by IanCal
4/12/2026 at 4:49:04 PM
Because there isn't one and people still paid for it. My clients demand one, so there is one.
by 946789987649
4/12/2026 at 4:56:27 PM
Imagine if people were like your clients.
by Ucalegon
4/12/2026 at 8:49:29 PM
If they were, they wouldn't buy your product without an SLA. But they're not.
by otterley
4/12/2026 at 4:30:33 PM
Because this is ultimately a beta service. The whole industry is.
by alpha_squared
4/12/2026 at 4:39:44 PM
Wait, where is there a 'beta' tag to something that they are charging real money for? Why is this software any different than any other software, and why should we completely give away our rights as consumers to ensure what we pay for is delivered?
by Ucalegon
4/12/2026 at 4:59:48 PM
I think the parent is saying that one should be aware that the whole LLM industry is still in an experimental stage and far from mature. What you want isn’t what’s being offered. I agree that there should be higher standards, but what we currently have is an arms race. The consequence is to factor that into the value proposition and maybe not rely too much on it.
by layer8
4/12/2026 at 5:08:36 PM
SLAs should be standard for any paid service, especially on the enterprise side, but also on the consumer side. Being immature as a company does not excuse a lack of service delivery.
by Ucalegon
4/12/2026 at 7:23:53 PM
Not every customer, even a paying customer, demands reliability at a particular level. Market segmentation tends to address those situations: pay more, get more.
by otterley
4/12/2026 at 8:48:01 PM
'I don't want to hold companies to account for failing to deliver services, therefore I think everyone else should live by my permissive "standards".'
by Ucalegon
4/12/2026 at 8:50:24 PM
They can be held to account when they fail to deliver what they promise! But what is promised for delivery is what's in the Terms of Service (i.e. the agreement). Nothing more. If it's not in there, you can't hold them to account for it.
by otterley
4/13/2026 at 3:35:30 AM
Yes, that's the problem. It's too easy for companies to fail to provide their service as long as they never promise to provide their service.
by Dylan16807
4/13/2026 at 3:44:21 AM
> It's too easy for companies to fail to provide their service as long as they never promise to provide their service.

I don't even know what this means. You can't make anyone work for free, nor dictate the terms of what kind of work someone will do without their consent. I assume you are not pro-slavery.
by otterley
4/13/2026 at 3:45:50 AM
I'll give a very simple example. The service at McDonald's is providing food for money.
When their ice cream machine is broken, they fail to provide part of their service.
I'm not saying anything about "making" them do anything. I'm just calling out their failure and saying it's a bad thing.
by Dylan16807
4/13/2026 at 3:47:20 AM
You didn't merely call out their failure. You said it was "too easy," implying something more, like they owe you something. It's a pretty entitled point of view.
by otterley
4/13/2026 at 3:57:39 AM
I don't think it's "entitled" to want companies to put some effort into avoiding those failures. If the government did something, we could think of it as similar to passing inspection.
The other way to look at things is that the market isn't varied and competitive enough to punish the companies that fail this way.
They don't have to "owe me" anything for me to desire a different balance. My desire is fine.
by Dylan16807
4/13/2026 at 4:01:33 AM
"[W]ant[ing] companies to put some effort into avoiding ... failures" is not the same as "hold[ing] them to account". The former is "this sucks and I don't like it." The latter is "punish them or force them to do what I want!"--i.e., some sort of legal remedy.
by otterley
4/13/2026 at 9:29:38 AM
If you can point to a consumer-targeted service that provides and keeps their SLAs, I’ll be impressed.
by phs318u
4/12/2026 at 4:51:46 PM
What right as a consumer do you have that is pertinent here, other than to have the vendor adhere to the terms of the agreement you have with them? Anthropic has many customers despite the fact that they have occasional problems. They’re not suing Anthropic because Anthropic isn’t promising in its agreement something it can’t deliver.
I think you’re reading into the agreement something that isn’t there, and that’s the cause of your confusion.
by otterley
4/12/2026 at 5:04:30 PM
I am not reading into an agreement; I am saying there is no agreement to be found to ensure service delivery and the associated liability that would come with any SLA. Also, where is the Anthropic SLA for Enterprise? Does it exist?
Just because people pay for things doesn't mean they know or understand what they are paying for. Nor is there the legal precedent to actually understand where the rub lies or how that impacts business.
by Ucalegon
4/12/2026 at 7:20:45 PM
> Just because people pay for things doesn't mean they know or understand what they are paying for.

I believe, respectfully, that’s precisely what is happening in this thread, because you keep complaining about the absence of an SLA that was never in the agreement, as though it is—or is supposed to be—there, and therefore the existence of some “rights” that would flow from that.
by otterley
4/12/2026 at 8:48:28 PM
There are no SLAs, in any agreement; that's the problem.
by Ucalegon
4/12/2026 at 8:52:03 PM
We're back to square one: https://news.ycombinator.com/item?id=47741877
by otterley
4/12/2026 at 4:39:22 PM
It's incredible that Boris is here on HN being open and sharing an issue they don't fully understand yet, and offering a possible workaround. CTFO. Thank you Boris.
by mrcwinn
4/12/2026 at 5:29:12 PM
I am sorry you feel this way, but the reality of the situation is there is zero reason to trust anything Anthropic or Boris says. They have no legal liability or obligation to tell the truth, besides brand risk, which to people like you is mitigated by a single person showing up to post, and that's it.
by Ucalegon
4/13/2026 at 6:01:00 AM
You should work at these companies and understand they have well-intentioned employees; otherwise they'd rarely pass the cultural interviews plus background checks plus backchanneling. Have a bit more faith in the employees.
by mliker
4/12/2026 at 8:49:05 PM
What truth do you believe you are not being told, exactly?
by otterley
4/13/2026 at 3:22:15 AM
lol it is _way_ too easy for people to talk like this behind a computer screen.
by trueno
4/12/2026 at 3:49:12 PM
Dude is on Hacker News on a Sunday. Half the GDP of the world is competing with him. What metrics would you like to see?
by amirhirsch
4/12/2026 at 3:53:28 PM
An enforceable SLA with the services that Anthropic offers, rather than putting an employee to respond to things on Sunday.
by Ucalegon
4/12/2026 at 4:13:26 PM
>> rather than putting an employee to respond to things on Sunday.

Maybe, just maybe, they didn’t put him here; rather, he's just a normal guy who reads HN, who is passionate about his role, and is here on his own time.
by roamerz
4/12/2026 at 4:22:15 PM
Maybe... maybe... maybe... none of this builds trust, when there is something that does build trust: putting revenue on the line and opening yourself to legal liability. Otherwise everything is empty and meaningless; it's just PR, and nothing more.
by Ucalegon
4/13/2026 at 3:37:15 AM
You can get an SLA and ZDR by choosing one of the Claude partners (e.g. on Bedrock): https://platform.claude.com/docs/en/build-with-claude/api-an...
by nl
4/12/2026 at 4:54:28 PM
Then you should offer to pay them for one. I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price.
by otterley
4/12/2026 at 5:00:30 PM
They don't offer ZDR [0] for files, even if you have a BAA or are dealing with HIPAA data, no matter how much you pay them. Trust me, we have tried.
by Ucalegon
4/12/2026 at 7:19:14 PM
I’m really confused. We were talking about SLAs, not other product features. Are you moving the goalposts?
by otterley
4/12/2026 at 8:46:39 PM
There isn't an SLA, nor is there any protection around file uploads to their services. Two bad things can be true at the same time.
by Ucalegon
4/12/2026 at 8:57:56 PM
Did you talk to them about purchasing an SLA? If so, what did they say?
by otterley
4/13/2026 at 2:12:02 AM
I feel like you aren't really understanding what a Service-Level Agreement actually is in practice. It's not a piece of paper with a specific number of nines and an associated price tag. They can be and often are very complicated documents that take multiple rounds of redlining to arrive at something both parties agree to. If zero data retention was non-negotiable for the customer, it's totally possible that the negotiations ended there.
I'm not sure what you're trying to accomplish or unearth beyond what's already been said, which certainly suffices for me.
by xyzzy_plugh
4/13/2026 at 2:22:20 AM
As both an attorney and SRE, I understand what an SLA is. And you can absolutely get an SLA when you buy cloud services from many vendors, including AWS. Some vendors provide it at all price points; others include it at higher service tiers, without complex negotiations needed at all. And, yes, if it’s not on the menu, you may need to negotiate one. But you can’t conclusively say “they don’t offer one” unless you’ve actually gone to the company and asked.

https://aws.amazon.com/legal/service-level-agreements/
https://trailhead.salesforce.com/content/learn/modules/slack...
https://support.atlassian.com/subscriptions-and-billing/docs...
Before you casually accuse someone of not knowing what they’re talking about, first make sure you’re on firm ground yourself.
by otterley
4/13/2026 at 3:45:29 AM
It seems like you could save a lot of time and confusion by talking about the SLA that you pay for from Anthropic instead of establishing your bona fides by posting links to various unrelated companies’ SLA pages. Like, how was your experience negotiating your SLA with Anthropic? What ballpark are you paying for the SLA with Anthropic that you have in place? How many 9s does your Anthropic SLA cover? Obviously you haven’t posted a half dozen times in this thread about how Anthropic by nature of existing offers SLAs without any knowledge of that, so some simple stuff about your SLA with Anthropic would be helpful.
by jrflowers
4/13/2026 at 3:48:58 AM
I make no unqualified claims as to whether Anthropic offers an SLA. I never did. But I do know that it's unreasonable to claim they don't when you didn't even take the steps to conclusively determine it for yourself. As I said: "I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price."
by otterley
4/13/2026 at 4:22:58 AM
Oh, well in that case, if posting URLs counts as proof of… something, there doesn’t appear to be any SLA page anywhere in their sitemap: https://www.anthropic.com/sitemap.xml

Maybe it is just common for enterprise SaaS businesses to offer SLAs without having a page about it though. Something like that could possibly be unjustifiably burdensome as well, because it’s not like they could just type “make a page about how we offer SLAs” and have it magically appear.
by jrflowers
4/13/2026 at 4:30:12 AM
Not everything a business might be willing to do is listed on their public website.
by otterley
4/13/2026 at 4:39:19 AM
That’s a good point. Having an SLA page is an indicator that a business offers SLAs; not having an SLA page is also an indicator that they offer SLAs, just secretly. If you think about it, all of the people constantly complaining about uptime and saying stuff like “I would pay money for an SLA from Anthropic if I could” probably means that they are killing it with all those secret SLAs. I mean obviously they have to offer them, because they exist, as otherwise you’d have to believe something crazy like “they don’t currently offer them” for reasons “that they haven’t disclosed”.
by jrflowers
4/13/2026 at 4:51:05 AM
Again, many companies will do things they don’t ordinarily offer for the right price. I’ve seen it happen myself (on both the buyer and seller side) on many occasions. It goes to the extent of the company itself! Very few businesses publicize that they’re for sale or put their company’s purchase price on their website. But acquisitions happen all the time.
Anyway, I don’t appreciate your sarcasm coupled with what seems to be willful ignorance about how the world works, so I won’t be participating in this discussion with you anymore.
by otterley
4/13/2026 at 5:26:01 AM
I don’t get it. If you wanted to convince everybody about a vast universe of secret business and your expertise in it, why would you start with telling people that weren’t able to get an SLA from Anthropic that Anthropic offers SLAs? And then admit that you don’t actually know and then double down?

Like if I wanted to convince people that In’N’Out has a secret menu (they do), I wouldn’t start by saying “They have the ingredients to make onion rings, therefore they sell onion rings” (they do not). They offer burgers with lettuce instead of a bun (“protein style”) though. That’s a fact that you can verify by going there or calling them and asking about it. I didn’t rely on my assumptions based on other fast food restaurants, I relied on my knowledge of the topic!
Edit: It seems like bad faith to admit that you’re using “probably” interchangeably with “I don’t know” and then editing in “for a billion dollars” several posts into a conversation.
I guess enjoy posting about entirely unrelated conversations in other threads though. (otterley’s post about my having previously had a short amicable exchange with dang in a different thread was deleted, but I’ll leave this part up. I think digging through people’s post histories to find unrelated grievances is icky, for lack of a better word, and wildly unhelpful for any type of discussion)
Even with the “for a billion dollars” addition, admitting “I don’t know” and “probably” are interchangeable doesn’t really change anything from a logical standpoint. Nobody argued against you not knowing, so I don’t understand the purpose of the repetition.
by jrflowers
4/13/2026 at 6:01:30 AM
> why would you start with telling people that weren’t able to get an SLA

That hasn’t been established. There’s no evidence that they went to Anthropic and tried to negotiate one.
> that Anthropic offers SLAs
I didn’t. I said “they probably will for the right price.” There are two modifiers in that statement. And the price is unspecified. Their first offer could be a billion dollars. Too expensive? Negotiate down.
by otterley
4/13/2026 at 6:14:42 AM
I would invite you to notice your interlocutor's assumptions, especially as revealed in his prior comment. Look at how he misunderstands the situation:

> If you wanted to convince everybody about a vast universe of secret business and your expertise in it...
> Like if I wanted to convince people that In’N’Out has a secret menu...
You are discussing business. He is understanding you to be attempting to "mog" him, because he cannot adopt a perspective wherein the conversation represents anything other than a vacuous social challenge or "brodown."
In short, you're wasting your time.
by throwanem
4/13/2026 at 7:09:58 AM
I am so old :(

I looked up “mogging” and I’d think “my assumptions about stuff are valid because I’m a lawyer and don’t know what you do” would count more as mogging than “that doesn’t quite sound right, this is a conversation about something specific and not your general cleverness”, but I’ve got a Benny Hill archive to get through
by jrflowers
4/13/2026 at 7:25:29 AM
Those are not assumptions on your interlocutor's part. You've embarrassed yourself quite badly, I'm afraid. I know you don't understand how, but that doesn't change the fact of it.
by throwanem
4/13/2026 at 7:56:19 AM
> You've embarrassed yourself quite badly, I'm afraid.

:( you are right. This isn’t the first time I’ve lost an argument because hours into a discussion somebody introduced “what if a billion dollars” or “magic amulet” or “ブルマの母” (Bulma's mother) etc
by jrflowers
4/13/2026 at 7:59:57 AM
It's just a world you've never seen. Don't take it too personally.
by throwanem
4/13/2026 at 8:13:18 AM
I appreciate your kindness. While I’ve got you, did you know that the Benny Hill show started in 1955 and a good chunk of what aired from then to 1969 was lost? There are a lot of fans that don’t even realize that what is sometimes labeled as season 1 is season 15! Crazy stuff!
by jrflowers
4/13/2026 at 10:38:20 AM
I had not known that! In a similar vein, there exists an Alice in Wonderland-themed Muppet Show episode, starring Brooke Shields, which has had to be left out of home video releases due to so far unresolvable music licensing issues. Not quite totally lost, but somewhat hard to find!
by throwanem
4/12/2026 at 4:17:46 PM
Boring corporate AI will surely come, but hey, let's enjoy the wild west while it lasts. I am grateful to see Boris come here to address problems people face. I'm 100% sure nobody is making him; he has one of the coolest jobs in the world.
by aenis
4/12/2026 at 4:23:54 PM
> he has one of the coolest jobs in the world.

So that means we just eject any critical thinking when it comes to companies, especially where there is no liability or obligation for them (Boris or Anthropic) to be honest.
Other than 'trust'.
by Ucalegon
4/13/2026 at 9:37:21 AM
Don’t like Anthropic? Use a competing service. At this point the sheer volume of your commentary is not particularly complimentary to your own critical-thinking skills. It’s not your job to correct the internet or to convince randoms of the rightness of your position. Of all the things in the world to be pissed at so insistently, this seems to be a pretty minor one.
by phs318u
4/13/2026 at 2:44:13 AM
But the default 1M context window just rolled out a few weeks ago. If refreshing old sessions on 1M context windows is the problem, it's completely aligned with what Boris is saying.
by bpodgursky
4/12/2026 at 3:10:40 PM
Why did this become an issue seemingly overnight when 1M context has been available for a while, and I assume prompt caching behavior hasn't changed?

EDIT: prompt caching behavior -did- change! 1hr -> 5min on March 6th. I'm not sure how starting a fresh session fixes it, as it's just rebuilding everything. Why even make this available?
It feels like the rules changed and the attitude from Anth is "aw I'm sorry you didn't know that you're supposed to do that." The whole point of CC is to let it run unattended; why would you build around the behavior of watching it like a hawk to prevent the cache from expiring?
by mvkel
4/12/2026 at 3:20:08 PM
> 1hr -> 5min on March 6th

This is not accurate. The main agent typically uses a 1h cache (API customers can enable 1h, but it is not on by default because it costs more). Sub-agents typically use a 5m cache.
by bcherny
4/12/2026 at 3:28:10 PM
https://github.com/anthropics/claude-code/issues/46829#issue... - Have you checked with your colleague? (and his AI, of course)
by throwdbaaway
4/12/2026 at 4:08:08 PM
Doesn't what's said at the link approximately agree? The 5m bug was said to be isolated to use of overage (API billing).
by fluidcruft
4/12/2026 at 8:47:25 PM
Then my original question stands: why did this become an issue seemingly overnight if nothing changed?
by mvkel
4/12/2026 at 3:36:31 PM
So if I run a test suite or compile my Rust program in a sub-agent, I'm going to get cache misses? Boo.
by aaronblohowiak
4/12/2026 at 5:04:49 PM
Sub-agents don't have much context and don't stay around for long, so misses in that case are trivial.
by skeledrew
4/12/2026 at 10:55:38 PM
As of yesterday, subagents were often getting the entire session copied to them. Happened to me when 2 turns with Claude spawned a subagent, caused 2 compactions, and burned 15% of my 5-hour limit (Max 5x).
by HumanOstrich
4/13/2026 at 12:16:33 AM
How long they stay around after the cache miss is irrelevant if I am burning all the prior tokens again. Also, how much context they have depends entirely on the task and your workflow. If you have a subagent implement a feature and use the compile + test loop to ensure it is implemented correctly before a supervisor agent reviews what was implemented vs. asked, then yes, subagents do have a lot of context.
by aaronblohowiak
4/12/2026 at 7:34:02 PM
... so how do API users enable 1hr caching? I haven't found a setting anywhere.
by highd
4/12/2026 at 10:58:28 PM
Would like to know this too ;D

There is env.ENABLE_PROMPT_CACHING_1H_BEDROCK - but that is, as the name says, "when using Bedrock".
for the raw API the docs are also clear -> "ttl": "1h" https://platform.claude.com/docs/en/build-with-claude/prompt...
but how to make claude-code send that when paying by API-key? or when using a custom ANTHROPIC_BASE_URL? (requests will contain cache_control, but no ttl!)
by g4cg54g54
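Per the raw API docs linked above, the extended TTL is requested per cache breakpoint via `cache_control`. A minimal sketch of what such a request body might look like (payload construction only, no network call; the exact field shapes should be verified against Anthropic's current prompt-caching docs, and the model id here is illustrative):

```python
import json

# Sketch: a Messages API request body asking for a 1-hour prompt cache
# on the system prompt. Field names follow the prompt-caching docs
# referenced above; treat them as assumptions, not a guarantee.
def build_request(system_text: str, user_text: str) -> dict:
    return {
        "model": "claude-opus-4-6",  # illustrative model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # "ttl": "1h" selects the extended TTL; omit it for
                # the default 5-minute cache.
                "cache_control": {"type": "ephemeral", "ttl": "1h"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

req = build_request("You are a coding assistant.", "Hello")
print(json.dumps(req["system"][0]["cache_control"]))
```

Whether Claude Code itself can be made to send that `ttl` field when pointed at a custom ANTHROPIC_BASE_URL is exactly the open question in the comment above.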
4/12/2026 at 3:12:26 PM
For me, definitely the worst regression was the system prompt telling Claude to analyze a file to check if it's malware at every read. That correlates with me also seeing early exhausted quotas and acknowledgments of "not a malware" at almost every step. It is a horrible error of judgement to insert a complex request into such a basic ability. It is also an error of judgement to let Claude decide whether it wants to improve the code at all.
It is so bad, that i stopped working on my current project and went to try other models. So far qwen is quite promising.
by rawicki
4/12/2026 at 3:15:12 PM
I don't think that's accurate. The malware prompt has been around since Sonnet 3.7. We carefully evaled it for each new model release and found no regression in intelligence, alongside improved scores for cyber risk. That said, we have removed the prompt for Opus 4.6, since the model no longer needs it.
by bcherny
4/12/2026 at 3:17:03 PM
I started seeing "not a malware, continuing" in almost every reply since around 2 weeks ago. Maybe you just reintroduced it with some regression? Opus 4.6
by rawicki
4/12/2026 at 3:21:37 PM
That's weird. Would you mind running /feedback and sharing the id here next time you see this? I'd love to debug.
by bcherny
4/13/2026 at 9:43:27 AM
Same. Will run it too when I next get it.
by ElFitz
4/12/2026 at 3:55:00 PM
Sure, I really appreciate you looking at this.

a6edd0d1-a9ed-4545-b237-cff00f5be090 / https://github.com/anthropics/claude-code/issues/47027
I'm happy to provide any other info that can be useful (as long as i'm not sharing any information about the code or tools we use into a public github issue).
by rawicki
4/12/2026 at 4:54:52 PM
Thanks for the report! This was fixed in v2.1.92. Please:
1. Upgrade to the latest: claude update (seems like you did this already)
2. Start a new conversation (resuming an old convo may trigger this bug again in that convo)
by bcherny
4/12/2026 at 7:39:00 PM
This is bloody great Boris. Thank you.
by egamirorrim
4/12/2026 at 4:37:58 PM
Thank you! Looking
by bcherny
4/12/2026 at 3:43:32 PM
I’ve seen this a couple of times recently, including right after compact. I’ll /feedback it next time I see it.
by obrajesse
4/12/2026 at 3:29:28 PM
I've been using CC a decent amount the past few weeks and have never seen this malware stanza...?
by bavell
4/12/2026 at 3:33:10 PM
1. I've never seen this. Is there a config option to unhide it if it's happening? Is this in Claude Code? Does it have to be set to verbose or something?
2. Can we pay more/do more rigorous KYC to disable it if it's active?
by echelon
4/12/2026 at 3:39:11 PM
This warning is not enabled for modern models. No action needed. I'm digging into the report above as soon as they're able to /feedback.
by bcherny
4/12/2026 at 4:03:22 PM
The /clear nudge isn't a solution though. Compacting or clearing just means rebuilding context until Claude is actually productive again. The cost comes either way. I get that 1M context windows cost more than the flat per-token price reflects, because attention scales with context length, but the answer to that is honest pricing or not offering it. Not annoying UX nudges. What’s actually indefensible is that Claude is already pushing users to shrink context via, I presume, system prompt. At maybe 25% fill: “This seems like a good opportunity to wrap it up and continue in a fresh context window.”
“Want to continue in a fresh context window? We got a lot of work done and this next step seems to deserve a fresh start!”
If there’s a cost problem, fix the pricing or the architecture. But please stop the model and UI from badgering users into smaller context windows at every opportunity. That is not a solution, it’s service degradation dressed as a tooltip.
by j-pb
4/13/2026 at 4:03:41 AM
The cost issues they're seeing (at least from what they've stated) are from users, not internally. Basically, it takes either $5 or $6.25 (depending on 5m or 1h ttl) to re-ingest a 1M context length conversation into cache for opus 4.6; that's obviously a very high cost, and users are unhappy with it.
I think 400k as a default seems about right from my experience, but just having the ability to control it would be nice. For the record, even just making a tool call at 1M tokens costs 50 cents (which could be amortized if multiple calls are made in a round), so imo costs are just too high at long context lengths for them to be the default.
by foota
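The arithmetic in the comment above can be sketched directly. The per-MTok rates here are the ones asserted in that comment (and the cache-read figure implied by its "50 cents" tool-call claim), used as assumptions, not official pricing:

```python
# Back-of-envelope cost of working at 1M tokens of context.
# Rates are assumptions taken from the comment above, not official pricing:
# $6.25/MTok for a 1h-TTL cache write, $0.50/MTok for a cache read.
def cost_usd(tokens: int, rate_per_mtok: float) -> float:
    return tokens / 1_000_000 * rate_per_mtok

full_miss_1h = cost_usd(1_000_000, 6.25)   # re-ingesting 1M tokens as a 1h cache write
tool_call_hit = cost_usd(1_000_000, 0.50)  # one tool call re-reading 1M cached tokens

print(f"full 1M-token cache miss (1h TTL): ${full_miss_1h:.2f}")
print(f"one tool call at 1M tokens (cache hit): ${tool_call_hit:.2f}")
```

Under those assumed rates, a single stale-session cache miss costs more than a full day of warm-cache tool calls, which is the asymmetry the thread is complaining about.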
4/13/2026 at 2:23:47 AM
Currently "clear makes it worse": https://github.com/anthropics/claude-code/issues/47098 + https://github.com/anthropics/claude-code/issues/47107
Launching with `CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1 claude "Hello"` till those are fixed seems to be the way.
by g4cg54g54
4/12/2026 at 3:12:56 PM
I don't want a nudge. I want a clear RED WARNING with "You've gone away from your computer a bit too long and chatted too much at the coffee machine. You're better off starting a new context!"
by throwaway2027
4/13/2026 at 5:17:53 AM
I don’t want a scary red message chastising me for not being responsive enough!
I often leave CC hanging (or even suspended) and use /resume a lot. I’m okay with that having some negative effect on my token limits.
Product design is hard. They can’t please us all. I don’t envy the team weighing these trade-offs.
by senko
4/13/2026 at 10:22:08 AM
Is it that hard though? This kinda smacks of no research on users prior to rolling stuff out.
by sharts
4/12/2026 at 3:20:44 PM
Ack, it is currently blue but we can make it red.
by bcherny
4/12/2026 at 4:40:21 PM
Why is nobody even asking why that should be an issue? No other text editor shits the bed that way. The whole point of the computer is that it patiently waits for my input.
by SpaceNoodled
4/12/2026 at 4:47:40 PM
Let me put it this way: not your RAM, not your cache, not waiting patiently for your input.
by GeoAtreides
4/13/2026 at 1:57:09 AM
Good thing they didn't silently, quietly change cache from 1 hour to 5 minutes, right?
by subscribed
4/12/2026 at 10:48:36 PM
Good thing they're not charging for it, then.
by SpaceNoodled
4/13/2026 at 6:32:27 AM
Forget the warning, just compact like someone suggested in the ticket. Who would opt for a massive cache miss?
by smrtinsert
4/12/2026 at 3:26:14 PM
Hey Boris - why is the best way to get support making a Hacker News or X post, and hoping you reply? Why does Anthropic Enterprise Support never respond to inquiries?
by avree
4/12/2026 at 7:45:50 PM
I mean if we're building an unrelated wishlist... Can 20x max users get auto mode already? Or can the enterprise plans get something equivalent to 20x max?
Given I'm running two max accounts to get the usage I want, can we get a 25x and 40x tier? :-)
by egamirorrim
4/12/2026 at 3:04:07 PM
OpenAI (Codex) keeps on resetting the usage limits each time they fuck up...
I have yet to see Anthropic doing the same. Sorry but this whole thing seems to be quite on purpose.
by denysvitali
4/12/2026 at 3:08:25 PM
Can you clearly state what they messed up?
by weird-eye-issue
4/13/2026 at 5:33:58 AM
Suddenly burning up the quota ~4x faster than usual is not a mess-up in your opinion?
by tigershark
4/13/2026 at 5:41:47 AM
It is not inherently their fault though, because usage is controlled both by the user and the harness behavior. So I was asking specifically what about the harness was messed up; can you provide that info?
by weird-eye-issue
4/13/2026 at 9:09:55 AM
It's all there, including the specific version regression, unearthed bugs, workarounds: https://github.com/anthropics/claude-code/issues/45756
by subscribed
4/13/2026 at 6:17:59 AM
You cannot reset usage across millions of users based off these AI slop reports.
by yokoprime
4/13/2026 at 9:09:15 AM
LOL, funny how you're so happy to dismiss dozens of reports with hard data, confirmed by a Claude Code team member.
Issue with the confirmation: https://github.com/anthropics/claude-code/issues/45756
Looks like you have an axe to grind and facts be damned? :D
by subscribed
4/12/2026 at 3:57:10 PM
Not parent, but I can guess from watching mostly from the sidelines.
They introduced a 1M context model semi-transparently without realizing the effects it would have, then refused to "make it right" to the customer, which is a trait most people expect from a business when they spend money on it, especially in the US, and especially when the money spent is often in the thousands of dollars.
Unless Anthropic has some secret sauce, I refuse to believe that their models perform anywhere near as well on >300k context sizes as they do on 100k. People don't realize it, but even a small drop in success rate becomes very noticeable if you're used to having near 100%, i.e. 99% -> 95% is more noticeable than 55% -> 50%.
I got my first Claude sub last month (it expires in 4 days) and I've used it on some biggish projects with opencode. It went from compacting after 5-10 questions to just expanding the context window. I personally notice it deteriorating somewhere between 200-300k tokens, and I either fork a previous context or start a new one after that, because at that size even compacting seems to generate subpar summaries. It currently no longer works with opencode, so I can't attest to how well it worked the past week or so.
If the 1M model introduction is at fault for this mass user perception that the models are getting worse, then it's Anthropic's fault for introducing confusion into the ecosystem. Even if there were zero problems introduced and the 1M model was perfect, if your response when users complain is to blame it on the user, then don't expect the user to be happy. Nobody wants to hear "you're holding it wrong", but it seems that Anthropic is trying to be the Apple of LLMs in all the wrong ways as well.
by nodja
4/12/2026 at 4:54:45 PM
I still love Claude and have nothing but a ton of respect for Boris and the team building such a phenomenal product.
That said, things have felt a bit off usage-wise since the introduction of 1M context.
I'd personally be happy to disable it and go back to auto-compacting because that seems to have been the happy medium.
by atonse
4/12/2026 at 4:37:14 PM
Especially since Codex faced the same issue but the team decided to explicitly default to only ~200k context to avoid surprises and degradation for users.
by logicchains
4/12/2026 at 3:07:01 PM
[flagged]
by losteric
4/12/2026 at 3:21:45 PM
Different users do seem to be encountering problems or not based on their behavior, but for a rapidly-evolving tool with new and unclear footguns, I wouldn't characterize that as user error.
For example, I don't pull in tons of third-party skills, preferring to have a small list of ones I write and update myself, but it's not at all obvious to me that pulling in a big list of third-party skills (like I know a lot of people do with superpowers, gstack, etc...) would cause quota or cache miss issues, and if that's causing problems, I'd call that more of a UX footgun than user error. Same with the 1M context window being a heavily-touted feature that's apparently not something you want to actually take advantage of...
by mlinsey
4/12/2026 at 3:13:57 PM
My colleagues and I have faced the same issues over the last ~1 month or so.
With a new version of Claude Code pretty much every day, constant changes to their usage rules (2x outside of peak hours, temporarily 2x for a few weeks, ...), hidden usage decisions (past 256k it looks like your usage consumes your limits faster) and model degradation (Opus 4.6 is now worse than Opus 4.5, as many reported), I fail to see how it can be a user error.
The only user error I see here is still trusting Anthropic to be on the good side tbh.
If you need to hear it from someone else: https://www.youtube.com/watch?v=stZr6U_7S90
by denysvitali
4/12/2026 at 3:18:11 PM
> past 256k it looks like your usage consumes your limits faster
This is false. My guess is what is happening is #1 above, where restarting a stale session causes a 256k cache miss.
That said, I hear the frustration. We are actively working on improving rate limit predictability and visibility into token usage.
by bcherny
4/12/2026 at 4:54:56 PM
Just like everybody else, my colleagues at work and I have seen major regressions in available usage over the past month, seemingly unrelated to caching/resuming. On an enterprise sub doing the same work, I personally went from being able to have several sessions running concurrently without hitting limits, to only having one session at a time and hitting my 5h limit twice a day in 3-4 hours tops (and due to the apparent lower intelligence I have been at the terminal watching what Opus is doing like a hawk, so it's not an "I went for coffee and took a cache miss" situation). The first day I ever hit my 5h limit this year was the day everybody reported it (I think it was the Monday you introduced the 2x promotion after hours? Not sure, like 3 weeks ago?)
To avoid 1M issues, this week I have also intentionally used the 256k context model, disabled adaptive thinking and done the same "plans in multiple short steps with /clear in-between" to minimize context usage, and yet nothing helps. It just feels like ~2x to ~3x fewer tokens than before, and a lot less smart than in February.
Nowadays every time I complete a plan I spend several sessions afterwards saying things like "we have done plan X, the changes are uncommitted, can you take a look at what we did" and every time it finds things that were missed or outright (bad) shortcuts/deviations from plan despite my settings.json having a clear "if in doubt ask the user, don't just take the easy way out". As a random data point, just today opus halfway through a session told me to make a change to code inside a pod then rollout restart it to use said change, and when called out on it it of course said that I was right and of course that wouldn't work...
It is understandable that given your incredible growth you are between a rock and a hard place and have to tweak limits, compute does not grow on trees, but the consistent "you are holding it wrong" messaging is not helpful. I am wondering if realistically your only option is to move everybody to metered, with clear token usage displayed, and maybe have pro/max 5/max 20 just be a "your first $x of tokens is 50/75% off". Allow folks to tweak the thinking budget, and change the system prompt to remove things like "try the easy solution first" which anecdotally has been introduced in the past while, and allow users to verify on prompt if the prompt would cause the whole context to be sent or if cache is available.
by tetraodonpuffer
4/12/2026 at 3:11:19 PM
Why did it suddenly become an issue, despite prompt caching behavior being unchanged?
by mvkel
4/12/2026 at 3:15:17 PM
PEBKAC: Problem Exists Between Keyboard And Chair
by ScoobleDoodle
4/12/2026 at 3:28:23 PM
Yes same here. I use CC almost constantly every day for months across personal and work max/team accounts, as well as directly via API on Google Vertex. I have hardly ever noticed an issue (aside from occasional outages/capacity issues, for which I switch to API billing on Vertex). If anything it works better than ever.
by extr
4/12/2026 at 4:15:03 PM
You know that people are not using the same resources? It's like 9 out of 10 computers get borked and you have the 1 that seems okay and you essentially say "My computer works fine, therefore all computers work fine." Come on dude.
by varispeed
4/12/2026 at 3:06:45 PM
Money money money money
by Madmallard
4/13/2026 at 10:11:33 AM
/loop message ping every 4 minutes
@bcherny Will this keep the cache warm while the REPL is not active?
by richardjennings
4/12/2026 at 5:50:35 PM
> Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead
I don’t understand this. I frequently have long breaks. I never want to clear or even compact, because I don’t want to lose the conversations that I’ve had and the context. Clearing etc. causes other issues: at times I have to restate everything, and it misses things. I do try to update the memory, which helps. I wish there was a better solution than a time-bound cache.
by yumraj
4/12/2026 at 7:09:16 PM
Makes me wish that shortly before the server-side expiration, we could save the cache on the client-side, indefinitely.
But my understanding is that we're talking about ~60GB of data per session, so it sounds unrealistic to do...
by cowwoc2020
4/13/2026 at 12:09:00 AM
Where are you getting 60GB from? It shouldn’t be that large.
But yes, would love to save context/cache such that it can be played back/referred to if needed.
/compact is a little black box that I just have to trust that is keeping the important bits.
by yumraj
4/13/2026 at 2:07:32 AM
The KV cache consists of activation vectors for every attention head at every layer of the model for every token, so it gets quite large. ChatGPT also estimates 60-100GB for full token context of an Opus-sized model: https://chatgpt.com/share/69dc5030-268c-83e8-92c2-6cef962dc5...
by davmre
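A tens-of-gigabytes figure is plausible from first principles. A sketch of the standard KV-cache size formula; all model dimensions below are hypothetical, since Opus's actual architecture is not public:

```python
# KV-cache size = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem * tokens.
# Every model dimension here is a made-up example; Opus's architecture is not public.
def kv_cache_bytes(tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens

# e.g. a 60-layer model with grouped-query attention (2 KV heads of dim 128), bf16:
size = kv_cache_bytes(tokens=1_000_000, n_layers=60, n_kv_heads=2, head_dim=128)
print(f"{size / 1e9:.0f} GB")  # prints "61" GB for a full 1M-token context
```

Models without grouped-query attention (many KV heads per layer) would land far above this, which is why long-context KV caches are expensive to keep resident.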
4/13/2026 at 2:10:42 AM
I suspect 1M token context is questionable value because of the secondary effect of burning quota vs getting work done.
I think the model selector that let me choose 1M made sense, because I could decide if I was working on large documents and compacting more often was more effective.
by sunir
4/12/2026 at 3:14:36 PM
Would it be possible to increase the cache duration if misses are a frequent source of problems?
Maybe using a heartbeat to detect live sessions to cache longer than sessions the user has already closed. And only do it for long sessions where a cache miss would be very expensive.
by brokencode
4/12/2026 at 3:15:50 PM
Yes, we're trying a couple of experiments along these lines. Good intuition.
by bcherny
4/12/2026 at 7:31:13 PM
> Since Claude Code uses a 1 hour prompt cache window for the main agent
This seems a bit awkward vs the 5 hour session windows.
If I get rate limited once, I'll get rate limited immediately again on the same chat when the rate limit ends?
Any chance we can get some form of deferred cache, so anything on a rate limited account gets put aside until the rate limit ends?
by 8note
4/12/2026 at 9:50:03 PM
Boris,
Even if Anthropic is working in good faith to lower infrastructure costs, developers need more than 5 minutes to notice that CC completed a task, review its changes and ask it to merge. Only developers who do not review code changes can live with such a TTL...
Consider making this value configurable, as the ideal TTL is different for each person. If people are willing to pay more for a 30-minute TTL than a 5-minute one, they should be able to.
by cowwoc2020
4/12/2026 at 6:17:55 PM
As another data point, I pay for Pro for a personal account, and use no skills, do nothing fancy, use the default settings, and am out of tokens, with one terminal, after an hour. This is typically working on a < 5,000 line code base, sometimes in C, sometimes in Go. Not doing incredibly complicated things.
by apgwoz
4/13/2026 at 2:43:19 AM
Hi, thanks for Claude Code. I was wondering though if you'd consider adding a mode to make text green and characters come down from the top of the screen individually, like in The Matrix?
by taspeotis
4/12/2026 at 10:11:03 PM
Hi Boris,
Long-term Claude Code user here. It's the first time I've had to set up a hook to Codex to review Claude's output.
It's hallucinating like never before.
It's missing key concepts/instructions in context like never before.
It's writing bad code that will "pass tests" much more. Before, it used to try to be critical and write good code; now it will try to hack tests and bypass instructions for a green pass.
by jiwidi
4/12/2026 at 3:17:46 PM
Ah, so cache usage impacts rate limits. There goes the "other harnesses aren’t utilizing the cache as efficiently" argument.
by yummytummy
4/12/2026 at 3:19:05 PM
Claude Code is the most prompt cache-efficient harness, I think. The issue is more that the larger the context window, the higher the cost of a cache miss.
by bcherny
4/12/2026 at 7:13:29 PM
I do wonder if it's fair to expect users to absorb cache miss costs when using Claude Code, given how opaque these are.
by simsla
4/12/2026 at 4:22:22 PM
Politely, no.
- I wrote an extension in Pi to warm my cache with a heartbeat.
- I wrote another to block submission after the cache expired (heartbeats disabled or run out)
- I wrote a third to hard limit my context window.
- I wrote a fourth to handle cache control placement before forking context for fan out.
- my initial prompt was 1000 tokens, improving cache efficiency.
Anthropic is STOMPING on the diversity of use cases of their universal tool, see you when you recover.
by beacon294
4/12/2026 at 3:33:01 PM
That might be, but the argument was that poor cache utilization was costing Anthropic too much money in other harnesses. If cache is considered in rate limits, it doesn't matter from a cost perspective; you'll just hit your rate limits faster in other harnesses that don't try to cache-optimize.
by yummytummy
4/12/2026 at 3:43:32 PM
There were two issues with some other 3p harnesses:
1. Poor cache utilization. I put up a few PRs to fix these in OpenClaw, but the problem is their users update to new versions very slowly, so the vast majority of requests continued to use cache inefficiently.
2. Spiky traffic. A number of these harnesses use un-jittered cron, straining services due to weird traffic shape. Same problem -- it's patched, but users upgrade slowly.
We tried to fix these, but in the end, it's not something we can directly influence on users' behalf, and there will likely be more similar issues in the future. If people want to use these they are welcome to, but subscription clients need to be more efficient than that.
by bcherny
4/12/2026 at 4:43:15 PM
How much jitter would you prefer, how many seconds / minutes out? I have some morning tasks that run while I'm asleep via claude -p, and it sounds like I'm slightly contributing to your spikes (presumably hourly and on quarter hours).
by SyneRyder
4/12/2026 at 6:19:01 PM
There's prior art from Claude's own scheduled tasks' jitter: https://code.claude.com/docs/en/scheduled-tasks#jitter
> Recurring tasks fire up to 10% of their period late, capped at 15 minutes. An hourly job might fire anywhere from :00 to :06.
> One-shot tasks scheduled for the top or bottom of the hour fire up to 90 seconds early.
by Deathmax
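For anyone jittering their own `claude -p` cron jobs by hand, a minimal sketch that mirrors the documented recurring-task behavior quoted above (fire up to 10% of the period late, capped at 15 minutes); the job command is a placeholder, not an endorsed invocation:

```python
import random
import subprocess
import time

def jitter_seconds(period_s: float, cap_s: float = 900.0) -> float:
    """Random delay up to 10% of the period, capped at 15 minutes (900s)."""
    return random.uniform(0.0, min(0.1 * period_s, cap_s))

def run_jittered(cmd: list[str], period_s: float = 3600.0) -> None:
    # Sleep a random amount first so many users' cron jobs don't all land
    # on :00/:15/:30/:45 and create spiky traffic.
    time.sleep(jitter_seconds(period_s))
    subprocess.run(cmd, check=False)

# Example (placeholder prompt): run_jittered(["claude", "-p", "summarize overnight CI failures"])
```

An hourly job then starts anywhere in a 6-minute window instead of exactly on the hour, which is the traffic-shape fix bcherny describes.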
4/12/2026 at 5:03:25 PM
If you give doll a compliance checklist of things you want to see from third party harnesses, it will make sure the one it is building follows it to the letter.
by dollspace
4/12/2026 at 3:48:44 PM
I’m sorry but when you wake up in the morning with 12% of your session used, saying “it’s the cache” is not an appropriate answer.
And I’m using Claude on a small module in my project, the automations that read more to take up more context are a scam.
by eastbound
4/13/2026 at 2:28:43 AM
I’ve seen the /clear command prompt and I found the verbiage to be a bit unclear. I think it would help to clarify that the cache has expired and to provide an understandable metric on the impact: i.e. “X% of your 5-hour window” for Pro/Max users and details on token use for API users. A pop-up that requires explicit acknowledgment might also help, although that could be more of an annoyance to enterprise users.
One pattern I use frequently is one high-level design and implementation agent that I’ll use for multiple sessions, delegating implementation to lower-level agents.
In this case it’d be helpful to have one of two options:
1. If Claude CLI could create an auto compaction of the conversation history before cache expiration. For example, if I’m beyond X minutes or Y prompts in a conversation and I’ve been inactive for a threshold, it could auto-compact close to the expiration and provide that as an option on resume.
2. If I could configure cache expiration proactively and Anthropic could use S3 or a similar slow-load mechanism to offload the cache for a longer period, possibly 24-72h.
I can appreciate that longer KV cache expiration would complicate capacity management and make inference traffic less fungible but I wouldn’t mind waiting seconds to minutes for it to load from a slower store to resume without quota hits.
by anoazian
4/12/2026 at 3:45:47 PM
You've created quite a conundrum.
The only people who are going to run into issues are superpower users who are running this excessively beyond any reasonable measure.
Most people are going to be quite happy with your service. But at the same time, and this is just a human nature thing, people are 10 times more likely to complain about an issue than to compliment something working well.
I don't know how to fix this, but I strongly suspect this isn't really a technical issue. It's more of a customer support one.
by 999900000999
4/12/2026 at 3:41:05 PM
Have you considered poking the cache?
When a user walks away during the business day but CC is sitting open, you can refresh that cache up to 10x before it costs the same as a full miss. Realistically it would be <8x in a working day.
by samuelknight
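The "up to 10x" figure is consistent with cache reads costing roughly a tenth of uncached input. The dollar rates below are assumptions based on typical published Claude API pricing ratios, not a statement about how subscription accounting works:

```python
# Keeping a 1M-token cache warm with hourly "pokes" vs. eating one full miss.
# Rates are assumptions: $5.00/MTok uncached input, $0.50/MTok cache read.
UNCACHED, CACHE_READ = 5.00, 0.50  # $/MTok, assumed

def poke_cost(tokens_m: float, pokes: int) -> float:
    # Each poke re-reads the cached prefix at the (cheap) cache-read rate.
    return tokens_m * CACHE_READ * pokes

def miss_cost(tokens_m: float) -> float:
    # A full miss re-ingests everything at the uncached input rate.
    return tokens_m * UNCACHED

pokes = 8  # roughly one per hour across a working day, as the comment suggests
print(poke_cost(1.0, pokes), "vs", miss_cost(1.0))  # prints: 4.0 vs 5.0
```

At a 10:1 read-to-input ratio, the break-even is exactly 10 pokes, so 8 heartbeats across a workday come in cheaper than a single cold re-ingest.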
4/12/2026 at 3:08:53 PM
Am I so out of touch?
No! It’s the children who are wrong!
by fps-hero
4/13/2026 at 9:07:13 AM
Do you people even read the source code written by Claude Code? Do you even test the same system prompts used by your product? Do you use Claude Code to find bugs and to come up with improvements?
Why are the models quantized or dumbed down? How can you possibly expect to reduce the load on your infrastructure when 5-20x more turns are required to get the same results or even worse results than a single turn used to? It currently fails to reason about very basic things even when they're explained to it and it's told what to do.
There's a silver lining: your heavy load problems will go away as you lose more customers. You'll barely have any load in the near future.
Here are a few ideas for you to try: make CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1 the default, improve the system prompt to make it better, improve the prompts for the agents, stop serving quantized models. Put in place comprehensive test suites for Claude Code, for the infrastructure involved in serving the code and for the models themselves. Have the extremely hyped Mythos analyze your code base to fix more bugs. Stop shipping so many new features. Stop shipping releases without changelogs. Make it possible to delete all the connectors from the dashboard and to disable all connectors. Make it possible to disable/enable MCP tools with dots in them (claude.ai MCP tools can't be disabled by denying them). Do research to improve the performance of your platform.
Drop the ridiculous requirement to use the subscription only with Claude Code. Restrict it instead to coding and related tasks (no personal agents like Claw and related) if you want to get rid of the high load caused by autonomous or scheduled jobs.
Make the quotas transparent with actual numbers, not percentages.
Communicate proactively. Fix bugs. Improve the product. Stop degrading the service provided to customers.
by foofloobar
4/13/2026 at 9:11:56 AM
This comment seems unnecessarily hostile.
by stingraycharles
4/13/2026 at 9:32:28 AM
Why?
It seems just fine to me. This is what Anthropic needs to do if they want to survive. I'm always looking out for someone to integrate an actually good harness to a good model. Once that happens, I'm jumping ship if Anthropic keeps playing these tricks.
It's almost unusable for me now. A simple prompt to merge 3 sub-100-line files with simple node code, on Sonnet 4.6, uses up 20% of my 5 hour quota, on a new/fresh session.
by prmph
4/13/2026 at 9:41:36 AM
To be fair, my comment was a bit harsher before the update. The way they handle the development, communication and how they treat customers isn't fine. I've seen some angry people post and comment in manners which truly deserved the label hostile.
The whole product with the infrastructure and Claude Code's code appear to be vibe coded.
by foofloobar
4/13/2026 at 10:36:39 AM
If they can’t do infrastructure, then perhaps they should offer the ability for customers to host it themselves.
by sharts
4/13/2026 at 10:33:28 AM
The hostility is all Anthropic.
by sharts
4/13/2026 at 9:18:36 AM
They appear to take issues seriously mostly when they become posts on Hacker News and when articles are published online by major news sites. Customer support is mostly a bot. I don't even know how to reach some actual humans to get support.
I'm sorry if you and others are offended. They've had these issues for several weeks now. I haven't seen any real improvements during this time. I see more features and more bugs.
There have been several releases made over the last few days without any changelogs. The quotas are still as opaque as they've been. This company has some extremely shady business practices.
by foofloobar
4/12/2026 at 4:27:25 PM
Could we get an option to use Opus with a smaller context window? I noticed that results get much worse way earlier than when you reach 1M tokens, and I would love to have a setting so that I could force a compaction at e.g. 300k tokens.
by danmaz74
4/12/2026 at 4:35:22 PM
You probably just missed it in his post, but:
"To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude."
Maybe try changing the 4 to a 3 and see if that works for you?
by SyneRyder
4/13/2026 at 5:42:44 AM
Thank you, will definitely try that!
by danmaz74
4/13/2026 at 1:48:22 AM
Thank you for your responses, especially on a Sunday. They give us some insights and at least a couple temporary workarounds to use, while the issues are being addressed :) much appreciated.
by cmaster11
4/12/2026 at 5:00:39 PM
> defaulting to 400k context instead, with an option to configure your context window to up to 1M if preferred
This seems really useful!
I'm surprised that "Opus 4.6" (200K) and "Opus 4.6 1M" are the only Opus options in the desktop app, whereas in the CLI/TUI app you don't seem to even get that distinction.
I bet that for a lot of folks something like 400k, 600k or 800k would work as better defaults, based on whatever task they want to work on.
by KronisLV
4/12/2026 at 3:46:05 PM
Boris, wasn't this the same thing ~2 weeks ago? Is it the same cache misses as before? What's the expected time until it's solved? Seems like it's taking a while.
by ramon156
4/12/2026 at 5:01:21 PM
Resizing the context window seems like a very good idea to me. I noticed a decline in productivity when the 1M context window was released, and I'd like to bring it back to 200k, because it was totally fine for the things I was working on.
by ahofmann
4/12/2026 at 8:16:47 PM
Shouldn't compaction be interactive with the user as to what context will continue to be the most relevant in the future? What if the harness allowed for a turn to clarify the user's expected future direction of the conversation and did the consolidation based upon the additional info?
There definitely seems to be a benefit to pruning the context and keeping the signal to noise high wrt what is still to be discussed.
by mmd45
4/13/2026 at 5:24:46 AM
Claude Code cache is not 1 hour. There is a "Closed as not planned" issue in GitHub that confirms that it has been moved to 5 minutes since March: https://github.com/anthropics/claude-code/issues/46829. I started seeing the massive degradation exactly on the 23rd of March, hence after a few days I unsubscribed because it was completely unusable, with a ~5h session being depleted in as little as 15-20 mins.
by tigershark
4/13/2026 at 8:00:24 AM
Looks like the cache change to 5 minutes was so secretive that even the CC team doesn't know about it.
Or someone just vibe coded "Hey, Claude, make them burn allowances quicker" and merged it without telling anyone.
Both are plausible to me.
by subscribed
4/12/2026 at 6:24:26 PM
Have you tried asking Mythos for a fix?
by throwpoaster
4/12/2026 at 3:31:15 PM
Where can I learn about concepts like prompt cache misses? I don't have a mental model of how that interacts with my context of 1M or 400k tokens... I can cargo-cult follow instructions of course, but help us understand if you can, so we can intelligently adapt our behavior. Thanks.
by hughw
4/12/2026 at 3:44:43 PM
The docs are a good place to start: https://platform.claude.com/docs/en/build-with-claude/prompt...
by bcherny
4/12/2026 at 4:01:30 PM
Thanks. Just noting that those docs say the cache duration is 5 min and not 1 hour as stated in the sibling comment:
> By default, the cache has a 5-minute lifetime. The cache is refreshed for no additional cost each time the cached content is used.
>
> If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache duration at additional cost.
by snthpy
4/12/2026 at 8:03:53 PM
Apparently Anthropic downgraded the cache TTL to 5 min without telling anyone. My biggest issue with the recent problems with Claude Code is the lack of transparency, although it looks like even Boris doesn't know about this one: https://news.ycombinator.com/item?id=47736476
by yoaviram
4/12/2026 at 3:32:45 PM
And why does /clear help things? Doesn't that wipe out the history of that session? Jeez.
by hughw
4/13/2026 at 10:32:11 AM
[dead]
by CWwdcdk7h
4/12/2026 at 3:29:17 PM
I have a feature request: I built an MCP server, but now it has over 60 tools. In most sessions I really don’t need most of them. I suppose I could split this into several servers, but it would maybe be nice to give the user more power here. Like, let me choose the tools that should be loaded, or let me build servers that group tools together which can be loaded. Not sure if that makes sense …
by _fizz_buzz_
4/13/2026 at 6:40:00 AM
Number 2 makes me chuckle honestly. Too many people going down the 10x rabbit holes on YouTube. Next up, a framework that 100xs your workflow. You know it's good because it comes with 300 agents and 20 MCP servers and 1200 skills.
by smrtinsert
4/12/2026 at 6:03:12 PM
From looking at the raw requests, that can't be right? It's all "cache_control": { "type": "ephemeral" }; there is no "ttl" anywhere.
// edit: cc_version=2.1.104.f27
by g4cg54g54
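Per Anthropic's prompt-caching docs, the 5-minute lifetime is the default when no `ttl` is given, and a 1-hour cache has to be requested explicitly on the breakpoint. A sketch of the two request shapes (field names are from the public API docs; whether Claude Code sets this server-side is not visible from the client, as the comment above notes):

```python
# cache_control as it appears on a Messages API content block. With no
# "ttl" field, the cache defaults to a 5-minute lifetime; "ttl": "1h"
# requests the longer (more expensive) 1-hour cache. Field names per the
# public prompt-caching docs; subscription behavior may differ server-side.
default_5m = {"type": "ephemeral"}                # implicit 5-minute TTL
explicit_1h = {"type": "ephemeral", "ttl": "1h"}  # explicit 1-hour TTL

block = {
    "type": "text",
    "text": "<large system prompt or file contents>",
    "cache_control": explicit_1h,
}
print(block["cache_control"])
```

So a raw request showing only `{"type": "ephemeral"}` is, at least per the documented API semantics, asking for the 5-minute cache.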
4/12/2026 at 3:59:15 PM
> To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session)
Is this really an improvement? Shouldn't this be something you investigate before introducing 1M context?
What is a long stale session?
If that's not how Claude Code is intended to be used it might as well auto quit after a period of time. If not then if it's an acceptable use case users shouldn't change their behavior.
> People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins.
If this was an issue, shouldn't there have been a cap on it before the feature was released, only increased once you were sure it was fine? What is "a large number"? Then how do we know what to do?
It feels like "AI" has improved speed but is in fact just cutting corners.
by re-thc
4/12/2026 at 3:56:29 PM
Hello Boris! How do I increase the 1 hour prompt cache window for the main agent? I would love to be able to set that to, say, 4 hours. That gives me enough time to work on something, go teach a class, grab a snack, and come back and pick up where I left off.
by earino
4/13/2026 at 9:02:10 AM
Another CC team member confirmed it's 5 minutes now, not 1 hour.
See the links in https://news.ycombinator.com/item?id=47747209
by subscribed
4/12/2026 at 3:53:35 PM
How can we turn off 1M context? I don't find it has ever helped.
by fluidcruft
4/12/2026 at 4:41:19 PM
He mentioned this in his original comment:
"CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000"
by mwigdahl
4/12/2026 at 3:19:57 PM
Why are you all of a sudden running into so many issues like this? Could it be that all of the Anthropic employees have completely unlimited and unbounded accounts, which means you don't get a feeling of how changes will affect the customers?
by docheinestages
4/12/2026 at 3:24:00 PM
The number of people using Claude Code has grown very quickly, which means:

- More configurations and environments we need to test
- Given an edge/corner case, it is more likely a significant number of users run into it
- As the ecosystem has grown, more people use skills and plugins, and we need to offer better tools and automation to ensure these are efficient
We do actually dogfood rate limits, so I think it's some combination of the above.
by bcherny
4/13/2026 at 10:39:44 AM
How do y'all test?
by sharts
4/12/2026 at 10:18:59 PM
I think the suspicion regarding skills and plugins is fair and logical, and it is absolutely the case that some use significantly more tokens.

With that said, on my 5x plan, I used to be able to have multiple sessions working and the limit was far away. Around the time you introduced more tokens during off-peak hours and fewer during working US hours, even with a single session and no plugins at all (I uninstalled OMC), I started running into limits very often.
I have not performed any rigorous tests, but it feels like I have about 25% of what I used to have, or less. This is all without teams of agents, Ralph loops, or anything like that: just /plan and execute in a single session. I have restored the "/clear context before executing plan" option to try and mitigate things. I will also try the 400k context since, in my experience, the 1M tokens have not made Opus 4.6 noticeably smarter for my small webapp use case.
Best of luck to you!
PS: whenever you introduce a change, please make it optional AND ask the user about it first. Don't just yank things suddenly (like the /clear context and apply plan option), as I spent hours trying to figure out how I had broken it before I saw your note on how to re-enable it.
by gozucito
4/12/2026 at 6:19:23 PM
Because it’s completely vibe coded? And the codebase goes through massive churn, which means things that were stable get rewritten, possibly with bugs.
by nothinkjustai
4/12/2026 at 7:48:53 PM
You can get Claude Code to write tests too...
by egamirorrim
4/12/2026 at 5:16:01 PM
There's an issue someone raised showing that prompt caches are only 5 minutes. The reply seems to be: oh huh, interesting. Maybe that's a good thing since people sometimes one-shot? That doesn't feel like the messaging I want to be reading, and the way it conflicts with the message here that the cache is 1 hour is confusing.
https://news.ycombinator.com/item?id=47741755
Is there any status information on whether the cache is used or not? It sure looks like the person analyzing the 5m issue had to work extremely hard to get any kind of data. The iteration loop of people getting better at this stuff would go much, much better if this weren't such a black box, if we had the data to see and understand: is the cache helping?
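To be fair, the API itself does expose per-response cache telemetry: the `usage` object in Messages API responses includes `cache_read_input_tokens` and `cache_creation_input_tokens` alongside the usual counts. A sketch of turning that into a cache-hit ratio (the sample numbers are made up):

```python
# usage fields as returned in Anthropic Messages API responses
usage = {
    "input_tokens": 1200,               # uncached input this turn
    "cache_creation_input_tokens": 0,   # tokens written to the cache
    "cache_read_input_tokens": 148000,  # tokens served from the cache
    "output_tokens": 900,
}

total_input = (usage["input_tokens"]
               + usage["cache_creation_input_tokens"]
               + usage["cache_read_input_tokens"])
hit_ratio = usage["cache_read_input_tokens"] / total_input
print(f"cache hit ratio: {hit_ratio:.1%}")  # → cache hit ratio: 99.2%
```

Whether Claude Code surfaces these numbers to the user is a separate question, but the data does exist at the API layer.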
by jauntywundrkind
4/13/2026 at 3:22:47 AM
Aren’t they saying that it’s 5 minutes for things like subagents (that wouldn’t benefit from it)?
by TheTaytay
4/12/2026 at 3:18:00 PM
Pulling in all the skills and agents in the world, even when unused, is a big hit. I deleted all of mine and added them back as needed, and there was an improvement.

Running Claude Cowork in the background will hit tokens, and it might not be the most efficient use of them.

Last, but not least, turning off the 1M token context by default is helpful.
by j45
4/12/2026 at 3:41:37 PM
Eh, you say that every time, and yet it keeps happening.
by dkersten
4/12/2026 at 4:44:31 PM
Boris, is the KV cache TTL now reduced to 5 minutes from 1 hour?

I think this may be the biggest concern for people building tools on the API: https://github.com/anthropics/claude-code/issues/46829
I would argue that KV caching is a net gain for Ant and a well-maintained cache is the biggest thing that can generate induced demand and a thriving third party ecosystem. https://safebots.ai/papers/KV.pdf
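The economics here are easy to back-of-envelope. Assuming the multipliers Anthropic has published for prompt caching (cache reads at roughly 0.1x the base input-token price, 1-hour cache writes at roughly 2x; treat both as assumptions and check current pricing), resuming a session that misses the cache costs an order of magnitude more than one that hits it:

```python
# Back-of-envelope cost of resuming a session with a 150k-token prefix,
# in units of the base input-token price. The multipliers are assumptions
# based on Anthropic's published prompt-caching pricing.
PREFIX_TOKENS = 150_000   # cached conversation prefix
READ_MULT = 0.1           # cache read, x base input price (assumed)
WRITE_1H_MULT = 2.0       # 1-hour cache write, x base input price (assumed)

hit_cost = PREFIX_TOKENS * READ_MULT        # resume within the TTL
miss_cost = PREFIX_TOKENS * WRITE_1H_MULT   # stale session: re-ingest + re-write
print(f"miss is {miss_cost / hit_cost:.0f}x the cost of a hit")  # → 20x
```

Which is exactly why a shortened TTL shows up so directly in people's usage limits.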
by EGreg
4/12/2026 at 4:04:03 PM
Can you explain why Opus 4.6 suddenly becomes dumb as a sack of potatoes, even if context is barely filled?

Can you explain why Opus 4.6 will be coming up with stupid solutions only to arrive at a good one when you mention it is trying to defraud you?
I have a feeling the model is playing dumb on purpose to make user spend more money.
This wasn't the case weeks ago when it was actually working decently.
by varispeed
4/12/2026 at 6:13:37 PM
Wait what? If I get told to come back in three hours because I'm using the product too much, I get penalized when I resume?

What's the right way to work on a huge project then? I've just been saying "Please continue" -- that pops the quota?
by throwpoaster
4/12/2026 at 4:26:49 PM
I wish people would pay more attention to:

* Anthropic is in some way trying to run a business (not a charity) and at least (eventually?) make money and not subsidize usage forever
* "what a steal/good deal" the $100-$200/mo plans are compared to if they had to pay for raw API usage

and less on "how dare you reserve the right to tweak the generous usage patterns you open-endedly gave us, we are owed something!"
by MuffinFlavored
4/12/2026 at 5:22:55 PM
As an (ex) paying customer, I'm expecting some consistency. I used to be satisfied with the value I got, until the limits changed overnight and I'd get a tenth of my previous usage.

If Anthropic is allowed to alter the deal whenever, then I'd expect to be able to get my money back, pro-rata, no questions asked.
by lbreakjai
4/12/2026 at 4:41:51 PM
All those apply to OpenAI+Codex too, but they're far more generous with limits than Anthropic, and with granting fresh limits to apologize when they fuck up.by logicchains
4/13/2026 at 2:18:36 AM
[flagged]
by oskarw85
4/12/2026 at 4:24:17 PM
[flagged]
by accounting2026