2/23/2026 at 1:40:13 PM
> Perhaps someone at their end screwed up a loop conditional, but you'd think some monitoring dashboard somewhere would have a warning pop up because of this.
If you've been in any big company you'll know things perpetually run in a degraded, somewhat broken mode. They've even made up the term "error budget" because they can't be bothered to fix the broken shit so now there's an acceptable level of brokenness.
by Nextgrid
2/23/2026 at 2:35:29 PM
> they can't be bothered to fix the broken shit
Surely it's more likely that it's just cheaper to pay for the errors than to pay to fix the errors.
Why fix 10k worth of errors if it'll cost me 100k to fix it?
by goodmythical
2/23/2026 at 11:18:56 PM
In my opinion, if something isn’t actually an error, you modify your logging to not log it as an error. Your error logging/alerting pipeline should always stay clean.
If something shows up in there, you should only have 2 options: 1) it’s an actual error and you fix it and make sure it never happens again, or 2) it’s not an error and then you fix it by adjusting the log level to make sure it isn’t one.
If someone suggests an “error budget” on my watch they get the door. You can have a warning budget (and the resources to adjust the log levels or remediation protocols to fix said “errors”) but actual errors should remain errors - otherwise they’re delivering broken software and that’s not what I’m paying them for.
Of course, companies who have the common sense to do this already do it and nobody in their right mind would suggest an “error budget”, but for those that don’t they have a serious problem that needs to be rectified.
The danger otherwise is that you’re making your observability pipeline useless if “errors” no longer actually mean errors. That’s really bad because now it opens the door to actual errors being ignored until it’s too late and then remediation is more costly.
by Nextgrid
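A minimal sketch of the policy described in the comment above, assuming Python's standard `logging` module. Expected, recoverable conditions are logged at WARNING and only ERROR and above reach the alerting handler. The names `TransientError`, `ListHandler`, `build_logger`, and `call_with_retry` are hypothetical and exist only for illustration; the in-memory handler stands in for whatever alerting pipeline is actually in use.

```python
import logging


class TransientError(Exception):
    """Hypothetical exception for an expected, retryable condition."""


class ListHandler(logging.Handler):
    """Collects records in memory; a stand-in for a real alerting pipeline."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def build_logger(name="app"):
    """Return a logger plus an alert handler that only sees ERROR and above."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False
    alerts = ListHandler(level=logging.ERROR)
    logger.addHandler(alerts)
    return logger, alerts


def call_with_retry(operation, logger, attempts=3):
    """Run operation, retrying on TransientError.

    A failed attempt that will be retried is not an error yet, so it is
    logged as a warning. Only the final failure is logged as an error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt < attempts:
                logger.warning("attempt %d failed, retrying: %s", attempt, exc)
            else:
                logger.error("giving up after %d attempts: %s", attempts, exc)
                raise
```

With this split, a call that succeeds on its third attempt leaves the alert handler empty, while a call that fails every attempt produces exactly one ERROR record.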
2/23/2026 at 2:46:13 PM
The orgs are not ruthless like that; anything less than a certain % of the org revenue is not worth bothering with unless it creates _more_ work for the person responsible for it than fixing it does.
Add some % if the person who gets more work from the problem is not the same as the person who needs to fix it. People will happily leave things in a broken state if no one calls them out on it.
by DanielHB
2/23/2026 at 5:26:04 PM
At Facebook a full outage is accompanied by "first time?" memes. Unless you are on the specific team responsible you would indeed not really have any reason to care.
by darepublic
2/23/2026 at 3:08:15 PM
In my 3rd year of enterprise now and learned that there are many engineers who will purposefully not fix/improve their problematic applications as a weird sort of job security. It kind of blew up in their faces last year when we moved most of the affected on-premise applications to cloud. Seems like when you introduce tons of friction on-premise it makes the cloud look even better to the suits.
by nazgulsenpai
2/23/2026 at 4:01:08 PM
It's not a matter of "can't be bothered." Engineers are constantly fixing things and rolling out new features. "Error budgets" are an acknowledgement of the tradeoff between these two things, and making a conscious choice about the balance between them, according to the business requirements of the application in question.
Keep in mind that "fixing things" is essentially a Sisyphean task - no matter how much you do there's always more you can do. Just like adding features. You have to have some kind of guideline on when enough is enough.
by Nifty3929
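For readers unfamiliar with the term, here is a rough sketch of how an error budget is commonly computed, assuming the usual definition: the share of requests that may fail in a window without violating the availability target. The function names and the rule of pausing feature work once the budget is spent are illustrative assumptions, not something stated in the thread.

```python
def error_budget(slo_target, total_requests):
    """Number of failed requests the target tolerates over the window.

    slo_target is a fraction such as 0.999 (99.9% of requests succeed).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the budget still unspent; negative means overspent."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target leaves no room for any failure.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


def should_prioritize_fixes(slo_target, total_requests, failed_requests):
    """Assumed policy: switch from features to fixes once the budget is spent."""
    return budget_remaining(slo_target, total_requests, failed_requests) <= 0.0
```

For example, a 99.9% target over 1,000,000 requests tolerates about 1,000 failures. At 250 failures roughly three quarters of the budget is left and feature work continues; at 1,200 failures the budget is overspent and, under the assumed policy, fixes take priority.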
2/23/2026 at 2:05:16 PM
[dead]
by nine_zeros