2/19/2026 at 2:21:23 PM
> they make per-instance decisions with per-instance state

But this is a feature, not a bug. You seem to be assuming that people use circuit breakers only on external requests; in that situation your approach seems reasonable.
If you have cbs between every service call, your model doesn't seem like a good idea. Where I work every network call is behind a cb (external services, downstream services, database, redis, s3, ...) and it's pretty common to see failures isolated to a single k8s node. In that situation we want independent cbs, so they can open independently.
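To make it concrete, a per-instance setup is just each process wrapping its own calls, something like this rough sketch with sony/gobreaker (names, thresholds, and the fetch helper are made up, not our actual code):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"

	"github.com/sony/gobreaker"
)

// One breaker per dependency, per process: each instance trips
// independently, based only on the failures it sees itself.
func newBreaker(name string) *gobreaker.CircuitBreaker {
	return gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:    name,
		Timeout: 30 * time.Second, // stay open this long before probing again
		ReadyToTrip: func(c gobreaker.Counts) bool {
			// trip on >=10 requests with a >=50% failure rate
			return c.Requests >= 10 &&
				float64(c.TotalFailures)/float64(c.Requests) >= 0.5
		},
	})
}

func fetch(cb *gobreaker.CircuitBreaker, url string) ([]byte, error) {
	body, err := cb.Execute(func() (interface{}, error) {
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		if resp.StatusCode >= 500 {
			return nil, fmt.Errorf("upstream returned %d", resp.StatusCode)
		}
		return io.ReadAll(resp.Body)
	})
	if err != nil {
		// err is gobreaker.ErrOpenState when this instance's breaker is open
		return nil, err
	}
	return body.([]byte), nil
}

func main() {
	redisCB := newBreaker("redis")
	if _, err := fetch(redisCB, "http://localhost:6379/health"); err != nil {
		fmt.Println("call failed or breaker open:", err)
	}
}
```

When a node's network goes bad, only the breakers on that node open; every other instance keeps its traffic flowing.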
Your take on observability/operation seems interesting, but it is pretty close to feature flags. And that is exactly how we handle these scenarios: we have a couple of feature flags we can enable to switch traffic around during outages. Switching to the fallback is easy most of the time, but switching back to normal operation is harder.
by nzach
2/19/2026 at 2:38:16 PM
You're right: for intra-cluster calls, where failures are scoped to the node itself and the infra around it, per-instance breakers are what you want. I wouldn't suggest centralizing those, and I might be wrong, but in most of these scenarios there is no fallback anyway (except maybe Redis?).

Openfuse is aimed at the other case: shared external dependencies, where 15 services all call the same dependency and each one independently discovers the same outage at a different time. Different failure modes, different coordination needs, and you have no way to manually intervene or even just see what's open. Think of your house: every appliance has its own protection, but that doesn't exempt you from having a distribution board.
You can also put it between your service/monolith and your own other services. E.g. if a recommendations engine or a loyalty system in an e-commerce or POS system goes down, all hot-path flows in all other services will just bypass their calls to it. So by "external" I mean another service, whether it's yours or a vendor's.
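Roughly, the idea is that every caller consults one shared fuse before the call and bypasses the dependency if it's open. A hypothetical sketch (the endpoint, the status-code convention, and all names below are invented for illustration, this is NOT the real Openfuse API):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

var ErrFuseOpen = errors.New("fuse open: dependency bypassed")

// fuseOpen asks a shared control plane for the state of one named fuse,
// so every instance sees the same answer at the same time.
func fuseOpen(ctx context.Context, controlPlane, fuse string) bool {
	url := fmt.Sprintf("%s/fuses/%s/state", controlPlane, fuse)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false // control plane unreachable: fail open, keep calling
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusLocked // 423 => open (made-up convention)
}

func callRecommendations(ctx context.Context) error {
	if fuseOpen(ctx, "http://openfuse.internal", "recommendations") {
		return ErrFuseOpen // caller falls back, e.g. renders the page without recs
	}
	// ... normal call to the recommendations service would go here ...
	return nil
}

func main() {
	if err := callRecommendations(context.Background()); err != nil {
		fmt.Println("bypassed:", err)
	}
}
```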
On the feature flag point: that's interesting, because you're essentially describing the pain of building circuit breaker behavior on top of feature flag infrastructure. The "switching back" problem you mention is exactly what the half-open state solves: controlled probe requests that test recovery automatically and restore traffic gradually, without someone manually flipping a flag and hoping. That's the gap between "we can turn things off" and "the system recovers on its own." But sure, you could call Openfuse "feature flags for resilience"; as I said, it's a fusebox for your microservices.
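For concreteness, the closed -> open -> half-open cycle is basically this (a schematic, simplified sketch, not Openfuse's actual implementation):

```go
package breaker

import (
	"errors"
	"sync"
	"time"
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

var ErrOpen = errors.New("breaker open")

// Breaker is a schematic half-open breaker: after a cooldown it lets
// probe requests through; enough successes close it, a failure re-opens it.
type Breaker struct {
	mu         sync.Mutex
	st         state
	openedAt   time.Time
	cooldown   time.Duration // how long to stay open before probing
	probes     int           // successful probes so far while half-open
	probesNeed int           // successes required to fully close again
}

func New(cooldown time.Duration, probesToClose int) *Breaker {
	return &Breaker{cooldown: cooldown, probesNeed: probesToClose}
}

func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.st == open {
		if time.Since(b.openedAt) < b.cooldown {
			b.mu.Unlock()
			return ErrOpen // still cooling down: fail fast
		}
		b.st = halfOpen // cooldown elapsed: start letting probes through
		b.probes = 0
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	switch {
	case err != nil:
		// simplified: any failure re-opens; real breakers use rate windows
		b.st = open
		b.openedAt = time.Now()
	case b.st == halfOpen:
		b.probes++
		if b.probes >= b.probesNeed {
			b.st = closed // recovery confirmed, restore full traffic
		}
	}
	return err
}
```

The point is the transition back to closed happens on its own, instead of someone flipping a flag and watching dashboards.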
Curious how you handle the recovery side: is it a feature flag provider itself, or have you built something around it that stores state in your own database?
by rodrigorcs
2/19/2026 at 5:00:01 PM
> where 15 services all call the same dependency and each one independently discovers the same outage at a different time

I don't really see what problem this solves. If you have proper timeouts and circuit breakers in your service, this shouldn't really matter. This solution will save a few hundred requests, but I don't think that really matters. If this is a pain point, it's easier to adjust the circuit-breaker settings (lower the error-rate threshold, increase the window, ...) than to introduce a whole new level of complexity.
> Curious how you handle the recovery side
We have a feature flag provider built in-house. But it doesn't support this use case, so what we did was create a flag holding the % value we want to bring back, and handle the logic inside the service. Example: if we want to bring back 6.25% (1/16) of our users, we switch back every user whose account-id ends in 'a'. For 12.5% (2/16) we take users whose account-id ends in either 'a' or 'b'. This is a pretty hacky solution, but it solves our problem when we need to transition from our fallback to our main flow.
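In code the check is roughly this (a sketch; the flag lookup is omitted, and the 16-character suffix alphabet is an assumption about our id format):

```go
package rollout

import "bytes"

// Assumed suffix alphabet: 16 possible final characters, each covering
// 1/16 (6.25%) of users. The real ordering depends on the id format.
var buckets = []byte("abcdef0123456789")

// OnMainFlow reports whether this user should be switched back to the
// main flow, given the percentage currently set in the feature flag.
func OnMainFlow(accountID string, flagPct float64) bool {
	if accountID == "" {
		return false
	}
	// 6.25% => 1 bucket ('a'), 12.5% => 2 buckets ('a' or 'b'), and so on
	n := int(flagPct / 6.25)
	if n <= 0 {
		return false
	}
	if n > len(buckets) {
		n = len(buckets)
	}
	suffix := accountID[len(accountID)-1]
	return bytes.IndexByte(buckets[:n], suffix) >= 0
}
```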
by nzach
2/19/2026 at 5:20:15 PM
> I don't really see what problem this solves. If you have proper timeouts and circuit breakers in your service, this shouldn't really matter.

Each service discovering the outage on its own isn't really the main problem my proposal solves. The thing is that when it's all done locally, we lack observability and there is no way to act on the breakers.
> what we did was create a flag holding the % value we want to bring back
Oh I see, well that is indeed a good problem to solve. Openfuse does not do that kind of gradual recovery yet, but it would be possible to add.
Do you think that with that feature, and with Openfuse self-hosted, it would be something you would give a try? Not trying to sell you anything, just gathering feedback so I can learn from the discussion.
By the way, if you don't mind, how often do you have to run that type of recovery?
by rodrigorcs
2/19/2026 at 6:34:05 PM
> Do you think that with that feature, and with Openfuse self-hosted, it would be something you would give a try?

No, I don't think this is compelling enough to try at work.
> By the way, if you don't mind, how often do you have to run that type of recovery?
I would say we use this feature once every 3 months.
by nzach