
[controller-utils] Reduce default circuit break duration #5887

Open
@Gudahtt

Description

The default circuit break duration is currently 30 minutes. This is a reasonable value during an outage: it spreads recovery out over 30 minutes, which reduces the risk that a sudden spike in traffic interferes with recovery efforts.

However, the circuit can currently break in edge cases that are not outages. HTTP 5XX errors can trigger a circuit break because they often indicate an outage. But if one specific request returns an HTTP 5XX error, the circuit can break unintentionally, blocking all HTTP requests to that endpoint when only one type of request is failing. We can't completely prevent this in the circuit break logic (though there may be opportunities to improve our heuristics), because it's not possible to distinguish the two types of failure with complete accuracy.
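To illustrate why a single failing request type can block a whole endpoint, here is a minimal TypeScript sketch of a circuit breaker that counts consecutive 5XX responses. It is not the controller-utils implementation: the class `SimpleCircuitBreaker`, its parameters `maxConsecutiveFailures` and `breakDurationMs`, and the injectable clock are all hypothetical, and the half-open trial state that circuit breaker libraries typically add is omitted for brevity.

```typescript
// Hypothetical sketch for illustration only; not the controller-utils code.
type Clock = () => number;

class SimpleCircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly maxConsecutiveFailures: number;
  private readonly breakDurationMs: number;
  private readonly now: Clock;

  constructor(maxConsecutiveFailures: number, breakDurationMs: number, now: Clock = Date.now) {
    this.maxConsecutiveFailures = maxConsecutiveFailures;
    this.breakDurationMs = breakDurationMs;
    this.now = now;
  }

  // True while the circuit is broken: every request to the endpoint is rejected,
  // regardless of which request type caused the failures.
  isOpen(): boolean {
    if (this.openedAt === null) {
      return false;
    }
    if (this.now() - this.openedAt >= this.breakDurationMs) {
      // Break duration has elapsed: close the circuit and allow requests again.
      this.openedAt = null;
      this.consecutiveFailures = 0;
      return false;
    }
    return true;
  }

  // Record the HTTP status of a completed request. Any 5XX counts as a failure,
  // because it often (but not always) indicates an outage.
  recordResponse(status: number): void {
    if (status >= 500) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.maxConsecutiveFailures) {
        this.openedAt = this.now();
      }
    } else {
      this.consecutiveFailures = 0;
    }
  }
}

// Usage: a fake clock makes the timing deterministic.
let fakeTime = 0;
const breaker = new SimpleCircuitBreaker(3, 5 * 60 * 1000, () => fakeTime);
// One request type keeps returning HTTP 500...
for (let i = 0; i < 3; i++) {
  breaker.recordResponse(500);
}
// ...so requests of every type to this endpoint are now blocked.
console.log(breaker.isOpen()); // true
fakeTime += 5 * 60 * 1000;
console.log(breaker.isOpen()); // false: the break duration has elapsed
```

Because the breaker tracks the endpoint as a whole, it has no way to tell "one request type is broken" apart from "the service is down"; the break duration is what bounds the damage in the first case.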

To reduce the impact of this edge case, we can reduce the circuit break duration to 5 minutes. A user who encounters it would then be interrupted for only 5 minutes rather than 30, a much more reasonable recovery time. It would also speed up recovery after an outage, with the downside that recovery would be less gradual.
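The proposed change amounts to a single constant, going from 1,800,000 ms to 300,000 ms. The sketch below shows that arithmetic and the resulting worst-case interruption. The constant names and the `recoveryTime` helper are hypothetical and may not match how controller-utils actually names or exposes its default; it also ignores any half-open trial logic.

```typescript
// Hypothetical constant names for illustration; the actual controller-utils
// export may be named or structured differently.
const MINUTE_MS = 60 * 1000;
const CURRENT_DEFAULT_BREAK_DURATION_MS = 30 * MINUTE_MS; // 1,800,000 ms
const PROPOSED_DEFAULT_BREAK_DURATION_MS = 5 * MINUTE_MS; // 300,000 ms

// Time (ms since epoch) at which a circuit broken at `brokenAtMs`
// starts allowing requests again.
function recoveryTime(brokenAtMs: number, breakDurationMs: number): number {
  return brokenAtMs + breakDurationMs;
}

// Worst-case interruption, in minutes, for a user hit by the edge case.
const brokenAt = Date.now();
const minutesUntil = (ms: number): number => (ms - brokenAt) / MINUTE_MS;
console.log(minutesUntil(recoveryTime(brokenAt, CURRENT_DEFAULT_BREAK_DURATION_MS))); // 30
console.log(minutesUntil(recoveryTime(brokenAt, PROPOSED_DEFAULT_BREAK_DURATION_MS))); // 5
```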
