Circuit breaker on max Alert count for Bucket-Level Monitors #169

Open
qreshi opened this issue Sep 8, 2021 · 0 comments
Labels
bucket-level alerting: Item related to the Bucket-Level Alerting feature
enhancement: New feature or request

Comments

Contributor

qreshi commented Sep 8, 2021

Is your feature request related to a problem? Please describe.
Currently, when retrieving existing Alerts for Bucket-Level Monitors, the search query is limited to a size of 500. If the number of existing Alerts exceeds this limit, some Alerts may be missed on subsequent executions.

Describe the solution you'd like
The size of the Alerts query should not be hardcoded. Instead, it should be tied to a limit on the maximum number of Alerts we expect to allow for a Bucket-Level Monitor. This can be achieved by adding a circuit breaker on the Alert count, defined as a dynamic, configurable setting.
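
As a rough illustration of the idea (not the plugin's actual code), the sketch below reads the search size from an updatable setting instead of a constant. The names `AlertLimitSetting` and `build_alert_search_request`, the `monitor_id` query field, and the choice of 500 as the default are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class AlertLimitSetting:
    """Hypothetical dynamic setting holding the maximum Alert count."""

    max_alerts: int = 500  # assumed default, mirroring the current hardcoded size

    def update(self, new_value: int) -> None:
        # Reject non-positive limits so the search size is always valid.
        if new_value <= 0:
            raise ValueError("max_alerts must be positive")
        self.max_alerts = new_value


def build_alert_search_request(monitor_id: str, setting: AlertLimitSetting) -> dict:
    """Build a hypothetical search body whose size follows the setting."""
    return {
        "size": setting.max_alerts,
        # The field name below is illustrative only.
        "query": {"term": {"monitor_id": monitor_id}},
    }
```

With this shape, changing the limit at runtime changes the size of the next search request without a code change.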

Describe alternatives you've considered
Some implementation options are described under Additional context below.

Additional context
Should the circuit breaker be defined as a limit on the Alert count at the Monitor level or at the Trigger level? The size passed into the search request during Monitor execution currently applies to the retrieval of all current Alerts at the Monitor level.

If the circuit breaker were defined at the Monitor level, that value could be passed into size directly, and we would keep a running count of the Alerts (deduped + new - completed?) as the Triggers are evaluated. The complication is that the limit might not be exceeded until the last Trigger executes. Exceeding it would be treated as a Monitor-level failure, so even if only the last Trigger pushed the Alert count over the threshold, the whole Monitor would fail and the updated Alert state would not be reflected. As with any failure, though, we would still execute the Actions for all Triggers to communicate the failure.
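
A minimal sketch of the Monitor-level option, under one possible reading of the running count: start from the number of existing Alerts, then add new Alerts and subtract completed ones as each Trigger is evaluated. `TriggerDelta`, `MonitorOutcome`, and `evaluate_monitor` are hypothetical names, and the counting rule itself is an assumption, since the formula above is still an open question.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TriggerDelta:
    """Hypothetical per-Trigger change in Alerts for one execution."""

    name: str
    new: int        # Alerts created by this Trigger
    completed: int  # existing Alerts this Trigger completed


@dataclass
class MonitorOutcome:
    failed: bool
    tripped_by: Optional[str]  # Trigger that pushed the count over the limit
    alert_count: int


def evaluate_monitor(
    existing_alerts: int, triggers: List[TriggerDelta], monitor_limit: int
) -> MonitorOutcome:
    """Fail the whole Monitor as soon as the running count exceeds the limit."""
    count = existing_alerts
    for trigger in triggers:
        count += trigger.new - trigger.completed
        if count > monitor_limit:
            # Monitor-level failure: earlier Triggers' updates are discarded too.
            return MonitorOutcome(True, trigger.name, count)
    return MonitorOutcome(False, None, count)
```

For example, with 8 existing Alerts and a limit of 10, a first Trigger adding 1 Alert passes, but a last Trigger adding 3 and completing 1 brings the count to 11 and fails the entire Monitor.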

If the circuit breaker were defined at the Trigger level, the value passed into size could be per_trigger_limit * trigger_count to ensure we don't miss any Alerts during retrieval. The benefit is that we could evaluate the circuit breaker on each Trigger independently during execution, so if a single Trigger exceeds the per-Trigger Alert limit, only that Trigger fails and the Monitor's other Triggers are unaffected.
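
The Trigger-level option could be sketched as follows. The function names and the dictionary of per-Trigger Alert counts are hypothetical; only the per_trigger_limit * trigger_count sizing comes from the description above.

```python
from typing import Dict


def alert_search_size(per_trigger_limit: int, trigger_count: int) -> int:
    """Size large enough to retrieve every Alert if each Trigger is at its limit."""
    return per_trigger_limit * trigger_count


def tripped_triggers(
    alert_counts: Dict[str, int], per_trigger_limit: int
) -> Dict[str, bool]:
    """Check each Trigger independently; True means its circuit breaker tripped."""
    return {
        name: count > per_trigger_limit
        for name, count in alert_counts.items()
    }
```

With a per-Trigger limit of 100 and three Triggers, the search size would be 300, and a Trigger holding 150 Alerts would fail on its own while the others continue normally.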

qreshi added the enhancement and bucket-level alerting labels Sep 8, 2021