Update AlertService for Bucket-Level Alerting #154
Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>
suspend fun loadCurrentAlertsForBucketLevelMonitor(monitor: Monitor): Map<Trigger, MutableMap<String, Alert>> {
    val searchAlertsResponse: SearchResponse = searchAlerts(
        monitorId = monitor.id,
        size = 500 // TODO: This should be a constant and limited based on the circuit breaker that limits Alerts
This constant defined inline looks ugly; move it somewhere else if that makes sense.
This size of 500 could be too big and consume a large chunk of heap if individual buckets are large. Would loading alerts in a paginated way help here?
We have a requirement in MonitorRunner to load all alerts and then dedupe any existing ones. Can this dedupe/categorization be done in a paginated way? If we load all current alerts in order, paginate, and keep discarding old pages as we categorize the new alerts, wouldn't that limit the memory used here?
We don't have to fix it now, but it's something we can think about, and we could create an issue for future optimization if you think it's feasible.
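The paginated idea in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions, not the plugin's code: `Alert` is a simplified stand-in, `fetchPage` is a hypothetical paging function over stored alerts, and the set of bucket keys from the current run is assumed to fit in memory (MonitorRunner itself paginates composite aggregation results, so a real version would be more involved).

```kotlin
// Hypothetical, simplified stand-in for the plugin's Alert model.
data class Alert(val id: String, val bucketKey: String)

/**
 * Walks the stored alerts one page at a time so that only a single page of
 * alert documents is held in memory at once.
 *
 * - onDeduped is called for alerts whose bucket still matches in this run.
 * - onCompleted is called for alerts whose bucket no longer matches.
 *
 * Returns the bucket keys that have no stored alert (candidates for new alerts).
 * fetchPage(offset, size) is an assumed helper that returns at most `size`
 * alerts starting at `offset`, in a stable order.
 */
fun categorizeInPages(
    currentBucketKeys: Set<String>,
    pageSize: Int,
    fetchPage: (offset: Int, size: Int) -> List<Alert>,
    onDeduped: (Alert) -> Unit,
    onCompleted: (Alert) -> Unit
): Set<String> {
    require(pageSize > 0) { "pageSize must be positive" }
    val matchedKeys = HashSet<String>()
    var offset = 0
    while (true) {
        val page = fetchPage(offset, pageSize)
        for (alert in page) {
            if (alert.bucketKey in currentBucketKeys) {
                matchedKeys.add(alert.bucketKey)
                onDeduped(alert)
            } else {
                onCompleted(alert)
            }
        }
        // A short page means there is nothing left to read.
        if (page.size < pageSize) break
        offset += page.size
    }
    return currentBucketKeys - matchedKeys
}
```

The callbacks keep the function from accumulating full alert documents; only the matched bucket keys are retained between pages. Whether this actually saves heap depends on how large the key set is compared with the alert documents.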
Right, the reason this was a hardcoded value and not defined as a constant was that it was arbitrary for the time being. We should discuss with @elfisher and see where we want to draw the cutoff.
The problem is that this size limits the Alerts that are fetched at the Monitor level, whereas any circuit breakers we add might make more sense at the Trigger level (since that's where we de-dupe). Imposing a general circuit breaker on the max Alert count will require some refactoring in the MonitorRunner logic, since it currently indexes new/de-duped Alerts as it paginates composite agg results (so we wouldn't be able to circuit break halfway without having partial results unless we removed that behavior).
For the time being, I'll move this value of 500 to a constant and, once all changes are in main, at the very least I'll add a warn log for when this value is exceeded at the Monitor level. If time permits, let's discuss a circuit breaker option (and whether we are willing to fail the Monitor here). Either way, we will create an issue for this if it is not part of the initial release.
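As a rough illustration of the plan described here (a named constant plus a warn log when the Monitor-level count is exceeded), a minimal sketch follows. The object, constant, and function names are hypothetical, and the `warn` parameter stands in for whatever logger the plugin uses; the thread only fixes the value 500.

```kotlin
// Hypothetical holder and name; the thread only says 500 will move to a constant.
object AlertSearchLimits {
    const val MAX_BUCKET_LEVEL_ALERT_SEARCH_COUNT = 500
}

/**
 * Emits a warning when a Monitor has more current Alerts than one search fetches.
 * Returns true if the limit was exceeded, so a caller could later decide to
 * circuit-break instead of only logging.
 */
fun warnIfAlertLimitExceeded(
    monitorId: String,
    totalAlertHits: Long,
    warn: (String) -> Unit = { println(it) }
): Boolean {
    val limit = AlertSearchLimits.MAX_BUCKET_LEVEL_ALERT_SEARCH_COUNT
    val exceeded = totalAlertHits > limit
    if (exceeded) {
        warn("Monitor $monitorId has $totalAlertHits current Alerts; only the first $limit are loaded")
    }
    return exceeded
}
```

Returning a Boolean rather than throwing keeps the behavior at "log only", matching the interim plan, while leaving room for the circuit-breaker discussion.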
.map { ActionExecutionResult(it.key, it.value.executionTime, if (it.value.throttled) 1 else 0) }
)

val updatedErrorHistory = currentAlert.errorHistory.update(alertError)
Do we always load the full error history? Can this be optimized, since it can occupy a big chunk of heap? This is another possible optimization that can be added later if you think it's a problem.
Yeah, looks like errorHistory is capped to 10 elements but I agree, we can revisit some of these operations to think through optimizations.
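To make the cap concrete, here is a minimal sketch of a bounded history update. Only the call shape `errorHistory.update(alertError)` and the limit of 10 come from this thread; the `AlertError` fields, the constant name, and the newest-first ordering are assumptions for illustration.

```kotlin
import java.time.Instant

// Hypothetical, simplified stand-in for the plugin's AlertError model.
data class AlertError(val timestamp: Instant, val message: String)

// Cap mentioned in the review thread; the name is an assumption.
object ErrorHistoryLimits {
    const val MAX_ERROR_HISTORY = 10
}

/**
 * Returns a history with the new error first, trimmed to the cap.
 * A null error leaves the history unchanged.
 * Newest-first ordering is an assumption of this sketch.
 */
fun List<AlertError>.update(alertError: AlertError?): List<AlertError> {
    if (alertError == null) return this
    return (listOf(alertError) + this).take(ErrorHistoryLimits.MAX_ERROR_HISTORY)
}
```

With a cap like this, the per-alert history costs at most ten small entries, which is why the heap concern is modest for this field.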
* Added release notes for OpenSearch 1.0.0.0. (#123) (#124) Co-authored-by: AWSHurneyt <79280347+AWSHurneyt@users.noreply.github.com> * Add Integtest.sh for OpenSearch integtest setups (#121) * Add integtest script to the repo Signed-off-by: Peter Zhu <zhujiaxi@amazon.com> * Add Alerting specific security param for integTest Signed-off-by: Peter Zhu <zhujiaxi@amazon.com> * Remove default assignee (#127) Signed-off-by: Ashish Agrawal <ashisagr@amazon.com> * Removing All Usages of Action Get Method Calls and adding the listeners (#130) Signed-off-by: Aditya Jindal <aditjind@amazon.com> * Fix snapshot build and increment to 1.1.0. (#142) Signed-off-by: dblock <dblock@amazon.com> * Refactor MonitorRunner (#143) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update Bucket-Level Alerting RFC (#145) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Add BucketSelector pipeline aggregation extension (#144) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> * Add AggregationResultBucket (#148) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> * Add ActionExecutionPolicy (#149) * Add ActionExecutionPolicy Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Throw exception if there is an invalid field in PER_ALERT config when parsing Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Don't allow throttle to be configured for PerExecutionActionScope at the data class level since it is not supported yet Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Refactor Monitor and Trigger to split into Query-Level and Bucket-Lev… (#150) * Refactor Monitor and Trigger to split into Query-Level and Bucket-Level Monitors Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Require condition to not be null when parsing Bucket-Level Trigger Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update InputService for Bucket-Level Alerting (#152) 
Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> * Update TriggerService for Bucket-Level Alerting (#153) * Update TriggerService for Bucket-Level Alerting Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Remove client from TriggerService Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update AlertService for Bucket-Level Alerting (#154) * Update AlertService for Bucket-Level Alerting Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Move Alert search size for Bucket-Level Monitors to a const Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Add worksheets to help with testing (#151) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update MonitorRunner for Bucket-Level Alerting (#155) * Update MonitorRunner for Bucket-Level Alerting Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update regressed comment in MonitorRunnerIT Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Add TODO to break down runBucketLevelMonitor method in MonitorRunner Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Fix ktlint formatting issues (#156) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Execute Actions on runTrigger exceptions for Bucket-Level Monitor (#157) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Skip execution of Actions on ACKNOWLEDGED Alerts for Bucket-Level Monitors (#158) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Return first page of input results in MonitorRunResult for Bucket-Level Monitor (#159) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Add setting to limit per alert action executions and don't save Alerts for test Bucket-Level Monitors (#161) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Fix bug in paginating multiple bucket paths for Bucket-Level Monitor (#163) * Fix bug in paginating multiple bucket paths for Bucket-Level Monitor Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Change trigger after key conditionals 
to when statement Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Various bug fixes pertaining to throttling on PER_ALERT, saving COMPLETED Alerts and rewriting input query for Bucket-Level Monitors (#164) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Return only monitors for /monitors/_search. (#162) * Return only monitors for /monitors/_search. * Added missing imports * Added additional check to the unit test * Resolve default for ActionExecutionPolicy at runtime (#165) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> Co-authored-by: AWSHurneyt <79280347+AWSHurneyt@users.noreply.github.com> Co-authored-by: Peter Zhu <zhujiaxi@amazon.com> Co-authored-by: Ashish Agrawal <ashisagr@amazon.com> Co-authored-by: Daniel Doubrovkine (dB.) <dblock@dblock.org> Co-authored-by: Mohammad Qureshi <47198598+qreshi@users.noreply.github.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> Co-authored-by: Sriram <59816283+skkosuri-amzn@users.noreply.github.com>
…ledging more than 10 alerts at once. (#205) * Added release notes for OpenSearch 1.0.0.0. (#123) * Merge commits from the main branch to the 1.x branch. (#133) * Added release notes for OpenSearch 1.0.0.0. (#123) (#124) Co-authored-by: AWSHurneyt <79280347+AWSHurneyt@users.noreply.github.com> * Add Integtest.sh for OpenSearch integtest setups (#121) * Add integtest script to the repo Signed-off-by: Peter Zhu <zhujiaxi@amazon.com> * Add Alerting specific security param for integTest Signed-off-by: Peter Zhu <zhujiaxi@amazon.com> * Remove default assignee (#127) Signed-off-by: Ashish Agrawal <ashisagr@amazon.com> * Removing All Usages of Action Get Method Calls and adding the listeners (#130) Signed-off-by: Aditya Jindal <aditjind@amazon.com> * Fix snapshot build and increment to 1.1.0. (#142) Signed-off-by: dblock <dblock@amazon.com> * Refactor MonitorRunner (#143) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update Bucket-Level Alerting RFC (#145) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Add BucketSelector pipeline aggregation extension (#144) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> * Add AggregationResultBucket (#148) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> * Add ActionExecutionPolicy (#149) * Add ActionExecutionPolicy Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Throw exception if there is an invalid field in PER_ALERT config when parsing Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Don't allow throttle to be configured for PerExecutionActionScope at the data class level since it is not supported yet Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Refactor Monitor and Trigger to split into Query-Level and Bucket-Lev… (#150) * Refactor Monitor and Trigger to split into Query-Level and Bucket-Level Monitors Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Require condition 
to not be null when parsing Bucket-Level Trigger Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update InputService for Bucket-Level Alerting (#152) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> * Update TriggerService for Bucket-Level Alerting (#153) * Update TriggerService for Bucket-Level Alerting Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Remove client from TriggerService Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update AlertService for Bucket-Level Alerting (#154) * Update AlertService for Bucket-Level Alerting Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Move Alert search size for Bucket-Level Monitors to a const Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Add worksheets to help with testing (#151) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update MonitorRunner for Bucket-Level Alerting (#155) * Update MonitorRunner for Bucket-Level Alerting Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Update regressed comment in MonitorRunnerIT Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Add TODO to break down runBucketLevelMonitor method in MonitorRunner Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Fix ktlint formatting issues (#156) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Execute Actions on runTrigger exceptions for Bucket-Level Monitor (#157) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Skip execution of Actions on ACKNOWLEDGED Alerts for Bucket-Level Monitors (#158) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Return first page of input results in MonitorRunResult for Bucket-Level Monitor (#159) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Add setting to limit per alert action executions and don't save Alerts for test Bucket-Level Monitors (#161) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Fix bug in paginating multiple bucket paths for Bucket-Level Monitor (#163) * Fix 
bug in paginating multiple bucket paths for Bucket-Level Monitor Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Change trigger after key conditionals to when statement Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Various bug fixes pertaining to throttling on PER_ALERT, saving COMPLETED Alerts and rewriting input query for Bucket-Level Monitors (#164) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Return only monitors for /monitors/_search. (#162) * Return only monitors for /monitors/_search. * Added missing imports * Added additional check to the unit test * Resolve default for ActionExecutionPolicy at runtime (#165) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> Co-authored-by: AWSHurneyt <79280347+AWSHurneyt@users.noreply.github.com> Co-authored-by: Peter Zhu <zhujiaxi@amazon.com> Co-authored-by: Ashish Agrawal <ashisagr@amazon.com> Co-authored-by: Daniel Doubrovkine (dB.) <dblock@dblock.org> Co-authored-by: Mohammad Qureshi <47198598+qreshi@users.noreply.github.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> Co-authored-by: Sriram <59816283+skkosuri-amzn@users.noreply.github.com> * Add release notes for 1.1.0.0 release (#166) (#167) Signed-off-by: Mohammad Qureshi <qreshi@amazon.com> * Remove default integtest.sh. (#181) Signed-off-by: dblock <dblock@dblock.org> * Add valid search filters. (#191) * Add valid search filters. * Added this fix to release notes * Publish notification JARs checksums. (#197) Signed-off-by: dblock <dblock@dblock.org> * Also publish SHA 256 and 512 checksums. (#198) * Also publish SHA 256 and 512 checksums. Signed-off-by: dblock <dblock@dblock.org> * Remove sonatype staging. Signed-off-by: dblock <dblock@dblock.org> * Fixed a bug that was preventing the AcknowledgeAlerts API from acknowledging more than 10 alerts at once. Signed-off-by: Thomas Hurney <hurneyt@amazon.com> * Implemented integration tests to ensure fix for issue 203 is working as expected. 
Signed-off-by: Thomas Hurney <hurneyt@amazon.com> * Refactored integ tests based on PR feedback, and listed the bug fix in the release notes. Signed-off-by: Thomas Hurney <hurneyt@amazon.com> * Removing bug fixes from release notes. Currently discussing adding separate notes for this patch. Signed-off-by: Thomas Hurney <hurneyt@amazon.com> Co-authored-by: Aditya Jindal <13850971+aditjind@users.noreply.github.com> Co-authored-by: Peter Zhu <zhujiaxi@amazon.com> Co-authored-by: Ashish Agrawal <ashisagr@amazon.com> Co-authored-by: Daniel Doubrovkine (dB.) <dblock@dblock.org> Co-authored-by: Mohammad Qureshi <47198598+qreshi@users.noreply.github.com> Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com> Co-authored-by: Sriram <59816283+skkosuri-amzn@users.noreply.github.com>
Signed-off-by: Mohammad Qureshi qreshi@amazon.com
Issue #, if available: #86
Description of changes:
Update AlertService implementation with utility methods needed for Bucket-Level Monitor execution
CheckList:
[x] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.