@qreshi qreshi commented Aug 25, 2021

Signed-off-by: Mohammad Qureshi qreshi@amazon.com

Issue #, if available: #86

Description of changes:
Update AlertService implementation with utility methods needed for Bucket-Level Monitor execution

CheckList:
[x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>
@qreshi qreshi requested a review from rishabhmaurya August 25, 2021 21:53
suspend fun loadCurrentAlertsForBucketLevelMonitor(monitor: Monitor): Map<Trigger, MutableMap<String, Alert>> {
    val searchAlertsResponse: SearchResponse = searchAlerts(
        monitorId = monitor.id,
        size = 500 // TODO: This should be a constant and limited based on the circuit breaker that limits Alerts
Collaborator

This hardcoded constant looks ugly here; move it somewhere else if that makes sense.

Collaborator

This size of 500 could be too big and consume a big chunk of heap if the individual bucket size is big. Would loading alerts in a paginated way help here?
We have a requirement in MonitorRunner to load all alerts and then dedupe any existing ones. Can this dedupe/categorization be done in a paginated way? If we load all current alerts in an ordered way, paginate, and keep discarding old pages as we categorize the new alerts, wouldn't that help limit the memory used here?
We don't have to fix it now, but it is something we can think about, and we can probably create an issue for future optimization if you think it's feasible.
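
To make the pagination idea concrete, here is a minimal sketch, not the plugin's implementation. `AlertStub`, `PageFetcher`, and `indexCurrentAlerts` are hypothetical names, and the fetcher is assumed to return alerts in a stable order by offset. It only saves memory if the compact index kept per trigger is much smaller than the full Alert documents on each discarded page.

```kotlin
// Hypothetical, simplified stand-in for the plugin's Alert model.
data class AlertStub(val id: String, val triggerId: String, val bucketKey: String)

// Hypothetical page fetcher: returns up to `size` alerts starting at offset `from`,
// in a stable sort order. In the plugin this would wrap a search request.
typealias PageFetcher = (from: Int, size: Int) -> List<AlertStub>

/**
 * Walks all current alerts page by page and keeps only a compact
 * triggerId -> (bucketKey -> alertId) index, so the full objects from
 * earlier pages are no longer referenced once a page has been processed.
 */
fun indexCurrentAlerts(pageSize: Int, fetchPage: PageFetcher): Map<String, Map<String, String>> {
    require(pageSize > 0) { "pageSize must be positive" }
    val index = mutableMapOf<String, MutableMap<String, String>>()
    var from = 0
    while (true) {
        val page = fetchPage(from, pageSize)
        for (alert in page) {
            index.getOrPut(alert.triggerId) { mutableMapOf() }[alert.bucketKey] = alert.id
        }
        if (page.size < pageSize) break // a short page means we reached the end
        from += page.size
    }
    return index
}

fun main() {
    // In-memory stand-in for the alerts index.
    val stored = (1..7).map { AlertStub("alert-$it", "trigger-${it % 2}", "bucket-$it") }
    val fetcher: PageFetcher = { from, size -> stored.drop(from).take(size) }
    val index = indexCurrentAlerts(pageSize = 3, fetchPage = fetcher)
    println(index["trigger-1"]?.keys) // bucket keys of the alerts under trigger-1
}
```

The dedupe step can then look up each new bucket key in this index instead of holding every Alert in memory at once.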

Contributor Author

Right, the reason this was a hardcoded value rather than a constant is that it was arbitrary for the time being. We should discuss with @elfisher and see where we want to draw the cutoff.

The problem is that this size limits the Alerts being fetched at the Monitor level, whereas any circuit breakers we add might make more sense at the Trigger level (since that's where we de-dupe). Imposing a general circuit breaker on the max Alert count will require some refactoring in the MonitorRunner logic, since it currently indexes new/de-duped Alerts as it paginates composite agg results (so we wouldn't be able to circuit break halfway without having partial results unless we removed that behavior).

For the time being, I'll move this value of 500 to a constant variable and once all changes are in main, at the very least I'll add a warn log that this value has been exceeded at the Monitor level. If time permits, let's discuss a circuit breaker option (and if we are willing to fail the Monitor here). Either way, we will be creating an issue for this if it is not part of the initial release.
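
As a rough illustration of that interim plan (a named constant plus a warn log when the limit is exceeded), consider the sketch below. The constant name, the `warnIfAlertsTruncated` helper, and the use of `java.util.logging` are assumptions made to keep the example self-contained; they are not taken from the merged change.

```kotlin
import java.util.logging.Logger

// Hypothetical constant name; 500 is the arbitrary value discussed in this thread.
const val MAX_BUCKET_LEVEL_MONITOR_ALERT_SEARCH_COUNT = 500

// java.util.logging is used only to keep the sketch dependency-free.
private val logger: Logger = Logger.getLogger("AlertService")

/**
 * Returns true and logs a warning when more Alerts match the Monitor
 * than a single search of [searchSize] can return.
 */
fun warnIfAlertsTruncated(
    monitorId: String,
    totalHits: Long,
    searchSize: Int = MAX_BUCKET_LEVEL_MONITOR_ALERT_SEARCH_COUNT
): Boolean {
    val truncated = totalHits > searchSize
    if (truncated) {
        logger.warning("Monitor [$monitorId] has $totalHits current Alerts but only $searchSize were loaded")
    }
    return truncated
}

fun main() {
    println(warnIfAlertsTruncated("monitor-1", totalHits = 120)) // false
    println(warnIfAlertsTruncated("monitor-1", totalHits = 501)) // true, plus a warning
}
```

Returning a Boolean leaves room for the caller to later decide whether exceeding the limit should fail the Monitor run rather than only log.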

.map { ActionExecutionResult(it.key, it.value.executionTime, if (it.value.throttled) 1 else 0) }
)

val updatedErrorHistory = currentAlert.errorHistory.update(alertError)
Collaborator

Do we always load the full error history? Can this be optimized, as it can occupy a big chunk of heap? This is another possible optimization that can be added later if you think it's a problem.

Contributor Author

Yeah, looks like errorHistory is capped to 10 elements but I agree, we can revisit some of these operations to think through optimizations.
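
For reference, a capped history update could look like the sketch below. `AlertErrorStub`, `MAX_ERROR_HISTORY`, and the `updated` extension are hypothetical stand-ins, and the newest-first ordering is an assumption; only the cap of 10 comes from the comment above.

```kotlin
// Hypothetical, simplified stand-in for the plugin's AlertError model.
data class AlertErrorStub(val timestampMillis: Long, val message: String)

// Assumed cap, taken from the "capped to 10 elements" remark above.
const val MAX_ERROR_HISTORY = 10

/**
 * Returns a new history with [newError] first (newest first) and at most
 * [limit] entries; returns the same list when there is no new error.
 */
fun List<AlertErrorStub>.updated(
    newError: AlertErrorStub?,
    limit: Int = MAX_ERROR_HISTORY
): List<AlertErrorStub> {
    if (newError == null) return this
    return (listOf(newError) + this).take(limit)
}

fun main() {
    var history = emptyList<AlertErrorStub>()
    for (i in 1..15) history = history.updated(AlertErrorStub(i.toLong(), "error $i"))
    println(history.size)            // 10
    println(history.first().message) // error 15
}
```

With a fixed cap like this, the heap cost per Alert is bounded by the size of the individual error messages rather than by how long the Alert has been failing.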

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>
@qreshi qreshi merged commit 2351783 into opensearch-project:main Aug 26, 2021
@qreshi qreshi deleted the update-alert-service branch August 26, 2021 20:06
adityaj1107 added a commit that referenced this pull request Sep 2, 2021
* Added release notes for OpenSearch 1.0.0.0. (#123) (#124)

Co-authored-by: AWSHurneyt <79280347+AWSHurneyt@users.noreply.github.com>

* Add Integtest.sh for OpenSearch integtest setups (#121)

* Add integtest script to the repo

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

* Add Alerting specific security param for integTest

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

* Remove default assignee (#127)

Signed-off-by: Ashish Agrawal <ashisagr@amazon.com>

* Removing All Usages of Action Get Method Calls and adding the listeners (#130)

Signed-off-by: Aditya Jindal <aditjind@amazon.com>

* Fix snapshot build and increment to 1.1.0. (#142)

Signed-off-by: dblock <dblock@amazon.com>

* Refactor MonitorRunner (#143)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Update Bucket-Level Alerting RFC (#145)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Add BucketSelector pipeline aggregation extension (#144)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Add AggregationResultBucket (#148)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Add ActionExecutionPolicy (#149)

* Add ActionExecutionPolicy

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Throw exception if there is an invalid field in PER_ALERT config when parsing

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Don't allow throttle to be configured for PerExecutionActionScope at the data class level since it is not supported yet

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Refactor Monitor and Trigger to split into Query-Level and Bucket-Lev… (#150)

* Refactor Monitor and Trigger to split into Query-Level and Bucket-Level Monitors

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Require condition to not be null when parsing Bucket-Level Trigger

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Update InputService for Bucket-Level Alerting (#152)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Update TriggerService for Bucket-Level Alerting (#153)

* Update TriggerService for Bucket-Level Alerting

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Remove client from TriggerService

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Update AlertService for Bucket-Level Alerting (#154)

* Update AlertService for Bucket-Level Alerting

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Move Alert search size for Bucket-Level Monitors to a const

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Add worksheets to help with testing (#151)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Update MonitorRunner for Bucket-Level Alerting (#155)

* Update MonitorRunner for Bucket-Level Alerting

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Update regressed comment in MonitorRunnerIT

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Add TODO to break down runBucketLevelMonitor method in MonitorRunner

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Fix ktlint formatting issues (#156)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Execute Actions on runTrigger exceptions for Bucket-Level Monitor (#157)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Skip execution of Actions on ACKNOWLEDGED Alerts for Bucket-Level Monitors (#158)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Return first page of input results in MonitorRunResult for Bucket-Level Monitor (#159)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Add setting to limit per alert action executions and don't save Alerts for test Bucket-Level Monitors (#161)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Fix bug in paginating multiple bucket paths for Bucket-Level Monitor (#163)

* Fix bug in paginating multiple bucket paths for Bucket-Level Monitor

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Change trigger after key conditionals to when statement

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Various bug fixes pertaining to throttling on PER_ALERT, saving COMPLETED Alerts and rewriting input query for Bucket-Level Monitors (#164)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Return only monitors for /monitors/_search. (#162)

* Return only monitors for /monitors/_search.

* Added missing imports

* Added additional check to the unit test

* Resolve default for ActionExecutionPolicy at runtime (#165)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

Co-authored-by: AWSHurneyt <79280347+AWSHurneyt@users.noreply.github.com>
Co-authored-by: Peter Zhu <zhujiaxi@amazon.com>
Co-authored-by: Ashish Agrawal <ashisagr@amazon.com>
Co-authored-by: Daniel Doubrovkine (dB.) <dblock@dblock.org>
Co-authored-by: Mohammad Qureshi <47198598+qreshi@users.noreply.github.com>
Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
Co-authored-by: Sriram <59816283+skkosuri-amzn@users.noreply.github.com>
@qreshi qreshi added the feature A change that introduces a new unit of functionality label Sep 8, 2021
AWSHurneyt added a commit that referenced this pull request Oct 15, 2021
…ledging more than 10 alerts at once. (#205)

* Added release notes for OpenSearch 1.0.0.0. (#123)

* Merge commits from the main branch to the 1.x branch.  (#133)

* Add release notes for 1.1.0.0 release (#166) (#167)

Signed-off-by: Mohammad Qureshi <qreshi@amazon.com>

* Remove default integtest.sh. (#181)

Signed-off-by: dblock <dblock@dblock.org>

* Add valid search filters. (#191)

* Add valid search filters.

* Added this fix to release notes

* Publish notification JARs checksums. (#197)

Signed-off-by: dblock <dblock@dblock.org>

* Also publish SHA 256 and 512 checksums. (#198)

* Also publish SHA 256 and 512 checksums.

Signed-off-by: dblock <dblock@dblock.org>

* Remove sonatype staging.

Signed-off-by: dblock <dblock@dblock.org>

* Fixed a bug that was preventing the AcknowledgeAlerts API from acknowledging more than 10 alerts at once. Signed-off-by: Thomas Hurney <hurneyt@amazon.com>

* Implemented integration tests to ensure fix for issue 203 is working as expected. Signed-off-by: Thomas Hurney <hurneyt@amazon.com>

* Refactored integ tests based on PR feedback, and listed the bug fix in the release notes. Signed-off-by: Thomas Hurney <hurneyt@amazon.com>

* Removing bug fixes from release notes. Currently discussing adding separate notes for this patch. Signed-off-by: Thomas Hurney <hurneyt@amazon.com>

Co-authored-by: Aditya Jindal <13850971+aditjind@users.noreply.github.com>
Co-authored-by: Peter Zhu <zhujiaxi@amazon.com>
Co-authored-by: Ashish Agrawal <ashisagr@amazon.com>
Co-authored-by: Daniel Doubrovkine (dB.) <dblock@dblock.org>
Co-authored-by: Mohammad Qureshi <47198598+qreshi@users.noreply.github.com>
Co-authored-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
Co-authored-by: Sriram <59816283+skkosuri-amzn@users.noreply.github.com>
AWSHurneyt added a commit to AWSHurneyt/OpenSearch-Alerting that referenced this pull request Mar 30, 2022
…ject#133)

Signed-off-by: AWSHurneyt <hurneyt@amazon.com>