-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows #192573
[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows #192573
Conversation
3c4683b
to
a544490
Compare
getRulesClientWithRequest, | ||
kibanaBaseUrl: this.kibanaBaseUrl, | ||
logger, | ||
maintenanceWindowsService: new MaintenanceWindowsService({ | ||
cacheInterval: this.config.rulesSettings.cacheInterval, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is used just for reducing the cache interval for testing so I reused the setting from the rules settings service.
}); | ||
|
||
// Only look at maintenance windows for this rule category | ||
const maintenanceWindows = activeMaintenanceWindows.filter(({ categoryIds }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
active maintenance windows are cached but whether those active maintenance windows apply to this rule type based on category are checked every time this function is called.
const executeEvents = events.filter((event) => event?.event?.action === 'execute'); | ||
|
||
// the first execute event should not have any maintenance window ids because there were no alerts during the | ||
// first execution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a change from before where maintenance windows were loaded and set in the event log even if there were no alerts during the rule execution, so I believe this is more correct than before.
…aintenance-windows-later
Pinging @elastic/response-ops (Team:ResponseOps) |
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management) |
@elasticmachine merge upstream |
…aintenance-windows-later
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code LGTM
Noted what is either a bug or me being confused in getActiveMaintenanceWindows()
, which seems like it would never return any windows since it's searching on a unique date generated in that method.
} | ||
} | ||
|
||
public async loadMaintenanceWindows( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debugged the code to make sure we weren't somehow going down here in the maintance window UX itself, because then we'd have a (potential) problem - the cached one could be stale.
It doesn't.
However, perhaps we should work in "cache" on the method name here, somehow, in case someone uses this and doesn't realize it could be stale. loadCacheableMaintenanceWindows()
(yikes!)
Actually, maybe easier to swap names w/ loadMaintenanceWindows()
and the private getMaintenaceWindows()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched loadMaintenanceWindows
and getMaintenanceWindows
in 883d909
@@ -301,45 +310,25 @@ export class AlertsClient< | |||
return this.legacyAlertsClient.checkLimitUsage(); | |||
} | |||
|
|||
public processAlerts(opts: ProcessAlertsOpts) { | |||
this.legacyAlertsClient.processAlerts(opts); | |||
public async processAlerts(opts: ProcessAlertsOpts) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
complete aside, but if the occaisonal event loop delays we seen in rules are caused by processAlerts()
, making this async may help - chunks things up a bit, give other node events get a chance to run
const startDateWithCacheOffset = new Date(startDate.getTime() + cacheIntervalMs); | ||
const startDateWithCacheOffsetISO = startDateWithCacheOffset.toISOString(); | ||
eventsKuery = nodeBuilder.or([ | ||
nodeBuilder.is('maintenance-window.attributes.events', startDateISO), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confused by this. We're generating a new date in this method, and then searching for it explicitly in the SO's? I would think that would literally never work.
I see the change from using date ranges (which makes sense) to the is
check was done here: #157112
Can we find out what's going on? Either it's a bug, or needs a comment :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused as well but I tested it and it does work. I think because events
is mapped as a date_range
it just works? cc @JiaweiWu do you know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oooooh! Yup! Thx, it is a date_range
and you can do term queries against them. Either forgot that, or TIL
https://www.elastic.co/guide/en/elasticsearch/reference/current/range.html
@@ -308,6 +308,9 @@ export default function createGetAlertSummaryTests({ getService }: FtrProviderCo | |||
true | |||
); | |||
|
|||
// wait so cache expires | |||
await setTimeoutAsync(10000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we should use a common constant instead of literal. Will be easier to fix later :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added const in 883d909
…aintenance-windows-later
@elasticmachine run docs-build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obs-ux-management changes LGTM
@elasticmachine merge upstream |
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @ymao1 |
💔 All backports failed
Manual backportTo create the backport manually run:
Questions ?Please refer to the Backport tool documentation |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
… alerts during rule execution and caching loaded maintenance windows (elastic#192573) Resolves elastic#184324 ## Summary This PR moves the loading of maintenance windows further down in rule execution so maintenance windows are only loaded when a rule execution generates alerts. Also caches maintenance windows per space to reduce the number of requests. ## To Verify 1. Add some logging to x-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts to indicate when windows are being fetched and when they're returning from the cache. 2. Create and run some rules in different spaces with and without alerts to see that the maintenance windows are only loaded when there are alerts and that the windows are returned from the cache when the cache has not expired. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> (cherry picked from commit 93414a6)
…re are alerts during rule execution and caching loaded maintenance windows (#192573) (#194191) # Backport This will backport the following commits from `main` to `8.x`: - [[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)](#192573) <!--- Backport version: 8.9.8 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Ying Mao","email":"ying.mao@elastic.co"},"sourceCommit":{"committedDate":"2024-09-26T12:59:36Z","message":"[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)\n\nResolves https://github.com/elastic/kibana/issues/184324\r\n\r\n## Summary\r\n\r\nThis PR moves the loading of maintenance windows further down in rule\r\nexecution so maintenance windows are only loaded when a rule execution\r\ngenerates alerts. Also caches maintenance windows per space to reduce\r\nthe number of requests.\r\n\r\n## To Verify\r\n\r\n1. Add some logging to\r\nx-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts\r\nto indicate when windows are being fetched and when they're returning\r\nfrom the cache.\r\n2. Create and run some rules in different spaces with and without alerts\r\nto see that the maintenance windows are only loaded when there are\r\nalerts and that the windows are returned from the cache when the cache\r\nhas not expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>\r\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"93414a672c2767b035110fa2d811cc040af57727","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Feature:Alerting","release_note:skip","Team:ResponseOps","v9.0.0","backport:prev-minor","ci:project-deploy-observability","Team:obs-ux-management","v8.16.0"],"number":192573,"url":"https://github.com/elastic/kibana/pull/192573","mergeCommit":{"message":"[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)\n\nResolves https://github.com/elastic/kibana/issues/184324\r\n\r\n## Summary\r\n\r\nThis PR moves the loading of maintenance windows further down in rule\r\nexecution so maintenance windows are only loaded when a rule execution\r\ngenerates alerts. Also caches maintenance windows per space to reduce\r\nthe number of requests.\r\n\r\n## To Verify\r\n\r\n1. Add some logging to\r\nx-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts\r\nto indicate when windows are being fetched and when they're returning\r\nfrom the cache.\r\n2. Create and run some rules in different spaces with and without alerts\r\nto see that the maintenance windows are only loaded when there are\r\nalerts and that the windows are returned from the cache when the cache\r\nhas not expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>\r\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"93414a672c2767b035110fa2d811cc040af57727"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","labelRegex":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/192573","number":192573,"mergeCommit":{"message":"[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)\n\nResolves https://github.com/elastic/kibana/issues/184324\r\n\r\n## Summary\r\n\r\nThis PR moves the loading of maintenance windows further down in rule\r\nexecution so maintenance windows are only loaded when a rule execution\r\ngenerates alerts. Also caches maintenance windows per space to reduce\r\nthe number of requests.\r\n\r\n## To Verify\r\n\r\n1. Add some logging to\r\nx-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts\r\nto indicate when windows are being fetched and when they're returning\r\nfrom the cache.\r\n2. Create and run some rules in different spaces with and without alerts\r\nto see that the maintenance windows are only loaded when there are\r\nalerts and that the windows are returned from the cache when the cache\r\nhas not expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>\r\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"93414a672c2767b035110fa2d811cc040af57727"}},{"branch":"8.x","label":"v8.16.0","labelRegex":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT-->
Resolves #184324
Summary
This PR moves the loading of maintenance windows further down in rule execution so maintenance windows are only loaded when a rule execution generates alerts. Also caches maintenance windows per space to reduce the number of requests.
To Verify