[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows #192573

ymao1 · 2024-09-11T12:14:29Z

Summary

This PR moves the loading of maintenance windows further down in rule execution so that they are only loaded when a rule execution generates alerts. It also caches maintenance windows per space to reduce the number of requests.

To Verify

1. Add some logging to x-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts to indicate when windows are being fetched and when they're returning from the cache.
2. Create and run some rules in different spaces with and without alerts to see that the maintenance windows are only loaded when there are alerts and that the windows are returned from the cache when the cache has not expired.
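
For readers following the verification steps, the per-space caching idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in maintenance_windows_service.ts: `SpaceCache`, `loadWithCache`, the `fetchWindows` callback, and the injectable `now` clock are all hypothetical names introduced to keep the example self-contained. The log messages stand in for the temporary logging suggested above.

```typescript
// Hypothetical sketch: a per-space cache with a fixed expiry interval.
// None of these names come from the Kibana code base.
interface CacheEntry<T> {
  value: T;
  fetchedAt: number; // epoch millis when the value was stored
}

class SpaceCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly cacheIntervalMs: number;
  private readonly now: () => number;

  constructor(cacheIntervalMs: number, now: () => number = () => Date.now()) {
    this.cacheIntervalMs = cacheIntervalMs;
    this.now = now;
  }

  // Returns the cached value for a space, or undefined if missing or expired.
  get(spaceId: string): T | undefined {
    const entry = this.entries.get(spaceId);
    if (!entry) return undefined;
    if (this.now() - entry.fetchedAt >= this.cacheIntervalMs) {
      this.entries.delete(spaceId);
      return undefined;
    }
    return entry.value;
  }

  set(spaceId: string, value: T): void {
    this.entries.set(spaceId, { value, fetchedAt: this.now() });
  }
}

// Loads through the cache and logs whether the value was fetched or cached.
async function loadWithCache<T>(
  cache: SpaceCache<T>,
  spaceId: string,
  fetchWindows: (spaceId: string) => Promise<T>,
  log: (msg: string) => void = (msg) => console.log(msg)
): Promise<T> {
  const cached = cache.get(spaceId);
  if (cached !== undefined) {
    log(`maintenance windows for space "${spaceId}" returned from cache`);
    return cached;
  }
  log(`fetching maintenance windows for space "${spaceId}"`);
  const value = await fetchWindows(spaceId);
  cache.set(spaceId, value);
  return value;
}
```

Passing the clock in as a parameter is a design choice for the sketch only: it lets a test advance time without waiting for the cache to expire.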

ymao1 · 2024-09-16T14:58:36Z

x-pack/plugins/alerting/server/plugin.ts

      getRulesClientWithRequest,
      kibanaBaseUrl: this.kibanaBaseUrl,
      logger,
+      maintenanceWindowsService: new MaintenanceWindowsService({
+        cacheInterval: this.config.rulesSettings.cacheInterval,


this is used just for reducing the cache interval for testing so I reused the setting from the rules settings service.

ymao1 · 2024-09-16T20:41:18Z

x-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts

+    });
+
+    // Only look at maintenance windows for this rule category
+    const maintenanceWindows = activeMaintenanceWindows.filter(({ categoryIds }) => {


active maintenance windows are cached, but whether they apply to this rule type (based on category) is checked every time this function is called.
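
The split between the cached lookup and the per-call category check might look something like the sketch below. It assumes that a window without a `categoryIds` array applies to every rule category; that rule, along with the `MaintenanceWindowLike` type and the `filterByCategory` name, is an assumption made for illustration and is not taken from the diff.

```typescript
// Hypothetical shape: only the fields needed for the category check.
interface MaintenanceWindowLike {
  id: string;
  categoryIds?: string[] | null;
}

// Runs on every call, against the (possibly cached) list of active windows.
function filterByCategory(
  activeWindows: MaintenanceWindowLike[],
  ruleTypeCategory: string
): MaintenanceWindowLike[] {
  return activeWindows.filter(({ categoryIds }) => {
    // Assumption: no category list means the window applies to all categories.
    if (!Array.isArray(categoryIds)) {
      return true;
    }
    return categoryIds.includes(ruleTypeCategory);
  });
}
```

Because the filter is cheap and depends on the calling rule's category, it makes sense to keep it outside the cache: one cached list per space can then serve rules of any category.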

ymao1 · 2024-09-16T20:42:52Z

x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group1/event_log.ts

+          const executeEvents = events.filter((event) => event?.event?.action === 'execute');
+
+          // the first execute event should not have any maintenance window ids because there were no alerts during the
+          // first execution


this is a change from before where maintenance windows were loaded and set in the event log even if there were no alerts during the rule execution, so I believe this is more correct than before.
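
The behavior change the test asserts can be summarized with a small sketch: the windows are looked up only when the execution produced alerts, so an execution without alerts records no maintenance window ids. The `ExecuteEvent` shape, the `maintenanceWindowIds` field name, and `buildExecuteEvent` are hypothetical, and the loader is synchronous here only to keep the example short.

```typescript
// Hypothetical event shape, reduced to what the example needs.
interface ExecuteEvent {
  action: 'execute';
  maintenanceWindowIds?: string[];
}

function buildExecuteEvent(
  alertCount: number,
  loadActiveWindowIds: () => string[]
): ExecuteEvent {
  const event: ExecuteEvent = { action: 'execute' };
  // Skip the lookup entirely when the execution generated no alerts.
  if (alertCount > 0) {
    const ids = loadActiveWindowIds();
    if (ids.length > 0) {
      event.maintenanceWindowIds = ids;
    }
  }
  return event;
}
```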

elasticmachine · 2024-09-17T00:25:29Z

Pinging @elastic/response-ops (Team:ResponseOps)

elasticmachine · 2024-09-19T11:52:33Z

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

ymao1 · 2024-09-19T15:30:18Z

@elasticmachine merge upstream

ymao1 · 2024-09-23T12:02:42Z

@elasticmachine merge upstream

pmuellr

code LGTM

Noted what is either a bug or me being confused in getActiveMaintenanceWindows(), which seems like it would never return any windows since it's searching on a unique date generated in that method.

pmuellr · 2024-09-19T20:01:11Z

x-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts

+    }
+  }
+
+  public async loadMaintenanceWindows(


I debugged the code to make sure we weren't somehow going down here in the maintenance window UX itself, because then we'd have a (potential) problem - the cached one could be stale.

It doesn't.

However, perhaps we should work in "cache" on the method name here, somehow, in case someone uses this and doesn't realize it could be stale. loadCacheableMaintenanceWindows() (yikes!)

Actually, maybe easier to swap names w/ loadMaintenanceWindows() and the private getMaintenanceWindows().

Switched loadMaintenanceWindows and getMaintenanceWindows in 883d909

pmuellr · 2024-09-24T18:08:51Z

x-pack/plugins/alerting/server/alerts_client/alerts_client.ts

@@ -301,45 +310,25 @@ export class AlertsClient<
    return this.legacyAlertsClient.checkLimitUsage();
  }

-  public processAlerts(opts: ProcessAlertsOpts) {
-    this.legacyAlertsClient.processAlerts(opts);
+  public async processAlerts(opts: ProcessAlertsOpts) {


complete aside, but if the occasional event loop delays we've seen in rules are caused by processAlerts(), making this async may help: it chunks things up a bit and gives other node events a chance to run
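
As a general illustration of the "chunk things up" idea in this aside, and not of what processAlerts() does in this PR, long synchronous work can be split into chunks with a timer-based pause between them so that pending timers and I/O callbacks get a turn. The helper names `chunk`, `yieldToEventLoop`, and `processInChunks` are hypothetical.

```typescript
// Splits a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1');
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Resolves on a later turn of the event loop, so queued callbacks can run first.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => resolve(), 0);
  });
}

// Handles items chunk by chunk, pausing between chunks.
async function processInChunks<T>(
  items: T[],
  size: number,
  handle: (item: T) => void
): Promise<void> {
  for (const part of chunk(items, size)) {
    part.forEach(handle);
    await yieldToEventLoop();
  }
}
```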

pmuellr · 2024-09-24T18:42:16Z

...g/server/application/maintenance_window/methods/get_active/get_active_maintenance_windows.ts

+    const startDateWithCacheOffset = new Date(startDate.getTime() + cacheIntervalMs);
+    const startDateWithCacheOffsetISO = startDateWithCacheOffset.toISOString();
+    eventsKuery = nodeBuilder.or([
+      nodeBuilder.is('maintenance-window.attributes.events', startDateISO),


Confused by this. We're generating a new date in this method, and then searching for it explicitly in the SOs? I would think that would literally never work.

I see the change from using date ranges (which makes sense) to the is check was done here: #157112

Can we find out what's going on? Either it's a bug, or needs a comment :-)

I was confused as well but I tested it and it does work. I think because events is mapped as a date_range it just works? cc @JiaweiWu do you know?

oooooh! Yup! Thx, it is a date_range and you can do term queries against them. Either forgot that, or TIL

https://www.elastic.co/guide/en/elasticsearch/reference/current/range.html
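
To make the resolution of this thread concrete: a term query against a range field matches when the stored range contains the queried value, so searching for a freshly generated timestamp finds every window whose event range covers that instant. The sketch below imitates that matching in plain TypeScript. It assumes each event is stored as a `{ gte, lte }` pair of ISO strings; that shape, and the names `rangeContains` and `isActiveNowOrWithinCacheInterval`, are assumptions for illustration. The second check mirrors the `startDateWithCacheOffset` term in the diff.

```typescript
// Assumed storage shape for one maintenance window event.
interface DateRange {
  gte: string; // ISO start, inclusive
  lte: string; // ISO end, inclusive
}

// Imitates a term query on a date_range field: the range must contain the value.
function rangeContains(range: DateRange, isoDate: string): boolean {
  const t = Date.parse(isoDate);
  return Date.parse(range.gte) <= t && t <= Date.parse(range.lte);
}

// Imitates the OR of the two term checks: active now, or active one cache interval from now.
function isActiveNowOrWithinCacheInterval(
  events: DateRange[],
  now: Date,
  cacheIntervalMs: number
): boolean {
  const startDateISO = now.toISOString();
  const startDateWithCacheOffsetISO = new Date(now.getTime() + cacheIntervalMs).toISOString();
  return events.some(
    (event) =>
      rangeContains(event, startDateISO) || rangeContains(event, startDateWithCacheOffsetISO)
  );
}
```

Including the offset timestamp means a window that starts before the cache expires is already part of the cached result, rather than being missed until the next fetch.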

pmuellr · 2024-09-24T19:11:26Z

x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group1/get_alert_summary.ts

@@ -308,6 +308,9 @@ export default function createGetAlertSummaryTests({ getService }: FtrProviderCo
          true
        );

+        // wait so cache expires
+        await setTimeoutAsync(10000);


Feels like we should use a common constant instead of literal. Will be easier to fix later :-)

Added const in 883d909

ymao1 · 2024-09-25T15:26:57Z

@elasticmachine run docs-build

dominiqueclarke

obs-ux-management changes LGTM

ymao1 · 2024-09-26T11:28:27Z

@elasticmachine merge upstream

kibana-ci · 2024-09-26T12:23:12Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 3e43f91
Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-192573-3e43f919b8c2

Failed CI Steps

Metrics [docs]

✅ unchanged

History

💛 Build #237064 was flaky 883d909
💛 Build #236330 was flaky 5e3d400
💛 Build #236051 was flaky 99556c2
💔 Build #235684 failed fc94013
💛 Build #235577 was flaky 3dd0eac

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

kibanamachine · 2024-09-26T13:05:17Z

💔 All backports failed

Status	Branch	Result
❌	8.x	Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 192573

Questions ?

Please refer to the Backport tool documentation

ymao1 · 2024-09-26T16:43:15Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

… alerts during rule execution and caching loaded maintenance windows (elastic#192573)

Resolves elastic#184324

## Summary

This PR moves the loading of maintenance windows further down in rule execution so maintenance windows are only loaded when a rule execution generates alerts. Also caches maintenance windows per space to reduce the number of requests.

## To Verify

1. Add some logging to x-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts to indicate when windows are being fetched and when they're returning from the cache.
2. Create and run some rules in different spaces with and without alerts to see that the maintenance windows are only loaded when there are alerts and that the windows are returned from the cache when the cache has not expired.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

(cherry picked from commit 93414a6)

…re are alerts during rule execution and caching loaded maintenance windows (#192573) (#194191)

# Backport

This will backport the following commits from `main` to `8.x`:

- [[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)](#192573)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Only loading maintenance windows if there are new alerts to persist

a544490

ymao1 force-pushed the alerting/load-maintenance-windows-later branch from 3c4683b to a544490 Compare September 12, 2024 18:57

ymao1 added 3 commits September 12, 2024 15:53

Fixing types

6f71723

Need to load when there are any alerts not just new

71b7538

Merging in main

e097e99

ymao1 changed the title ~~Moving maintenance window calculations out of process alerts~~ [Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution. Sep 13, 2024

ymao1 added 3 commits September 13, 2024 13:23

Need to load when there are any alerts not just new

8d82191

Merging in main

3ef5f22

wip

9a5f463

ymao1 commented Sep 16, 2024

Caching maintenance windows

9293e37

ymao1 changed the title ~~[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution.~~ [Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows Sep 16, 2024

Fixing tests

c5af98f

ymao1 self-assigned this Sep 16, 2024

ymao1 commented Sep 16, 2024

ymao1 added 2 commits September 16, 2024 17:11

Fixing types

cbb7c82

Merge branch 'main' of github.com:elastic/kibana into alerting/load-m…

18f8e10

…aintenance-windows-later

ymao1 marked this pull request as ready for review September 17, 2024 00:25

ymao1 requested review from a team as code owners September 17, 2024 00:25

ymao1 requested a review from rylnd September 17, 2024 00:25

botelastic bot added ci:project-deploy-observability Create an Observability project Team:obs-ux-management Observability Management User Experience Team labels Sep 19, 2024

[CI] Auto-commit changed files from 'yarn openapi:bundle'

3dd0eac

elastic-vault-github-plugin-prod bot requested a review from a team as a code owner September 19, 2024 12:38

Merge branch 'main' into alerting/load-maintenance-windows-later

fc94013

PhilippeOberti removed the request for review from a team September 19, 2024 21:42

ymao1 added 2 commits September 20, 2024 12:01

Fixing test

46e48cf

Merge branch 'main' of github.com:elastic/kibana into alerting/load-m…

99556c2

…aintenance-windows-later

Merge branch 'main' into alerting/load-maintenance-windows-later

5e3d400

pmuellr approved these changes Sep 24, 2024

ymao1 added 2 commits September 25, 2024 10:33

Merge branch 'main' of github.com:elastic/kibana into alerting/load-m…

120df74

…aintenance-windows-later

PR feedback

883d909

dominiqueclarke approved these changes Sep 26, 2024

Merge branch 'main' into alerting/load-maintenance-windows-later

3e43f91

ymao1 merged commit 93414a6 into elastic:main Sep 26, 2024
46 checks passed

ymao1 deleted the alerting/load-maintenance-windows-later branch September 26, 2024 12:59

ymao1 mentioned this pull request Sep 26, 2024

[8.x] [Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573) #194191

Merged

kibanamachine mentioned this pull request Sep 26, 2024

[Dashboard] Cleanup services #193644

Merged

2 tasks
