
Conversation

@jackiehanyang (Collaborator)

Description

  • Introduces a new Insights API (see the usage sketch after this list):
    • POST /_plugins/_anomaly_detection/insights/_start - start the insights job
    • GET /_plugins/_anomaly_detection/insights/_status - get the insights job status
    • GET /_plugins/_anomaly_detection/insights/_results - get the latest insights results
    • POST /_plugins/_anomaly_detection/insights/_stop - stop the insights job
  • Introduces a runtime dependency on the ml-commons metrics correlation algorithm:
    • anomaly results are sent to the ml-commons metrics correlation algorithm for analysis
    • the analysis results are written to the insights-results index
    • the frontend reads from this index to display insights on the dashboard
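For reference, calling the new endpoints could look roughly like the sketch below, using the low-level OpenSearch REST client. This is an illustration only, not code from this PR: the cluster address and the printed response bodies are assumptions, and the import paths follow the 2.x opensearch-rest-client and may differ across versions.

import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

public class InsightsApiExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // start the insights job
            client.performRequest(new Request("POST", "/_plugins/_anomaly_detection/insights/_start"));

            // check the job status
            Response status = client.performRequest(new Request("GET", "/_plugins/_anomaly_detection/insights/_status"));
            System.out.println(EntityUtils.toString(status.getEntity()));

            // fetch the latest insights results (served from the insights-results index)
            Response results = client.performRequest(new Request("GET", "/_plugins/_anomaly_detection/insights/_results"));
            System.out.println(EntityUtils.toString(results.getEntity()));

            // stop the insights job
            client.performRequest(new Request("POST", "/_plugins/_anomaly_detection/insights/_stop"));
        }
    }
}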

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Jackie <jkhanjob@gmail.com>
@kaituo (Collaborator) commented Nov 13, 2025

CI failed due to the jacoco changes in build.gradle. Not sure how to fix it. One naive way is to add the correlation request, response, and Action classes in AD to avoid the ml-commons dependency.

* What went wrong:
Execution failed for task ':jacocoTestCoverageVerification'.
> A failure occurred while executing org.gradle.internal.jacoco.JacocoCoverageAction
   > Rule violated for class org.opensearch.ad.AnomalyDetectorRunner: branches covered ratio is 0.35, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.AnomalyDetectorRunner: lines covered ratio is 0.47, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.util.ModelUtil: branches covered ratio is 0.32, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.util.ModelUtil: lines covered ratio is 0.48, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.util.DataUtil: lines covered ratio is 0.72, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.feature.AbstractRetriever: branches covered ratio is 0.55, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.feature.AbstractRetriever: lines covered ratio is 0.63, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.feature.SearchFeatureDao: branches covered ratio is 0.28, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.feature.SearchFeatureDao: lines covered ratio is 0.59, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.ModelValidationActionHandler: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.ModelValidationActionHandler: lines covered ratio is 0.00, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.ConfigUpdateConfirmer: branches covered ratio is 0.06, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.ConfigUpdateConfirmer: lines covered ratio is 0.19, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.AggregationPrep: branches covered ratio is 0.36, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.AggregationPrep: lines covered ratio is 0.40, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation: branches covered ratio is 0.06, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation: lines covered ratio is 0.18, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.LatestTimeRetriever: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.LatestTimeRetriever: lines covered ratio is 0.00, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation.IntervalRecommendationListener: branches covered ratio is 0.37, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.handler.IntervalCalculation.IntervalRecommendationListener: lines covered ratio is 0.55, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.ratelimit.ColdStartWorker: branches covered ratio is 0.45, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.ratelimit.ColdStartWorker: lines covered ratio is 0.73, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.SuggestAnomalyDetectorParamTransportAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.SuggestAnomalyDetectorParamTransportAction: lines covered ratio is 0.11, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.ADSuggestName: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.ADSuggestName: lines covered ratio is 0.57, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.transport.ADResultProcessor: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.transport.ADResultProcessor: lines covered ratio is 0.61, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.model.InitProgressProfile: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.model.IntervalTimeConfiguration: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.ml.MLCommonsClient: lines covered ratio is 0.62, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.RestValidateAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.RestValidateAction: lines covered ratio is 0.26, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.RestJobAction: branches covered ratio is 0.00, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.rest.RestJobAction: lines covered ratio is 0.25, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.rest.AbstractSearchAction: lines covered ratio is 0.60, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamResponse: branches covered ratio is 0.59, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamRequest: branches covered ratio is 0.50, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamRequest: lines covered ratio is 0.68, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.SuggestConfigParamResponse.Builder: lines covered ratio is 0.57, but expected minimum is 0.75
     Rule violated for class org.opensearch.timeseries.transport.handler.IndexMemoryPressureAwareResultHandler: branches covered ratio is 0.54, but expected minimum is 0.60
     Rule violated for class org.opensearch.timeseries.transport.handler.IndexMemoryPressureAwareResultHandler: lines covered ratio is 0.68, but expected minimum is 0.75
     Rule violated for class org.opensearch.ad.ratelimit.ADSaveResultStrategy: branches covered ratio is 0.43, but expected minimum is 0.60
     Rule violated for class org.opensearch.ad.ratelimit.ADSaveResultStrategy: lines covered ratio is 0.53, but expected minimum is 0.75

@kaituo (Collaborator) left a comment

partial review

builder.startObject();

// Task metadata
builder.field("task_id", "task_" + ADCommonName.INSIGHTS_JOB_NAME + "_" + UUID.randomUUID().toString());
Collaborator:

Why do you need a task id? The AD task id is the doc id of the state index.


if (parts.length > 1) {
    String seriesKey = parts[1];
    seriesKeys.add(seriesKey);
Collaborator:

Is the entities set redundant with the seriesKeys set?

Collaborator Author:

We don't necessarily need it. I just followed the current practice of having a logical run identifier for the insights generation. It may be useful in the future when we integrate with Investigation, so we can refer to a specific insights run by this id.

Collaborator:

If we need it in the future, we can add it later. For now we can remove it.

Collaborator Author:

This field is currently used on the frontend side:
[Screenshot 2025-12-01 at 11:31:49]

// Use MAX score if multiple anomalies in same bucket
double currentScore = bucketScores.getOrDefault(bucketIndex, 0.0);
double newScore = anomaly.getAnomalyScore();
bucketScores.put(bucketIndex, Math.max(currentScore, newScore));
@kaituo (Collaborator) commented Nov 14, 2025:

Should we consider the interval? Our anomalies are interval anomalies. We could put anomaly scores into all of the buckets overlapping the current interval [data start, data end]. If you have already done this, can you point me to the code? I cannot find it.
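A hypothetical sketch of that suggestion (not code from this PR): spread each interval anomaly's score over every bucket that overlaps [data start, data end]. The bucketScores map mirrors the quoted snippet; the window start and bucket width parameters are assumptions.

import java.util.Map;

class BucketSpreadSketch {
    // Spread one interval anomaly's score across all buckets overlapping [dataStart, dataEnd],
    // keeping the MAX score per bucket as the quoted snippet does.
    static void spreadScoreOverBuckets(
        Map<Integer, Double> bucketScores,
        long windowStartMillis,
        long bucketMillis,
        long dataStartMillis,
        long dataEndMillis,
        double anomalyScore
    ) {
        int firstBucket = (int) Math.max(0, (dataStartMillis - windowStartMillis) / bucketMillis);
        int lastBucket = (int) Math.max(firstBucket, (dataEndMillis - windowStartMillis) / bucketMillis);
        for (int bucketIndex = firstBucket; bucketIndex <= lastBucket; bucketIndex++) {
            double currentScore = bucketScores.getOrDefault(bucketIndex, 0.0);
            bucketScores.put(bucketIndex, Math.max(currentScore, anomalyScore));
        }
    }
}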

Collaborator Author:

synced offline, no changes needed here

Collaborator:

I still don't know where your code is. Can you point me to the location?

Signed-off-by: Jackie <jkhanjob@gmail.com>
@kaituo (Collaborator) left a comment

partial review

.sort("generated_at", SortOrder.DESC)
);

client.search(searchRequest, ActionListener.wrap(searchResponse -> {
Collaborator:

Do you need to add backend role filtering before the search? Please add security tests with backend role filtering enabled.

Collaborator Author:

We need a tenant-isolated search, but not necessarily backend role filtering here. Insights generation is a background job, so I followed the existing pattern of using InjectSecurity directly for background work: impersonate the stored user via InjectSecurity, then execute the search directly. For user-facing search APIs like the search anomaly result transport action, AD reads the current user from the thread context and then adds backend role filtering.

Adding security tests in the next revision.
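The background-job pattern described here could be sketched as follows. This is an illustration under stated assumptions (InjectSecurity from common-utils, a node client, the AD result index alias, and placeholder handlers), not the PR's exact code; import paths follow a recent 2.x distribution and may differ across versions.

import java.util.List;

import org.opensearch.action.search.SearchRequest;
import org.opensearch.client.Client;
import org.opensearch.common.settings.Settings;
import org.opensearch.commons.InjectSecurity;
import org.opensearch.core.action.ActionListener;

class BackgroundSearchSketch {
    // Impersonate the user stored with the insights job, then run the result search directly.
    void searchAsStoredUser(Client client, Settings settings, String jobId, String user, List<String> roles) {
        try (InjectSecurity injectSecurity = new InjectSecurity(jobId, settings, client.threadPool().getThreadContext())) {
            injectSecurity.inject(user, roles);
            SearchRequest searchRequest = new SearchRequest(".opendistro-anomaly-results*");
            client.search(searchRequest, ActionListener.wrap(
                searchResponse -> { /* hand anomaly results to the correlation step */ },
                exception -> { /* log the failure and release any held resources */ }
            ));
        }
    }
}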

Collaborator:

So we have three kinds of auth now:

  1. FGAC roles
  2. backend-role filtering
  3. resource sharing

We will need to cover all three of them.

try {
    injectSecurity.inject(user, roles);

    localClient
Collaborator:

We should verify whether the mapping has been changed by the customer before writing. If it has, report an error, stop the job, and stop writing.

Collaborator Author:

good catch, updated in the new revision

Signed-off-by: Jackie <jkhanjob@gmail.com>
@kaituo (Collaborator) left a comment

partial review


log.info("Running Insights job for time window: {} to {}", executionStartTime, executionEndTime);

querySystemResultIndex(jobParameter, lockService, lock, executionStartTime, executionEndTime);
Collaborator:

Only querying the system result index would hinder your ability to go GA with this alone. You have to tie insights to Auto AD creation. One route to GA is to add a text box on the AD overview page that shows a summary on top of existing detectors' results.

@kaituo (Collaborator) left a comment

partial review

    fetchDetectorMetadataAndProceed(allAnomalies, jobParameter, lockService, lock, executionStartTime, executionEndTime);
} else {
    log.info("No anomalies found in time window, skipping ML correlation");
    releaseLock(jobParameter, lockService, lock);
Collaborator:

The way releaseLock is scattered through InsightsJobProcessor right now is fragile. It's easy to add a new early-return or error path and forget to release the lock, and it is hard for others to maintain.

How about

private void runInsightsJob(Job job, LockService lockService, LockModel lock,
                            Instant start, Instant end) {
    if (lock == null) {
        log.warn("Can't run Insights job due to null lock for {}", job.getName());
        return;
    }

    ActionListener<Void> lockReleasing = guardedLockReleasingListener(job, lockService, lock);

    // Top-level listener for “anomalies finished”
    ActionListener<List<AnomalyResult>> anomaliesListener = ActionListener.wrap(
        anomalies -> {
            if (anomalies.isEmpty()) {
                log.info("No anomalies, skipping ML correlation");
                lockReleasing.onResponse(null);
                return;
            }
            fetchDetectorMetadataAndProceed(anomalies, job, start, end, lockReleasing);
        },
        lockReleasing::onFailure
    );

    querySystemResultIndex(job, start, end, anomaliesListener);
}

private ActionListener<Void> guardedLockReleasingListener(Job job, LockService lockService, LockModel lock) {
    AtomicBoolean done = new AtomicBoolean(false);

    return ActionListener.wrap(
        r -> {
            if (done.compareAndSet(false, true)) {
                releaseLock(job, lockService, lock);
            } else {
                log.warn("Lock already released for Insights job {}", job.getName());
            }
        },
        e -> {
            if (done.compareAndSet(false, true)) {
                log.error("Insights job failed", e);
                releaseLock(job, lockService, lock);
            } else {
                log.warn("Lock already released for Insights job {} (got extra failure)", job.getName(), e);
            }
        }
    );
}


Now:

querySystemResultIndex(...) and fetchPagedAnomalies(...) each only get an ActionListener<List<AnomalyResult>> listener.

They know nothing about locks, nothing about completion – they just follow the normal “always call your listener” rule.

This would reduce the “special” callsites to:

  • runInsightsJob (which creates lockReleasing)
  • a few terminal branches in the top-level logic (no anomalies, ML disabled, write succeeded/failed, etc.)

Collaborator Author:

Refactored: it now has a single guarded lock-releasing listener, and the sub-processes are lock-free.

Signed-off-by: Jackie <jkhanjob@gmail.com>
@kaituo (Collaborator) left a comment

Finished one round of review.

 * @param listener Action listener for the response
 */
public void startInsightsJob(String frequency, ActionListener<InsightsJobResponse> listener) {
    logger.info("Starting insights job with frequency: {}", frequency);
Collaborator:

To be safe, you will need to verify that users have permission to search results by running the search API with their credentials and checking whether there is a security exception. (A rough sketch follows.)
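One hedged way to implement that probe, assuming a zero-size search against the result index under the caller's thread context is enough to surface a security exception; the method, the Runnable, and the listener wiring are placeholder names, not code from this PR.

import org.opensearch.action.search.SearchRequest;
import org.opensearch.client.Client;
import org.opensearch.core.action.ActionListener;
import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.builder.SearchSourceBuilder;

class StartPermissionProbeSketch {
    // Hypothetical permission probe: only proceed with starting the job if the caller
    // can search the result index; otherwise surface the (security) exception.
    void verifySearchPermission(Client client, String resultIndex, Runnable startJob, ActionListener<?> errorListener) {
        SearchRequest probe = new SearchRequest(resultIndex)
            .source(new SearchSourceBuilder().size(0).query(QueryBuilders.matchAllQuery()));
        client.search(probe, ActionListener.wrap(
            response -> startJob.run(),      // permission confirmed: start the insights job
            errorListener::onFailure         // pass the exception back to the caller
        ));
    }
}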

Collaborator Author:

Do you mean we should check whether users have permission to search the insights result index? The _results API already checks that; _start is just about starting the job, and no search of insights results is involved.

// entity level task to track just one specific entity's state, init progress, error etc.
HISTORICAL_HC_ENTITY;
HISTORICAL_HC_ENTITY,
INSIGHTS;
Collaborator:

Where do you use the new task type?

Collaborator Author:

I added it while implementing but ended up not using it; removed.

}

public void testInsightsApisUseSystemContextForJobIndex() throws IOException {
    // Use a non-admin user with AD access (alice) to exercise Insights APIs end-to-end under security
Collaborator:

Can you also use a non-admin user (e.g., bobUser) to check whether things fail as expected?

@jackiehanyang (Collaborator Author) commented Nov 26, 2025:

added in the new revision

Response stopResp = TestHelpers.makeRequest(aliceClient, "POST", stopPath, ImmutableMap.of(), "", null);
assertEquals("Stop insights job failed", RestStatus.OK, TestHelpers.restStatus(stopResp));
}

Collaborator:

Can you have alice create detectors, check that results are generated, start insights, check whether any insights are generated after a few minutes, query insights as a normal user, and then stop insights? (A rough outline of this flow is sketched below.)
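A hypothetical outline of that flow, following the TestHelpers.makeRequest style used in this test file; detector creation, result verification, and normalUserClient are placeholders not shown in this PR.

public void testInsightsEndToEndAsAlice() throws Exception {
    // 1. alice creates a detector and waits until anomaly results are generated (helpers elided)

    // 2. alice starts the insights job
    Response startResp = TestHelpers
        .makeRequest(aliceClient, "POST", "/_plugins/_anomaly_detection/insights/_start", ImmutableMap.of(), "", null);
    assertEquals("Start insights job failed", RestStatus.OK, TestHelpers.restStatus(startResp));

    // 3. after a few minutes, a normal (non-admin) user queries the latest insights
    Response resultsResp = TestHelpers
        .makeRequest(normalUserClient, "GET", "/_plugins/_anomaly_detection/insights/_results", ImmutableMap.of(), "", null);
    assertEquals("Get insights results failed", RestStatus.OK, TestHelpers.restStatus(resultsResp));

    // 4. alice stops the insights job
    Response stopResp = TestHelpers
        .makeRequest(aliceClient, "POST", "/_plugins/_anomaly_detection/insights/_stop", ImmutableMap.of(), "", null);
    assertEquals("Stop insights job failed", RestStatus.OK, TestHelpers.restStatus(stopResp));
}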

if (bucketIndex >= 0) {
    // Use MAX score if multiple anomalies in same bucket
    double currentScore = bucketScores.getOrDefault(bucketIndex, 0.0);
    double newScore = anomaly.getAnomalyScore();
Collaborator:

getAnomalyScore() can be null.

Collaborator Author:

When querying anomaly results, I only query those with anomaly_grade > 0, so anomalyScore won't be null here.
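For reference, that kind of filter would look roughly like the sketch below (not the PR's exact query); the anomaly_grade and data_end_time field names follow the AD result schema, while the window bounds and page size are assumptions.

import org.opensearch.index.query.BoolQueryBuilder;
import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.builder.SearchSourceBuilder;

class AnomalousResultsQuerySketch {
    // Only fetch results whose anomaly_grade is strictly greater than zero inside the job's window.
    static SearchSourceBuilder anomalousResultsOnly(long windowStartMillis, long windowEndMillis) {
        BoolQueryBuilder query = QueryBuilders.boolQuery()
            .filter(QueryBuilders.rangeQuery("anomaly_grade").gt(0))
            .filter(QueryBuilders.rangeQuery("data_end_time").gte(windowStartMillis).lte(windowEndMillis));
        return new SearchSourceBuilder().query(query).size(1000);
    }
}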



@kaituo (Collaborator) commented Nov 26, 2025

Any idea why the "Build and Test anomaly detection" CI runs are skipped? (I saw it happen yesterday and today.)

Labels

backport 2.x, infra (Changes to infrastructure, testing, CI/CD, pipelines, etc.)