Introduce state machine, separate config index, improve suggest/valid… #1479

kaituo · 2025-05-29T23:15:29Z

…ate APIs, and persist cold-start results for run-once visualization

Description

Forecast State Machine: Introduces a detailed forecasting state machine with finer-grained state transitions than AD. In AD, each task runner moves through INIT → RUNNING → STOPPED → FAILED transitions. Forecasting has the following states:

Inactive - a forecast that hasn’t been started yet
Inactive: stopped - a forecast stopped by user after running
Awaiting data to initialize forecast - a forecast is attempting to start but there is not enough data for the model to start initializing
Awaiting data to restart forecast - a forecast is attempting to restart after running but there is not enough data
Initializing test - a forecast is building a model to run a test
Initializing forecast - a forecast is building a model to start running continuously
Test complete - a forecast generated a test result and stopped
Running - a forecast is running continuously
Initializing test failed
Initializing forecast failed
Forecast failed

See attached graph for the state machine transition graph.

Read src/main/java/org/opensearch/forecast/task/ForecastTaskManager.java, src/main/java/org/opensearch/forecast/transport/GetForecasterTransportAction.java, src/main/java/org/opensearch/timeseries/task/TaskManager.java, src/main/java/org/opensearch/timeseries/ml/ModelColdStart.java, src/main/java/org/opensearch/timeseries/model/TaskState.java, src/main/java/org/opensearch/ad/task/ADTaskManager.java, src/main/java/org/opensearch/timeseries/task/TaskCacheManager.java, src/main/java/org/opensearch/timeseries/transport/BaseGetConfigTransportAction.java.
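The state list above can be modeled as a Java enum. This is a minimal sketch, not the plugin's TaskState: the constant names are hypothetical, and it assumes that the awaiting, initializing, and running states count as "active" (work still pending). The transition rules from the attached graph are deliberately not encoded here.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum mirroring the forecast states listed above; the plugin's
// real TaskState enum may use different names and values.
enum ForecastState {
    INACTIVE("Inactive"),
    INACTIVE_STOPPED("Inactive: stopped"),
    AWAITING_DATA_TO_INIT("Awaiting data to initialize forecast"),
    AWAITING_DATA_TO_RESTART("Awaiting data to restart forecast"),
    INIT_TEST("Initializing test"),
    INIT_FORECAST("Initializing forecast"),
    TEST_COMPLETE("Test complete"),
    RUNNING("Running"),
    INIT_TEST_FAILED("Initializing test failed"),
    INIT_FORECAST_FAILED("Initializing forecast failed"),
    FORECAST_FAILED("Forecast failed");

    // Assumption: states in which work is still pending or in progress.
    private static final Set<ForecastState> ACTIVE = EnumSet.of(
        AWAITING_DATA_TO_INIT, AWAITING_DATA_TO_RESTART, INIT_TEST, INIT_FORECAST, RUNNING);

    private final String description;

    ForecastState(String description) {
        this.description = description;
    }

    // Human-readable label, as shown in the list above.
    String getDescription() {
        return description;
    }

    boolean isActive() {
        return ACTIVE.contains(this);
    }
}
```

Keeping the "active" set next to the constants makes it harder to add a state and forget to classify it, which is the concern raised later in review about NOT_ENDED_STATES.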

Dedicated Forecasting Config Index: Splits the detector and forecaster configuration indices to comply with the resource sharing security feature (Introduces resource permissions for detectors #1400), which supports only one resource type per resource index. Read src/main/java/org/opensearch/forecast/indices/ForecastIndex.java, src/main/java/org/opensearch/timeseries/TimeSeriesAnalyticsPlugin.java, src/main/java/org/opensearch/timeseries/stats/StatNames.java.

Also, since a plugin can use only one job index, forecasting and AD have to share a job index. Job IDs are the same as config IDs, so to avoid clashes in the shared job index, this PR adds a prefix to forecasting job IDs. Read src/main/java/org/opensearch/timeseries/rest/handler/AbstractTimeSeriesActionHandler.java.
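A minimal sketch of the job ID prefixing described above. The helper class and the "forecast_" prefix value are hypothetical, not the plugin's actual constant; the point is only that AD keeps the config ID as the job ID while forecasting prepends a prefix.

```java
// Hypothetical helper: FORECAST_JOB_PREFIX is an assumed value, not the
// plugin's actual constant.
final class JobIds {
    static final String FORECAST_JOB_PREFIX = "forecast_";

    private JobIds() {}

    // AD jobs keep the config ID as the job ID.
    static String adJobId(String detectorId) {
        return detectorId;
    }

    // Forecasting jobs prefix the config ID so they cannot clash with an AD job
    // that happens to have the same ID in the shared job index.
    static String forecastJobId(String forecasterId) {
        return FORECAST_JOB_PREFIX + forecasterId;
    }

    static boolean isForecastJob(String jobId) {
        return jobId.startsWith(FORECAST_JOB_PREFIX);
    }

    // Recovers the config ID from a job ID.
    static String configId(String jobId) {
        return isForecastJob(jobId) ? jobId.substring(FORECAST_JOB_PREFIX.length()) : jobId;
    }
}
```

This sketch assumes AD detector IDs never start with the prefix; if they could, the job document would need an explicit analysis-type field instead of relying on the ID alone.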

Cold-Start Result Persistence: Persists cold-start training samples and initial inference results to the result index, enabling “Run once” charts in the UI without waiting for post-cold-start data.

Read src/main/java/org/opensearch/timeseries/ml/ModelColdStart.java, src/main/java/org/opensearch/timeseries/ratelimit/ColdStartWorker.java.

Optimized Cold-Start Processing: Cold start processes cold-start samples sequentially in one call instead of calling process one point at a time, which can reduce redundant bounding box computations. Read src/main/java/org/opensearch/ad/ml/ADColdStart.java and src/main/java/org/opensearch/forecast/ml/ForecastColdStart.java.
Optional In-Memory Config: Makes in-memory config caching optional to avoid stale configurations during repeated "Run once" executions. For run once, users may change the config and click "Run once" again, so we don't want to keep the old configuration. Read src/main/java/org/opensearch/timeseries/NodeStateManager.java.
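A minimal sketch of opt-in config caching, assuming a hypothetical ConfigCache class (NodeStateManager's real API differs). Consistent with the review discussion further down, the cache flag decides only whether a freshly fetched config is stored; lookups happen in the caller before fetching.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an opt-in config cache; the real NodeStateManager
// API differs.
final class ConfigCache<C> {
    private final Map<String, C> configs = new ConcurrentHashMap<>();

    // Callers check the cache first and only fetch from the index on a miss.
    Optional<C> getCached(String configId) {
        return Optional.ofNullable(configs.get(configId));
    }

    // Invoked after a config has been read from the config index.
    // With cache == false (e.g., run once), the fresh config is returned but not
    // stored, so the next run once re-reads a possibly edited config.
    C onConfigFetched(String configId, C config, boolean cache) {
        if (cache) {
            configs.put(configId, config);
        }
        return config;
    }
}
```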
Flatten Forecast Result Index: Implements result index flattening for forecasting, bringing it to feature parity with AD. Read src/main/java/org/opensearch/forecast/transport/ForecastResultBulkTransportAction.java.
Forecast Run-Once Profile: Adds a forecast run-once profile transport action, which tells us whether the current run once has finished. Read src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileTransportAction.java, src/main/java/org/opensearch/timeseries/ratelimit/BatchWorker.java, src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileRequest.java, src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileResponse.java, src/main/java/org/opensearch/timeseries/ratelimit/RateLimitedRequestWorker.java.
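A minimal sketch of the "has run once finished?" check. The QueueView interface and RunOnceProfile class are hypothetical; the two method names mirror those in the ForecastRunOnceProfileTransportAction diff quoted in the review, but their exact signatures there may differ. Each node reports whether any worker still holds work for the config, and the run once counts as finished only when no node does.

```java
import java.util.List;

// Hypothetical view of a rate-limited worker queue; illustrative only.
interface QueueView {
    boolean hasInflightRequest(String configId);

    boolean hasConfigIdInQueue(String configId);
}

final class RunOnceProfile {
    private RunOnceProfile() {}

    // On one node: run once is still in progress if any worker has an
    // in-flight or queued request for this config.
    static boolean inProgressOnNode(String configId, List<? extends QueueView> workers) {
        for (QueueView w : workers) {
            if (w.hasInflightRequest(configId) || w.hasConfigIdInQueue(configId)) {
                return true;
            }
        }
        return false;
    }

    // After a broadcast: finished only when no node reports work in progress.
    static boolean finished(List<Boolean> perNodeInProgress) {
        return perNodeInProgress.stream().noneMatch(Boolean::booleanValue);
    }
}
```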
Run-Once Fault Tolerance: Implements fault tolerance in run-once executions to handle transient search failures. Read src/main/java/org/opensearch/forecast/transport/ForecastRunOnceTransportAction.java.
Centralized RCF Result Conversion: Refactors and centralizes the logic for converting RCF results in ModelUtil.toResult. Read src/main/java/org/opensearch/timeseries/util/ModelUtil.java, src/main/java/org/opensearch/ad/ml/ADModelManager.java.
Refactor getADTask: Moves the method getADTask to the superclass so that forecasting can reuse it, and renames it to getTask. Read src/main/java/org/opensearch/ad/task/ADTaskManager.java.
Conditional Forecast Result Storage: Saves a forecast result only when its data quality is greater than 0. At the beginning of cold start, we get many 0-valued forecasts whose data quality is 0. Read src/main/java/org/opensearch/forecast/model/ForecastResult.java.
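A minimal sketch of the storage rule above, assuming a hypothetical SimpleForecastResult type with only the fields the rule needs (the real ForecastResult has many more). Results are kept for persistence only when data quality is strictly positive.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal result type; the real ForecastResult carries many more fields.
final class SimpleForecastResult {
    final long timestampMillis;
    final double value;
    final double dataQuality;

    SimpleForecastResult(long timestampMillis, double value, double dataQuality) {
        this.timestampMillis = timestampMillis;
        this.value = value;
        this.dataQuality = dataQuality;
    }
}

final class ResultFilter {
    private ResultFilter() {}

    // Keep only results worth persisting: early cold-start output tends to be
    // 0-valued forecasts whose data quality is 0.
    static List<SimpleForecastResult> toSave(List<SimpleForecastResult> results) {
        return results.stream()
            .filter(r -> r.dataQuality > 0)
            .collect(Collectors.toList());
    }
}
```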
Differentiated Shingle Sizes: Forecaster and Detector have different minimum shingle sizes, so Forecaster overrides invalidShingleSizeRange to reflect that. Read src/main/java/org/opensearch/forecast/model/Forecaster.java, src/main/java/org/opensearch/timeseries/model/Config.java.
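A minimal sketch of the override pattern, following the shape of the Forecaster diff quoted in the review. The class names and all three bounds are illustrative placeholders, not the plugin's actual settings values.

```java
// Hypothetical sketch; the bounds below are placeholders, not the plugin's settings.
abstract class BaseConfig {
    static final int MIN_SHINGLE_SIZE = 1;   // placeholder detector lower bound
    static final int MAX_SHINGLE_SIZE = 64;  // placeholder shared upper bound

    // A null shingle size means "not specified" and is not treated as invalid.
    boolean invalidShingleSizeRange(Integer shingleSizeToTest) {
        return shingleSizeToTest != null
            && (shingleSizeToTest < MIN_SHINGLE_SIZE || shingleSizeToTest > MAX_SHINGLE_SIZE);
    }
}

final class DetectorConfig extends BaseConfig {}

final class ForecasterConfig extends BaseConfig {
    // Forecasting uses a single feature, so the shingle is its only context;
    // a larger minimum keeps some context for every point.
    static final int MINIMUM_SHINGLE_SIZE = 4;  // placeholder forecasting lower bound

    @Override
    boolean invalidShingleSizeRange(Integer shingleSizeToTest) {
        return shingleSizeToTest != null
            && (shingleSizeToTest < MINIMUM_SHINGLE_SIZE || shingleSizeToTest > MAX_SHINGLE_SIZE);
    }
}
```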
Enable Forecasting by Default: Enables forecasting features by default in preparation for the upcoming feature release. Read src/main/java/org/opensearch/forecast/settings/ForecastEnabledSetting.java.
Run-Once Task Error Handling: Read src/main/java/org/opensearch/forecast/transport/ForecastRunOnceTransportAction.java.
Improve History Suggestion: Modifies the parameter suggestion logic to invoke interval suggestion before history suggestion, replacing previously hardcoded values. Read src/main/java/org/opensearch/forecast/transport/SuggestForecasterParamTransportAction.java, src/main/java/org/opensearch/timeseries/rest/handler/HistorySuggest.java, src/main/java/org/opensearch/timeseries/transport/BaseSuggestConfigParamTransportAction.java.
Suggest Window Delay: Adds window delay to the suggested forecasting parameters. Read src/main/java/org/opensearch/forecast/transport/SuggestForecasterParamTransportAction.java, src/main/java/org/opensearch/forecast/transport/SuggestName.java.
Unified Task Status Updates: We update the task status before the next interval starts to avoid a long initialization period. Previously, single-stream and high-cardinality analyses had different update logic; since the two implementations have already been combined, this PR combines the task status update logic as well. Read src/main/java/org/opensearch/timeseries/ExecuteResultResponseRecorder.java.
Enhanced Debugging - PriorityCache: Rewrites PriorityCache.getTotalUpdates for easier debugging. Read src/main/java/org/opensearch/timeseries/caching/PriorityCache.java.
Enhanced Debugging - PriorityTracker: Rewrites PriorityTracker.getHighestPriorityEntityId for easier debugging. Read src/main/java/org/opensearch/timeseries/caching/PriorityTracker.java.
State Triaging Exception Messages: Adds an exception message prefix used for triaging state, as part of the forecasting state machine described earlier. Read src/main/java/org/opensearch/timeseries/common/exception/TimeSeriesException.java, src/main/java/org/opensearch/forecast/task/ForecastTaskManager.java.
Improved Interval Recommendation:

Enhances recommendation accuracy by counting shingles instead of individual points. Read src/main/java/org/opensearch/timeseries/feature/AbstractRetriever.java, src/main/java/org/opensearch/timeseries/feature/SearchFeatureDao.java, src/main/java/org/opensearch/timeseries/transport/AbstractSingleStreamResultTransportAction.java, src/main/java/org/opensearch/timeseries/rest/handler/IntervalCalculation.java
The shingle check is also introduced in validation: src/main/java/org/opensearch/timeseries/rest/handler/ModelValidationActionHandler.java
Instead of using top entities, uses entities with median frequency. Read src/main/java/org/opensearch/timeseries/rest/handler/LatestTimeRetriever.java
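A minimal sketch of the two ideas above, using a hypothetical IntervalStats helper. It assumes a "shingle" counts when shingleSize consecutive buckets all contain data, and that "median frequency" means the entity whose document count sits in the middle after sorting; the plugin's exact rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helpers; names and the exact counting rule are assumptions.
final class IntervalStats {
    private IntervalStats() {}

    // Counts complete shingles: windows of shingleSize consecutive buckets that all
    // contain data. A run of r non-empty buckets yields max(0, r - shingleSize + 1).
    static int countShingles(boolean[] bucketHasData, int shingleSize) {
        int run = 0;
        int shingles = 0;
        for (boolean hasData : bucketHasData) {
            run = hasData ? run + 1 : 0;
            if (run >= shingleSize) {
                shingles++;
            }
        }
        return shingles;
    }

    // Picks the entity whose document count is the median (upper median for an
    // even number of entities) instead of the most frequent entity.
    static String medianFrequencyEntity(Map<String, Long> docCountByEntity) {
        if (docCountByEntity.isEmpty()) {
            throw new IllegalArgumentException("no entities");
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(docCountByEntity.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries.get(entries.size() / 2).getKey();
    }
}
```

Counting shingles rather than points matters because a series with many isolated points can still yield few complete shingles, and the model consumes shingles. A median-frequency entity is arguably more representative of typical entities than the busiest one.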

Door Keeper Exception for Run-Once: Adjusts the door keeper logic to exempt run-once executions from restrictions intended for repeated cold-start failures. Read src/main/java/org/opensearch/timeseries/ml/ModelColdStart.java.
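A minimal sketch of the exemption, assuming a hypothetical ColdStartGate class in which a plain set stands in for the door keeper that remembers models whose cold start recently failed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a plain set stands in for the door keeper.
final class ColdStartGate {
    private final Set<String> recentlyFailed = new HashSet<>();

    void recordFailure(String modelId) {
        recentlyFailed.add(modelId);
    }

    // Real-time runs skip models that recently failed cold start to avoid
    // repeated retries; run once is exempt from that restriction.
    boolean shouldAttemptColdStart(String modelId, boolean isRunOnce) {
        return isRunOnce || !recentlyFailed.contains(modelId);
    }
}
```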
Refactor for Code Reuse: Moves AnomalyDetector.onlyParseNumberValue to Config so that Forecaster can use it. Read src/main/java/org/opensearch/forecast/model/Forecaster.java.
Fix Result Index Mapping: Increments the result index version and adds a new field that the previous flattening result index PR forgot to add. Read src/main/resources/mappings/config.json.

Testing done:

Manual tests.
Added new tests.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov · 2025-05-29T23:48:25Z

Codecov Report

Attention: Patch coverage is 74.09910% with 230 lines in your changes missing coverage. Please review.

Project coverage is 81.40%. Comparing base (ee8e38d) to head (5b81557).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
...cast/transport/ForecastRunOnceTransportAction.java	28.57%	37 Missing and 3 partials ⚠️
...ansport/BaseSuggestConfigParamTransportAction.java	40.81%	28 Missing and 1 partial ⚠️
...pensearch/timeseries/feature/SearchFeatureDao.java	78.94%	9 Missing and 7 partials ⚠️
.../opensearch/forecast/task/ForecastTaskManager.java	0.00%	14 Missing ⚠️
...java/org/opensearch/timeseries/util/ModelUtil.java	72.72%	7 Missing and 5 partials ⚠️
...h/timeseries/rest/handler/LatestTimeRetriever.java	70.58%	7 Missing and 3 partials ⚠️
...rc/main/java/org/opensearch/ad/ml/ADColdStart.java	73.52%	8 Missing and 1 partial ⚠️
.../org/opensearch/forecast/ml/ForecastColdStart.java	73.52%	8 Missing and 1 partial ⚠️
...va/org/opensearch/timeseries/task/TaskManager.java	76.31%	6 Missing and 3 partials ⚠️
...ansport/ForecastRunOnceProfileTransportAction.java	38.46%	2 Missing and 6 partials ⚠️
... and 29 more


@@             Coverage Diff              @@
##               main    #1479      +/-   ##
============================================
- Coverage     81.56%   81.40%   -0.17%     
- Complexity     5890     5973      +83     
============================================
  Files           535      536       +1     
  Lines         23760    24204     +444     
  Branches       2375     2443      +68     
============================================
+ Hits          19380    19703     +323     
- Misses         3223     3295      +72     
- Partials       1157     1206      +49

Flag	Coverage Δ
plugin	`81.40% <74.09%> (-0.17%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
.../java/org/opensearch/ad/ADEntityProfileRunner.java	`100.00% <ø> (ø)`
...rg/opensearch/ad/AnomalyDetectorProfileRunner.java	`100.00% <ø> (ø)`
...opensearch/ad/ExecuteADResultResponseRecorder.java	`84.21% <100.00%> (ø)`
.../java/org/opensearch/ad/constant/ADCommonName.java	`0.00% <ø> (ø)`
...c/main/java/org/opensearch/ad/indices/ADIndex.java	`100.00% <100.00%> (ø)`
...a/org/opensearch/ad/indices/ADIndexManagement.java	`84.61% <ø> (ø)`
...main/java/org/opensearch/ad/ml/ADModelManager.java	`80.67% <100.00%> (+1.42%)`	⬆️
...ava/org/opensearch/ad/ml/ADRealTimeInferencer.java	`100.00% <ø> (ø)`
.../java/org/opensearch/ad/model/AnomalyDetector.java	`89.87% <ø> (+1.02%)`	⬆️
...pensearch/ad/ratelimit/ADCheckpointReadWorker.java	`100.00% <ø> (ø)`
... and 123 more

... and 8 files with indirect coverage changes


amitgalitz · 2025-06-06T00:00:33Z

src/main/java/org/opensearch/timeseries/model/TaskState.java

        return description;
    }

    public static List<String> NOT_ENDED_STATES = ImmutableList


Should we add NOT_ENDED_STATES and AWAITING_DATA_TO_RESTART to not_ended? I see we use isDone() in a few places based on this list, so should we basically treat these two as not done?

amitgalitz · 2025-06-06T14:21:28Z

src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileTransportAction.java

+                || checkpointReadWorker.hasInflightRequest(configId)
+                || coldStartWorker.hasConfigIdInQueue(configId)
+                || checkpointReadWorker.hasConfigIdInQueue(configId),
+            exception.isEmpty() ? null : ExceptionUtil.getErrorMessage(exception.get())


Should we log the error stack trace in this case? I see that getErrorMessage only returns the full stack trace in the last else branch.

This error is shown on the frontend, so I try not to show a complete stack trace when possible. I will add a log message with the complete stack trace for debugging.

amitgalitz · 2025-06-06T14:38:47Z

src/main/java/org/opensearch/forecast/transport/ForecastRunOnceTransportAction.java

+                            String state = r.get().getState();
+                            // If there is no state, update it; otherwise, it might have been set elsewhere (e.g., by ColdStartWorker)
+                            if (Strings.isEmpty(state)) {
+                                updateTask(forecastID, taskId, TaskState.INIT_TEST_FAILED, exceptionMsg);


Should we log the exception message for debugging in case updateTask fails?

This should already be logged in src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileTransportAction.java, added when addressing another comment.

amitgalitz · 2025-06-06T14:48:36Z

src/main/java/org/opensearch/forecast/transport/ForecastRunOnceTransportAction.java

                    .onFailure(
                        new OpenSearchStatusException(
-                            "cannot start a new test " + forecastID + " since current test hasn't finished.",
+                            "cannot start a new test for " + forecastID + " since current test hasn't finished.",


I can't comment on the specific line, but we should return after line 188, where we do this:
listener.onFailure(new OpenSearchStatusException(ForecastCommonMessages.DISABLED_ERR_MSG, FORBIDDEN));

Good catch. Fixed.

amitgalitz · 2025-06-06T14:53:54Z

src/main/java/org/opensearch/timeseries/ExecuteResultResponseRecorder.java

+            ProfileUtil
+                .confirmRealtimeResultStatus(
+                    configOptional.get(),
+                    clock.millis() - 2 * configIntervalInMinutes * 60000,


nit: can do a null check on configIntervalInMinutes
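A minimal sketch of the suggested null check, using a hypothetical ResultStatusWindow helper. It keeps the two-interval lookback from the quoted diff and reads the current time from an injected java.time.Clock, which also makes the computation testable with Clock.fixed.

```java
import java.time.Clock;
import java.time.Duration;
import java.util.Objects;

// Hypothetical helper; the two-interval lookback mirrors the quoted diff,
// while the null handling is only a suggestion.
final class ResultStatusWindow {
    private ResultStatusWindow() {}

    // Earliest epoch-millis timestamp that still counts as a recent result:
    // two intervals before "now" as reported by the injected clock.
    static long cutoffMillis(Clock clock, Long configIntervalInMinutes) {
        Objects.requireNonNull(configIntervalInMinutes, "configIntervalInMinutes must not be null");
        return clock.millis() - 2 * Duration.ofMinutes(configIntervalInMinutes).toMillis();
    }
}
```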

amitgalitz · 2025-06-06T14:58:28Z

src/main/java/org/opensearch/timeseries/ExecuteResultResponseRecorder.java

                        )
                );
-            updateRealtimeTask(response, configId);
+            updateRealtimeTask(response, configId, clock);


nit: can we use clock instead of Instant.now() in line 121?

Good catch. Changed.

amitgalitz · 2025-06-06T15:01:02Z

src/main/java/org/opensearch/forecast/model/Forecaster.java

+    @Override
+    public boolean invalidShingleSizeRange(Integer shingleSizeToTest) {
+        return shingleSizeToTest != null
+            && (shingleSizeToTest < ForecastSettings.MINIMUM_SHINLE_SIZE || shingleSizeToTest > TimeSeriesSettings.MAX_SHINGLE_SIZE);


Why are there different minimum shingle sizes for AD vs. forecasting? Also, nit: MINIMUM_SHINLE_SIZE -> MINIMUM_SHINGLE_SIZE

Since forecasting allows only one feature, a shingle size of 1 would give no context for each point. The shingle is our context, and forecasting would be inaccurate without it.

Changed to MINIMUM_SHINGLE_SIZE.

amitgalitz · 2025-06-06T15:03:42Z

src/main/java/org/opensearch/forecast/task/ForecastTaskManager.java

    }
+
+    @Override
+    protected String triageState(Boolean hasResult, String error, Long rcfTotalUpdates) {


nit: should these return TaskState instead of a string so we don't have to rely on enum casting?

Some places use strings and some use the enum, so we have to cast in either case.


amitgalitz · 2025-06-06T15:20:08Z

src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileRequest.java

-     */
-    public ForecastRunOnceProfileRequest(String configId, DiscoveryNode... nodes) {
-        super(nodes);
+    /*Important to have this constructor. Otherwise, OS silently ignore the broadcast request.*/


I'm confused here. Can you expand on this, at least in your response? It doesn't have to be a new code comment.

I meant that without this constructor, OpenSearch silently drops the broadcast call.

amitgalitz · 2025-06-06T15:46:24Z

src/main/java/org/opensearch/timeseries/feature/SearchFeatureDao.java

+                // user is the one who triggered the caller of this function
+                user,
+                client,
+                AnalysisType.AD,


Is this method only used by AD?

Good catch. Changed to:

    clientUtil
        .<SearchRequest, SearchResponse>asyncRequestWithInjectedSecurity(
            request,
            client::search,
            // user is the one who triggered the caller of this function
            user,
            client,
            config instanceof AnomalyDetector ? AnalysisType.AD : AnalysisType.FORECAST,
            searchResponseListener
        );

I think this occurs in a few methods across SearchFeatureDao, so it would be good to double-check those.

Double-checked the other places. They seem fine.

amitgalitz · 2025-06-06T15:50:19Z

src/main/java/org/opensearch/timeseries/feature/SearchFeatureDao.java

+     *
+     * @param response            the {@link SearchResponse} returned by OpenSearch
+     * @param config              configuration
+     * @param includesEmptyBucket if {@code true}, a bucket with {@code doc_count == 0}


Since we changed the min doc count to 0 in another part of the code, isn't this always true? Additionally, we don't seem to use this variable anywhere in the method.

Yes, removed.

amitgalitz · 2025-06-06T15:58:24Z

src/main/java/org/opensearch/timeseries/NodeStateManager.java


-                NodeState state = states.computeIfAbsent(configID, configId -> new NodeState(configId, clock));
-                state.setConfigDef(config);
+                if (cache) {


At this point, haven't we already made the request to get the config? Shouldn't fetching from the cache happen earlier?

Here, cache means whether we should put the get-config response into the cache. Fetching from the cache happens in the caller of this method.

amitgalitz · 2025-06-06T16:07:03Z

src/main/java/org/opensearch/timeseries/ratelimit/ColdStartWorker.java

+                        String taskId = coldStartRequest.getTaskId();
+                        if (taskId != null) {
+                            Map<String, Object> updatedFields = new HashMap<>();
+                            updatedFields.put(TimeSeriesTask.STATE_FIELD, TaskState.INACTIVE.name());


Why is it always set to INACTIVE here? I'm a little confused about the run-once scenario.

This branch handles the cold-start failure where we don't have enough data. Since this is run once, we don't retry as we do in real time.

I see, could either INACTIVE or INIT_TEST_FAILED work here?

According to the state transition graph from the overview section, we need to use INACTIVE.


Signed-off-by: Kaituo Li <kaituo@amazon.com>

Read src/main/java/org/opensearch/timeseries/common/exception/TimeSeriesException.java, src/main/java/org/opensearch/forecast/task/ForecastTaskManager.java 21. Improved Interval Recommendation: * Enhances recommendation accuracy by counting shingles instead of individual points. Read src/main/java/org/opensearch/timeseries/feature/AbstractRetriever.java, src/main/java/org/opensearch/timeseries/feature/SearchFeatureDao.java, src/main/java/org/opensearch/timeseries/transport/AbstractSingleStreamResultTransportAction.java, src/main/java/org/opensearch/timeseries/rest/handler/IntervalCalculation.java * the shingle check is also introduced in validation: src/main/java/org/opensearch/timeseries/rest/handler/ModelValidationActionHandler.java * instead of using top entities, using entities with median frequency: read src/main/java/org/opensearch/timeseries/rest/handler/LatestTimeRetriever.java 22. Door Keeper Exception for Run-Once Adjusts door keeper logic to exempt run-once executions from restrictions intended for repeated cold-start failures. Read src/main/java/org/opensearch/timeseries/ml/ModelColdStart.java 21. Refactor for code reuse Move AnomalyDetector.onlyParseNumberValue to Config so that Forecaster can use it: src/main/java/org/opensearch/forecast/model/Forecaster.java 22. Fix result index mapping increment result index version and add a new filed that previous flattening result index PR forgets to add. Read src/main/resources/mappings/config.json Testing done: 1. manual tests. 2. added new tests. Signed-off-by: Kaituo Li <kaituo@amazon.com> * address comments Signed-off-by: Kaituo Li <kaituo@amazon.com> --------- Signed-off-by: Kaituo Li <kaituo@amazon.com>
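The forecasting states in item 1 can be read as a transition table. The Java sketch below is illustrative only. INIT_TEST, INIT_TEST_FAILED, INACTIVE and FORECAST_FAILURE are state names that appear in later commit messages. The other constant names, the `ForecastStateMachine` class and the exact set of allowed transitions are assumptions. They do not reproduce TaskState.java or ForecastTaskManager.java; the attached graph is the authoritative source.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the forecasting states described in item 1.
// Constant names (other than those seen in commit messages) and transitions are assumptions.
enum ForecastState {
    INACTIVE,                 // Inactive: never started
    INACTIVE_STOPPED,         // Inactive: stopped by the user after running
    AWAITING_DATA_TO_INIT,    // Awaiting data to initialize forecast
    AWAITING_DATA_TO_RESTART, // Awaiting data to restart forecast
    INIT_TEST,                // Initializing test
    INIT_FORECAST,            // Initializing forecast
    TEST_COMPLETE,            // Test complete
    RUNNING,                  // Running
    INIT_TEST_FAILED,         // Initializing test failed
    INIT_FORECAST_FAILED,     // Initializing forecast failed
    FORECAST_FAILURE          // Forecast failed
}

final class ForecastStateMachine {
    private static final Map<ForecastState, Set<ForecastState>> ALLOWED = new EnumMap<>(ForecastState.class);

    static {
        // From an idle or finished state, the user may run a test or start a forecast;
        // with too little data, a start request waits for data instead.
        for (ForecastState idle : EnumSet.of(ForecastState.INACTIVE, ForecastState.TEST_COMPLETE,
                ForecastState.INIT_TEST_FAILED, ForecastState.INIT_FORECAST_FAILED)) {
            ALLOWED.put(idle, EnumSet.of(ForecastState.INIT_TEST, ForecastState.INIT_FORECAST,
                ForecastState.AWAITING_DATA_TO_INIT));
        }
        // A forecast stopped after running waits in the "restart" state when data is short.
        ALLOWED.put(ForecastState.INACTIVE_STOPPED, EnumSet.of(ForecastState.INIT_TEST,
            ForecastState.INIT_FORECAST, ForecastState.AWAITING_DATA_TO_RESTART));
        ALLOWED.put(ForecastState.AWAITING_DATA_TO_INIT,
            EnumSet.of(ForecastState.INIT_FORECAST, ForecastState.INACTIVE_STOPPED));
        ALLOWED.put(ForecastState.AWAITING_DATA_TO_RESTART,
            EnumSet.of(ForecastState.INIT_FORECAST, ForecastState.INACTIVE_STOPPED));
        ALLOWED.put(ForecastState.INIT_TEST,
            EnumSet.of(ForecastState.TEST_COMPLETE, ForecastState.INIT_TEST_FAILED));
        ALLOWED.put(ForecastState.INIT_FORECAST, EnumSet.of(ForecastState.RUNNING,
            ForecastState.INIT_FORECAST_FAILED, ForecastState.INACTIVE_STOPPED));
        ALLOWED.put(ForecastState.RUNNING,
            EnumSet.of(ForecastState.FORECAST_FAILURE, ForecastState.INACTIVE_STOPPED));
        // A failed real-time forecast can still be stopped by the user.
        ALLOWED.put(ForecastState.FORECAST_FAILURE, EnumSet.of(ForecastState.INACTIVE_STOPPED));
    }

    private ForecastStateMachine() {}

    /** Returns true if this sketch allows moving from {@code from} to {@code to}. */
    static boolean canTransition(ForecastState from, ForecastState to) {
        Set<ForecastState> next = ALLOWED.get(from);
        return next != null && next.contains(to);
    }
}
```

Keeping every allowed move in one table makes a rule such as "a failed forecast can still be stopped" a single entry. The alternative is a condition scattered across task-manager code.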

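Item 21 replaces point counting with shingle counting when recommending an interval. Below is a minimal sketch of what counting shingles means. It assumes a shingle is a window of `shingleSize` consecutive buckets that all contain data, and that windows may overlap. `ShingleCounter` and `countShingles` are hypothetical names, not code from IntervalCalculation.java.

```java
// Hypothetical helper illustrating "count shingles, not points".
final class ShingleCounter {
    private ShingleCounter() {}

    /**
     * Counts overlapping windows of shingleSize consecutive buckets in which every bucket has data.
     *
     * @param hasData     hasData[i] is true if time bucket i contains at least one point
     * @param shingleSize number of consecutive buckets that form one model input
     */
    static int countShingles(boolean[] hasData, int shingleSize) {
        if (shingleSize <= 0) {
            throw new IllegalArgumentException("shingleSize must be positive");
        }
        int count = 0;
        int run = 0; // length of the current run of non-empty buckets
        for (boolean present : hasData) {
            run = present ? run + 1 : 0;
            if (run >= shingleSize) {
                count++; // a complete shingle ends at this bucket
            }
        }
        return count;
    }
}
```

For example, `countShingles(new boolean[] {true, true, true, true, false, true, true}, 4)` returns 1. Six buckets have data, but a model with shingle size 4 can build only one complete input vector from them. This is why a point count can overstate how much usable history an interval provides.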
* Upgrade gradle 8.10.2 and JDK 23 (#1428)
  * Upgrade gradle 8.10.2 and JDK 23
  * Update ospackage to fix dirmode error
  Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
* dependabot: bump com.netflix.nebula.ospackage from 11.5.0 to 11.11.1 (#1422)
  Bumps com.netflix.nebula.ospackage from 11.5.0 to 11.11.1.
  updated-dependencies:
  - dependency-name: com.netflix.nebula.ospackage
    dependency-type: direct:production
    update-type: version-update:semver-minor
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Adding release notes for 3.0-alpha1 (#1432)
  Signed-off-by: Junwei Dai <junweid@amazon.com>
  Co-authored-by: Junwei Dai <junweid@amazon.com>
* Use testclusters when testing with security (#1414)
  * Use testclusters when testing with security
  * Add download plugin
  * Get js and security plugin
  * Add opensearchPlugin
  * Remove duplicate
  * Wait for yellow
  * Fix tests
  * Fix bwc test
  * Add prepareBwcTests
  * Add to developer guide
  * Add to CHANGELOG
  Signed-off-by: Craig Perkins <cwperx@amazon.com>
* adding ability to run AD with 2 local clusters (#1441)
  Signed-off-by: Amit Galitzky <amgalitz@amazon.com>
* distinguish local cluster when name is same as remote (#1446)
  Signed-off-by: Amit Galitzky <amgalitz@amazon.com>
* Adding release notes for 3.0.0.0-beta1 (#1447)
  Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
* Add integtest.sh to specifically run integTestRemote task (#1456)
  Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
* Add AWS SAM template for WAF log analysis and anomaly detection (#1460)
  * Add AWS SAM template for WAF log analysis and anomaly detection
    This commit adds an AWS SAM template required by the blog post "Analyze AWS WAF logs using Amazon OpenSearch Service anomaly detection built on Random Cut Forests". The template provisions all the resources needed to set up anomaly detection for WAF logs using Amazon OpenSearch Service. Detailed instructions and further context can be found in the linked blog post. A README file is included to outline the structure and contents of the template.
    Testing performed:
    * Deployed the template and verified that processed documents appeared correctly in the WAF index.
  * address comments
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* adding release notes for 3.0.0 (#1464)
  * adding release notes for 3.0.0.0
  * Getting changelog ready for next release
  Signed-off-by: Sai Medhini Reddy Maryada <saimedhi@amazon.com>
* Allow maven to publish to all versions (#1470)
  Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
* Switch guava deps from compileOnly to implementation (#1473)
  Signed-off-by: Craig Perkins <cwperx@amazon.com>
* Introduce state machine, separate config index, improve suggest/valid… (#1479)
  * Introduce state machine, separate config index, improve suggest/validate APIs, and persist cold-start results for run-once visualization
  * address comments
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* fix compile error
  Signed-off-by: Jackie <jkhanjob@gmail.com>
* adding release notes for 3.1.0 (#1488)
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* Fix incorrect task state handling in ForecastRunOnceTransportAction (#1489)
  Previously, the task state was not updated to a failure state if it was non-empty, even when the state represented an incomplete task (e.g., INIT_TEST). This fix ensures the task state is updated to INIT_TEST_FAILED unless it is already in an ended state (e.g., INACTIVE).
  Testing:
  - Manual testing verified correct state transitions.
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* Fix LatestTimeRetriever range query failing on non-epoch date mappings (#1493)
  When the user’s time field mapping didn’t include `epoch_millis`, the numeric bounds we pass to `RangeQueryBuilder` were parsed with the field’s default format (`yyyy-MM-dd HH:mm:ss`), triggering a `SearchPhaseExecutionException: all shards failed`. This PR imports `CommonName` and calls `.format(CommonName.EPOCH_MILLIS_FORMAT)` to tell OpenSearch explicitly that the `from`/`to` values are epoch milliseconds.
  Testing done:
  1. Added a Cypress IT: https://tinyurl.com/5n98z3ue
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* Refine cold-start, window delay, and task updates (#1496)
  - Skip checkpoint writes during historical/run-once execution
  - Add retryOnConflict to task updates to avoid version clashes
  - Switch the window-delay calculation from 20% padding (gap × 1.2) to a bucket-based approach.
  Motivation: the multiplicative cushion scaled with absolute lag, so a multi-hour ingest gap could inflate the delay into days, causing cold-start failures. Tying the delay to config intervals (plus one safety bucket) keeps it proportional and restores prompt results.
  Testing done:
  * Added IT: https://tinyurl.com/5n98z3ue
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* Fix stopping issue when forecaster is in FORECAST_FAILURE state (#1502)
  Mark FORECAST_FAILURE as a non-ending state so TaskManager recognizes the task as stoppable. Previously, TaskManager.stopLatestRealtimeTask failed to stop the task, as `isDone()` returned false for FORECAST_FAILURE, causing a "job is already stopped" error.
  Testing:
  - Manually verified that the forecaster can now be stopped successfully from the FORECAST_FAILURE state.
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* Support >1 hr intervals (#1513)
  * Support daily and multi-hour intervals
    This commit adds support for configuration intervals exceeding one hour. Key changes:
    * For intervals greater than 1 hour, models are no longer loaded directly into the cache. Instead, they are sent to the cold entity queue, and checkpoints are reloaded at each interval. Additionally, cold entity processing priority for long-interval configs is raised from 'LOW' to 'MEDIUM' to ensure timely processing.
    * Improved Suggest and Validate APIs: replaced the previous median-based interval detection method with an adaptive "zoom-in/zoom-out" algorithm. The new method uses progressively refined date histograms to determine the interval more accurately. This enables validation and suggestions for intervals longer than 1 hour.
    Testing:
    * Conducted multi-day manual tests to verify daily interval functionality.
    * Added the ForecastRestApiIT.testDailyInterval integration test to validate the full forecasting workflow (interval suggestion, forecaster creation, execution, and stats verification) on daily-interval data.
  * address comments
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* fix compile error
  Signed-off-by: Jackie <jkhanjob@gmail.com>
* Fixing concurrency bug on writer (#1508)
  * fix concurrency bug
  * fix concurrency bug
  Signed-off-by: Amit Galitzky <amgalitz@amazon.com>
* Remove instantiation of LockService from JS and use Mock instead (#1523)
  Signed-off-by: Craig Perkins <cwperx@amazon.com>
* fix: advance past current interval & anchor on now (#1528)
  Problem
  -------
  * `nextNiceInterval()` used a `>=` check, so when the next "nice" value equalled `currentMin` it returned the **same** interval. The interval explorer treats an unchanged return as a terminal condition, so exploration stopped and `suggestForecast` failed on sample-log data.
  * Interval calculation anchored on the first **future** timestamp if one existed, whereas run-once and real-time forecasting anchor on the current time. As a result, the two paths disagreed on data sufficiency.
  Fix
  ---
  * Change the comparison in `nextNiceInterval()` from `>=` to `>` so it always returns the next larger interval, letting the explorer continue.
  * Anchor interval calculation on the current time (`now`) instead of any future date, making all forecast modes consistent.
  Tests
  -----
  * Added IT
  Signed-off-by: Kaituo Li <kaituo@amazon.com>
* fix compile error
  Signed-off-by: Jackie <jkhanjob@gmail.com>
* using asyncrequest instead of direct search (#1535)
  Signed-off-by: Amit Galitzky <amgalitz@amazon.com>
* fix spotless check
  Signed-off-by: Jackie <jkhanjob@gmail.com>
* Updates build.gradle to conditionally download certificates (#1517)
  Signed-off-by: Darshit Chanpura <dchanp@amazon.com>
* migrating from lang2 to lang3 (#1525)
  Signed-off-by: Amit Galitzky <amgalitz@amazon.com>
* fix test compile
  Signed-off-by: Jackie <jkhanjob@gmail.com>

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Junwei Dai <junweid@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Amit Galitzky <amgalitz@amazon.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Kaituo Li <kaituo@amazon.com>
Signed-off-by: Sai Medhini Reddy Maryada <saimedhi@amazon.com>
Signed-off-by: Jackie <jkhanjob@gmail.com>
Signed-off-by: Darshit Chanpura <dchanp@amazon.com>
Co-authored-by: Peter Zhu <zhujiaxi@amazon.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Junwei Dai <59641585+junweid62@users.noreply.github.com>
Co-authored-by: Junwei Dai <junweid@amazon.com>
Co-authored-by: Craig Perkins <cwperx@amazon.com>
Co-authored-by: Amit Galitzky <amgalitz@amazon.com>
Co-authored-by: Rishikesh <62345295+Rishikesh1159@users.noreply.github.com>
Co-authored-by: Kaituo Li <kaituo@amazon.com>
Co-authored-by: Sai Medhini Reddy Maryada <117196660+saimedhi@users.noreply.github.com>
Co-authored-by: Darshit Chanpura <dchanp@amazon.com>
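Commit #1496 swaps the 20% padding for a bucket-based window delay. Below is a minimal Java sketch contrasting the two. It assumes the bucket-based rule means "round the observed ingest gap up to whole config intervals, then add one safety bucket". The class, method names and rounding are assumptions, not the plugin's implementation.

```java
import java.time.Duration;

// Hypothetical comparison of two window-delay rules; neither method is the plugin's actual code.
final class WindowDelay {
    private WindowDelay() {}

    /** Earlier rule as described in #1496: pad the observed ingest gap by 20% (gap x 1.2). */
    static Duration paddedDelay(Duration gap) {
        return Duration.ofMillis(Math.round(gap.toMillis() * 1.2));
    }

    /**
     * Assumed bucket-based rule: round the gap up to whole config intervals,
     * then add one extra interval as a safety bucket.
     */
    static Duration bucketDelay(Duration gap, Duration interval) {
        long intervalMillis = interval.toMillis();
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        long buckets = (gap.toMillis() + intervalMillis - 1) / intervalMillis; // ceiling division
        return interval.multipliedBy(buckets + 1);
    }
}
```

Take a 10-hour gap with a 10-minute interval. `paddedDelay` gives 12 hours, a 2-hour cushion that keeps growing with the lag. `bucketDelay` gives 610 minutes, a cushion of exactly one interval.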

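The first fix in #1528 comes down to one comparison in `nextNiceInterval()`. Below is a minimal Java sketch of why `>=` stalls the explorer. It assumes a fixed ascending ladder of nice intervals in minutes. The ladder values, the class and the `nextNiceIntervalBuggy` helper are hypothetical; only the method name and the comparison change come from the commit message.

```java
// Hypothetical sketch of the #1528 fix; the ladder of "nice" values is an assumption.
final class NiceIntervals {
    // Ascending "nice" interval sizes in minutes (1 minute up to 1 day).
    private static final long[] NICE_MINUTES = {1, 5, 10, 15, 30, 60, 120, 240, 720, 1440};

    private NiceIntervals() {}

    /** Fixed version: the smallest nice value strictly greater than currentMin. */
    static long nextNiceInterval(long currentMin) {
        return next(currentMin, true);
    }

    /** Pre-fix behavior: ">=" returns currentMin itself when it is already a nice value. */
    static long nextNiceIntervalBuggy(long currentMin) {
        return next(currentMin, false);
    }

    // Returns currentMin unchanged when no larger value exists, which the explorer
    // treats as its terminal condition.
    private static long next(long currentMin, boolean strict) {
        for (long candidate : NICE_MINUTES) {
            if (strict ? candidate > currentMin : candidate >= currentMin) {
                return candidate;
            }
        }
        return currentMin;
    }
}
```

Starting from 10 minutes, the buggy variant returns 10. An explorer that stops on an unchanged value therefore ends after a single step. The fixed variant returns 15, and exploration continues.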
kaituo requested review from VijayanB, amitgalitz, dbwiddis, jackiehanyang, jmazanec15, jngz-es, joshpalis, ohltyler, owaiskazi19, saratvemulapalli, sean-zheng-amazon, vamshin and ylwu-amzn as code owners May 29, 2025 23:15

opensearch-trigger-bot added the infra and backport 2.x labels May 29, 2025

kaituo removed the backport 2.x label May 29, 2025

kaituo mentioned this pull request May 30, 2025

Introduces resource permissions for detectors #1400

Merged

5 tasks

kaituo added the enhancement label and removed the infra label Jun 2, 2025

kaituo force-pushed the forecasting-frontend5 branch 3 times, most recently from 897ef4c to 7ac259e June 2, 2025 19:54

amitgalitz reviewed Jun 6, 2025

src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileTransportAction.java

amitgalitz reviewed Jun 6, 2025

kaituo added 2 commits June 6, 2025 11:43

address comments

5b81557

Signed-off-by: Kaituo Li <kaituo@amazon.com>

kaituo force-pushed the forecasting-frontend5 branch from 7ac259e to 5b81557 June 6, 2025 20:09

amitgalitz approved these changes Jun 6, 2025

kaituo merged commit 70eb22d into opensearch-project:main Jun 6, 2025
26 checks passed

Introduce state machine, separate config index, improve suggest/valid… #1479

Conversation

kaituo commented May 29, 2025 • edited

Description

Related Issues

Check List

codecov bot commented May 29, 2025 • edited

Codecov Report

kaituo Jun 6, 2025 • edited

Reviewers
