@kaituo kaituo (Collaborator) commented May 29, 2025

…ate APIs, and persist cold-start results for run-once visualization

Description

  1. Forecast State machine. Introduces a detailed forecasting state machine with finer-grained state transitions than AD.
    In AD, a task moves through
    INIT → RUNNING → STOPPED → FAILED transitions. Forecasting has the following states:
  • Inactive - a forecast that hasn’t been started yet
  • Inactive: stopped - a forecast stopped by user after running
  • Awaiting data to initialize forecast - a forecast is attempting to start but there is not enough data for the model to start initializing
  • Awaiting data to restart forecast - a forecast is attempting to restart after running but there is not enough data
  • Initializing test - a forecast is building model to run test
  • Initializing forecast - a forecast is building model to start running continuously
  • Test complete - a forecast generated a test result and stopped
  • Running - a forecast running continuously
  • Initializing test failed
  • Initializing forecast failed
  • Forecast failed

See attached graph for the state machine transition graph.

[Image: state machine transition graph]

Read src/main/java/org/opensearch/forecast/task/ForecastTaskManager.java, src/main/java/org/opensearch/forecast/transport/GetForecasterTransportAction.java, src/main/java/org/opensearch/timeseries/task/TaskManager.java, src/main/java/org/opensearch/timeseries/ml/ModelColdStart.java, src/main/java/org/opensearch/timeseries/model/TaskState.java, src/main/java/org/opensearch/ad/task/ADTaskManager.java, src/main/java/org/opensearch/timeseries/task/TaskCacheManager.java, src/main/java/org/opensearch/timeseries/transport/BaseGetConfigTransportAction.java.

  2. Dedicated forecasting config index. Splits the detector and forecaster configuration indices to comply with the resource sharing security feature (Introduces resource permissions for detectors #1400). Resource sharing will only be supported for one resource type per resource index. Read src/main/java/org/opensearch/forecast/indices/ForecastIndex.java, src/main/java/org/opensearch/timeseries/TimeSeriesAnalyticsPlugin.java, src/main/java/org/opensearch/timeseries/stats/StatNames.java

Also, since a plugin can use only one job index, forecasting and AD have to share it. Job ID and config ID are equal, so this change adds a prefix to forecasting job IDs to avoid clashes in the shared job index. Read src/main/java/org/opensearch/timeseries/rest/handler/AbstractTimeSeriesActionHandler.java

  3. Cold-start result persistence. Persists cold-start training samples and initial inference results to the result index, enabling “Run once” charts in the
    UI without waiting for post-cold-start data.

Read src/main/java/org/opensearch/timeseries/ml/ModelColdStart.java, src/main/java/org/opensearch/timeseries/ratelimit/ColdStartWorker.java.

  4. Optimized Cold-Start Processing. Cold start now processes the cold-start samples as one sequence instead of calling process one point at a time. This can reduce redundant bounding box computations. Read src/main/java/org/opensearch/ad/ml/ADColdStart.java and src/main/java/org/opensearch/forecast/ml/ForecastColdStart.java.

  5. Optional In-Memory Config. Makes in-memory config caching optional to avoid stale configurations during repeated "Run once" executions. Users may change the config and click "Run once" again, and we don't want to keep using the old configuration. Read src/main/java/org/opensearch/timeseries/NodeStateManager.java

  6. Flatten Forecast Result Index. Implements result index flattening, ensuring feature parity between forecasting and AD. Read src/main/java/org/opensearch/forecast/transport/ForecastResultBulkTransportAction.java

  7. Forecast Run-Once Profile. Adds a forecast run-once profile transport action, which helps us find out whether the current run once has finished. Read src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileTransportAction.java, src/main/java/org/opensearch/timeseries/ratelimit/BatchWorker.java, src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileRequest.java, src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileResponse.java, src/main/java/org/opensearch/timeseries/ratelimit/RateLimitedRequestWorker.java.

  8. Run-Once Fault Tolerance. Implements fault tolerance in run-once executions to handle transient search failures. Read src/main/java/org/opensearch/forecast/transport/ForecastRunOnceTransportAction.java

  9. Centralized RCF Result Conversion. Refactors and centralizes the logic for converting RCF results in ModelUtil.toResult. Read src/main/java/org/opensearch/timeseries/util/ModelUtil.java, src/main/java/org/opensearch/ad/ml/ADModelManager.java.

  10. Refactor getADTask. Moves the method getADTask to the superclass so that forecasting can reuse it, and renames it to getTask. Read src/main/java/org/opensearch/ad/task/ADTaskManager.java

  11. Conditional Forecast Result Storage. Only saves a forecast result when its data quality is larger than 0. At the beginning of cold start we get many zero-valued forecasts whose data quality is 0. Read src/main/java/org/opensearch/forecast/model/ForecastResult.java

  12. Differentiated Shingle Sizes. Forecaster and Detector have different minimum shingle sizes. Overrides invalidShingleSizeRange in Forecaster to reflect that. Read src/main/java/org/opensearch/forecast/model/Forecaster.java, src/main/java/org/opensearch/timeseries/model/Config.java

  13. Enable Forecasting by Default. Activates forecasting features by default, preparing for the imminent feature release. Read src/main/java/org/opensearch/forecast/settings/ForecastEnabledSetting.java

  14. Run-Once Task Error Handling. Read src/main/java/org/opensearch/forecast/transport/ForecastRunOnceTransportAction.java

  15. Improve Suggested History. Modifies the parameter suggestion logic to invoke interval suggestion before history suggestion, replacing previously hardcoded values. Read src/main/java/org/opensearch/forecast/transport/SuggestForecasterParamTransportAction.java, src/main/java/org/opensearch/timeseries/rest/handler/HistorySuggest.java, src/main/java/org/opensearch/timeseries/transport/BaseSuggestConfigParamTransportAction.java.

  16. Suggest Window Delay. Incorporates window delay suggestions into forecasting parameter suggestions. Read src/main/java/org/opensearch/forecast/transport/SuggestForecasterParamTransportAction.java, src/main/java/org/opensearch/forecast/transport/SuggestName.java

  17. Unified Task Status Updates. We have logic that updates task status before the next interval starts, to avoid the long initialization problem. Previously the logic differed between single-stream and high-cardinality. Since these two implementations have already been combined, this change combines the task status update logic as well. Read src/main/java/org/opensearch/timeseries/ExecuteResultResponseRecorder.java

  18. Enhanced Debugging - PriorityCache. Rewrites PriorityCache.getTotalUpdates for easier debugging. Read src/main/java/org/opensearch/timeseries/caching/PriorityCache.java

  19. Enhanced Debugging - PriorityTracker. Rewrites PriorityTracker.getHighestPriorityEntityId for easier debugging. Read src/main/java/org/opensearch/timeseries/caching/PriorityTracker.java

  20. State Triaging Exception Messages. Adds an exception message prefix for triaging state. This is related to the forecasting state machine mentioned earlier. Read src/main/java/org/opensearch/timeseries/common/exception/TimeSeriesException.java, src/main/java/org/opensearch/forecast/task/ForecastTaskManager.java

  21. Improved Interval Recommendation:

  • Enhances recommendation accuracy by counting shingles instead of individual points. Read src/main/java/org/opensearch/timeseries/feature/AbstractRetriever.java, src/main/java/org/opensearch/timeseries/feature/SearchFeatureDao.java, src/main/java/org/opensearch/timeseries/transport/AbstractSingleStreamResultTransportAction.java, src/main/java/org/opensearch/timeseries/rest/handler/IntervalCalculation.java
  • The shingle check is also introduced in validation. Read src/main/java/org/opensearch/timeseries/rest/handler/ModelValidationActionHandler.java
  • Uses entities with median frequency instead of top entities. Read src/main/java/org/opensearch/timeseries/rest/handler/LatestTimeRetriever.java

  22. Door Keeper Exception for Run-Once. Adjusts door keeper logic to exempt run-once executions from restrictions intended for repeated cold-start failures. Read src/main/java/org/opensearch/timeseries/ml/ModelColdStart.java

  23. Refactor for code reuse. Moves AnomalyDetector.onlyParseNumberValue to Config so that Forecaster can use it. Read src/main/java/org/opensearch/forecast/model/Forecaster.java

  24. Fix result index mapping. Increments the result index version and adds a new field that the previous result index flattening PR forgot to add. Read src/main/resources/mappings/config.json
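
As a rough illustration of the "Improved Interval Recommendation" item above, the sketch below counts complete shingles rather than individual points. It is not the plugin's IntervalCalculation code: the class name, the method name, and the rule that a shingle counts only when all of its consecutive buckets contain data are assumptions made for this example.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the plugin. */
final class ShingleCountSketch {
    private ShingleCountSketch() {}

    /**
     * Counts windows of shingleSize consecutive buckets that all contain data.
     * hasData.get(i) is true when bucket i has at least one document.
     */
    static int countFullShingles(List<Boolean> hasData, int shingleSize) {
        if (shingleSize <= 0) {
            throw new IllegalArgumentException("shingleSize must be positive");
        }
        int run = 0;      // length of the current streak of non-empty buckets
        int shingles = 0; // complete shingles seen so far
        for (boolean present : hasData) {
            run = present ? run + 1 : 0;
            if (run >= shingleSize) {
                shingles++; // the window ending at this bucket is complete
            }
        }
        return shingles;
    }
}
```

Under these assumptions, a gap in the data costs more than one point: every window that overlaps the gap is lost, which is why a shingle count can be a stricter measure of usable data than a point count.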

Testing done:

  1. manual tests.
  2. added new tests.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented May 29, 2025

Codecov Report

Attention: Patch coverage is 74.09910% with 230 lines in your changes missing coverage. Please review.

Project coverage is 81.40%. Comparing base (ee8e38d) to head (5b81557).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...cast/transport/ForecastRunOnceTransportAction.java 28.57% 37 Missing and 3 partials ⚠️
...ansport/BaseSuggestConfigParamTransportAction.java 40.81% 28 Missing and 1 partial ⚠️
...pensearch/timeseries/feature/SearchFeatureDao.java 78.94% 9 Missing and 7 partials ⚠️
.../opensearch/forecast/task/ForecastTaskManager.java 0.00% 14 Missing ⚠️
...java/org/opensearch/timeseries/util/ModelUtil.java 72.72% 7 Missing and 5 partials ⚠️
...h/timeseries/rest/handler/LatestTimeRetriever.java 70.58% 7 Missing and 3 partials ⚠️
...rc/main/java/org/opensearch/ad/ml/ADColdStart.java 73.52% 8 Missing and 1 partial ⚠️
.../org/opensearch/forecast/ml/ForecastColdStart.java 73.52% 8 Missing and 1 partial ⚠️
...va/org/opensearch/timeseries/task/TaskManager.java 76.31% 6 Missing and 3 partials ⚠️
...ansport/ForecastRunOnceProfileTransportAction.java 38.46% 2 Missing and 6 partials ⚠️
... and 29 more

@@             Coverage Diff              @@
##               main    #1479      +/-   ##
============================================
- Coverage     81.56%   81.40%   -0.17%     
- Complexity     5890     5973      +83     
============================================
  Files           535      536       +1     
  Lines         23760    24204     +444     
  Branches       2375     2443      +68     
============================================
+ Hits          19380    19703     +323     
- Misses         3223     3295      +72     
- Partials       1157     1206      +49     
Flag Coverage Δ
plugin 81.40% <74.09%> (-0.17%) ⬇️


Files with missing lines Coverage Δ
.../java/org/opensearch/ad/ADEntityProfileRunner.java 100.00% <ø> (ø)
...rg/opensearch/ad/AnomalyDetectorProfileRunner.java 100.00% <ø> (ø)
...opensearch/ad/ExecuteADResultResponseRecorder.java 84.21% <100.00%> (ø)
.../java/org/opensearch/ad/constant/ADCommonName.java 0.00% <ø> (ø)
...c/main/java/org/opensearch/ad/indices/ADIndex.java 100.00% <100.00%> (ø)
...a/org/opensearch/ad/indices/ADIndexManagement.java 84.61% <ø> (ø)
...main/java/org/opensearch/ad/ml/ADModelManager.java 80.67% <100.00%> (+1.42%) ⬆️
...ava/org/opensearch/ad/ml/ADRealTimeInferencer.java 100.00% <ø> (ø)
.../java/org/opensearch/ad/model/AnomalyDetector.java 89.87% <ø> (+1.02%) ⬆️
...pensearch/ad/ratelimit/ADCheckpointReadWorker.java 100.00% <ø> (ø)
... and 123 more

... and 8 files with indirect coverage changes


@kaituo kaituo added the enhancement (New feature or request) label and removed the infra (Changes to infrastructure, testing, CI/CD, pipelines, etc.) label Jun 2, 2025
@kaituo kaituo force-pushed the forecasting-frontend5 branch 3 times, most recently from 897ef4c to 7ac259e Compare June 2, 2025 19:54
return description;
}

public static List<String> NOT_ENDED_STATES = ImmutableList
Member

Should we add NOT_ENDED_STATES and AWAITING_DATA_TO_RESTART to not_ended? I see we use isDone() in a few places based on this list, so should we basically count these two as not done?

Collaborator Author

added
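
For readers following this thread, here is a minimal sketch of how a "not ended" set can drive isDone(). Only INACTIVE, INIT_TEST_FAILED, AWAITING_DATA_TO_RESTART, NOT_ENDED_STATES and isDone() appear in this PR; the remaining constant names, the membership of the set, and the use of an EnumSet instead of the List<String> shown in the diff are assumptions for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only; constant names follow the PR description, not necessarily TaskState.java. */
enum ForecastStateSketch {
    INACTIVE("Inactive"),
    INACTIVE_STOPPED("Inactive: stopped"),
    AWAITING_DATA_TO_INIT("Awaiting data to initialize forecast"),
    AWAITING_DATA_TO_RESTART("Awaiting data to restart forecast"),
    INIT_TEST("Initializing test"),
    INITIALIZING_FORECAST("Initializing forecast"),
    TEST_COMPLETE("Test complete"),
    RUNNING("Running"),
    INIT_TEST_FAILED("Initializing test failed"),
    INITIALIZING_FORECAST_FAILED("Initializing forecast failed"),
    FORECAST_FAILURE("Forecast failed");

    // Assumed membership: states in which a task is still waiting, initializing, or running.
    private static final Set<ForecastStateSketch> NOT_ENDED = EnumSet.of(
        AWAITING_DATA_TO_INIT,
        AWAITING_DATA_TO_RESTART,
        INIT_TEST,
        INITIALIZING_FORECAST,
        RUNNING
    );

    private final String description;

    ForecastStateSketch(String description) {
        this.description = description;
    }

    String getDescription() {
        return description;
    }

    /** A task is done when its state is not in the "not ended" set. */
    boolean isDone() {
        return !NOT_ENDED.contains(this);
    }
}
```

With this shape, adding a waiting state to the set is enough for every isDone() caller to treat it as still in progress.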

|| checkpointReadWorker.hasInflightRequest(configId)
|| coldStartWorker.hasConfigIdInQueue(configId)
|| checkpointReadWorker.hasConfigIdInQueue(configId),
exception.isEmpty() ? null : ExceptionUtil.getErrorMessage(exception.get())
Member

Should we log the error stack trace in this case? I see that getErrorMessage only returns the full stack in the last else branch.

Collaborator Author

This error is going to be shown on the frontend. I try not to show a complete stack trace when possible. I will add a log message with the complete stack trace for debugging.
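
The split discussed here (short message for the frontend, full trace in the server log) might look roughly like the following. This is a sketch, not the plugin's ExceptionUtil: the class and method names are hypothetical, and java.util.logging stands in for whatever logger the plugin actually uses.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper; names are illustrative and not taken from the plugin. */
final class ErrorReportSketch {
    private static final Logger LOG = Logger.getLogger(ErrorReportSketch.class.getName());

    private ErrorReportSketch() {}

    /** Short, user-facing text: the exception message, or the class name when there is none. */
    static String userFacingMessage(Throwable t) {
        String msg = t.getMessage();
        return (msg == null || msg.isEmpty()) ? t.getClass().getSimpleName() : msg;
    }

    /** Full stack trace as text, intended for server logs only. */
    static String fullStackTrace(Throwable t) {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    /** Logs the complete trace and returns only the short message to the caller. */
    static String report(String configId, Throwable t) {
        LOG.log(Level.SEVERE, "Run once failed for config " + configId, t);
        return userFacingMessage(t);
    }
}
```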

String state = r.get().getState();
// If there is no state, update it; otherwise, it might have been set elsewhere (e.g., by ColdStartWorker)
if (Strings.isEmpty(state)) {
updateTask(forecastID, taskId, TaskState.INIT_TEST_FAILED, exceptionMsg);
Member

Should we log the exception message for debugging in case updateTask fails?

Collaborator Author

@kaituo kaituo Jun 6, 2025

This should already be logged in src/main/java/org/opensearch/forecast/transport/ForecastRunOnceProfileTransportAction.java, added when addressing another comment.

.onFailure(
new OpenSearchStatusException(
- "cannot start a new test " + forecastID + " since current test hasn't finished.",
+ "cannot start a new test for " + forecastID + " since current test hasn't finished.",
Member

I can't comment on the specific line, but we should return after line 188, where we do this:
listener.onFailure(new OpenSearchStatusException(ForecastCommonMessages.DISABLED_ERR_MSG, FORBIDDEN));

Collaborator Author

good catch. fixed.
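
The bug class behind this fix is easy to reproduce in isolation. In the sketch below, Listener is a minimal stand-in for an asynchronous listener (it is not OpenSearch's ActionListener), and runOnce and callbacksFor are hypothetical names. The point is only that a failure callback must be followed by a return, otherwise the success path also runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical example; not the plugin's transport action. */
final class EarlyReturnSketch {
    /** Minimal stand-in for an async listener. */
    interface Listener {
        void onResponse(String result);

        void onFailure(Exception e);
    }

    private EarlyReturnSketch() {}

    static void runOnce(boolean forecastEnabled, Listener listener) {
        if (!forecastEnabled) {
            listener.onFailure(new IllegalStateException("forecasting is disabled"));
            return; // without this, execution falls through and also calls onResponse
        }
        listener.onResponse("started");
    }

    /** Records every callback so a caller can check that exactly one fires. */
    static List<String> callbacksFor(boolean forecastEnabled) {
        List<String> calls = new ArrayList<>();
        runOnce(forecastEnabled, new Listener() {
            @Override
            public void onResponse(String result) {
                calls.add("response:" + result);
            }

            @Override
            public void onFailure(Exception e) {
                calls.add("failure:" + e.getMessage());
            }
        });
        return calls;
    }
}
```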

ProfileUtil
.confirmRealtimeResultStatus(
configOptional.get(),
clock.millis() - 2 * configIntervalInMinutes * 60000,
Member

nit: can do a null check on configIntervalInMinutes

Collaborator Author

added
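
A null-safe version of the cutoff computation quoted above could be written as below. The expression (now minus two intervals, with the interval in minutes) comes from the diff; the class name, the method name, the OptionalLong return type, and the decision to treat a non-positive interval like a missing one are assumptions for this sketch.

```java
import java.util.OptionalLong;

/** Hypothetical helper; not the plugin's ProfileUtil. */
final class ResultCutoffSketch {
    private static final long MILLIS_PER_MINUTE = 60_000L;

    private ResultCutoffSketch() {}

    /**
     * Start of the window covering the last two intervals,
     * or empty when the interval is missing or not positive.
     */
    static OptionalLong twoIntervalsAgo(long nowMillis, Long intervalInMinutes) {
        if (intervalInMinutes == null || intervalInMinutes <= 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(nowMillis - 2 * intervalInMinutes * MILLIS_PER_MINUTE);
    }
}
```

Returning an empty value pushes the "interval unknown" decision to the caller instead of failing with a NullPointerException during unboxing.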

)
);
- updateRealtimeTask(response, configId);
+ updateRealtimeTask(response, configId, clock);
Member

nit: can update Instant.now() to clock usage in line 121?

Collaborator Author

Good catch. Changed.
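
The reason to prefer an injected Clock over Instant.now() is testability: time-dependent checks become deterministic. A small sketch with a hypothetical class follows (ClockUsageSketch and isStale are not plugin names; java.time.Clock is the standard JDK class).

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical example of reading time through an injected Clock. */
final class ClockUsageSketch {
    private final Clock clock;

    ClockUsageSketch(Clock clock) {
        this.clock = clock;
    }

    /** True when lastUpdate is older than maxAge, judged by the injected clock. */
    boolean isStale(Instant lastUpdate, Duration maxAge) {
        return Duration.between(lastUpdate, clock.instant()).compareTo(maxAge) > 0;
    }
}
```

Production code would pass Clock.systemUTC(), while a test can pass Clock.fixed(someInstant, ZoneOffset.UTC) and assert on exact boundaries without sleeping.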

@Override
public boolean invalidShingleSizeRange(Integer shingleSizeToTest) {
return shingleSizeToTest != null
&& (shingleSizeToTest < ForecastSettings.MINIMUM_SHINLE_SIZE || shingleSizeToTest > TimeSeriesSettings.MAX_SHINGLE_SIZE);
Member

why are there different shingle sizes for ad vs forecasting? also nit MINIMUM_SHINLE_SIZE -> MINIMUM_SHINGLE_SIZE

Collaborator Author

Since we only allow one feature in forecasting, a shingle size of 1 would give us no context for any point. The shingle is our context, and forecasting would be inaccurate without it.

Changed to MINIMUM_SHINGLE_SIZE.
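
To make the "shingle is our context" argument concrete, the sketch below builds shingles as sliding windows over a single-feature series and repeats the shape of the range check from the diff with the bounds passed in as parameters. The class name, the shingles helper, and the choice to pass bounds explicitly are assumptions; the real limits live in ForecastSettings and TimeSeriesSettings and are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not the plugin's Forecaster or Config code. */
final class ShingleSketch {
    private ShingleSketch() {}

    /** Same shape as the check quoted above, with min and max supplied by the caller. */
    static boolean invalidShingleSizeRange(Integer size, int min, int max) {
        return size != null && (size < min || size > max);
    }

    /** Sliding windows of shingleSize consecutive values; each window is the context for its last point. */
    static List<double[]> shingles(double[] series, int shingleSize) {
        if (shingleSize < 1) {
            throw new IllegalArgumentException("shingleSize must be at least 1");
        }
        List<double[]> windows = new ArrayList<>();
        for (int end = shingleSize; end <= series.length; end++) {
            windows.add(Arrays.copyOfRange(series, end - shingleSize, end));
        }
        return windows;
    }
}
```

With one feature and a shingle size of 1, every window holds a single value, so the model input carries no history at all. Larger shingles give each point its preceding values as context.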

}

@Override
protected String triageState(Boolean hasResult, String error, Long rcfTotalUpdates) {
Member

nit: should these return TaskState instead of String so that we don't have to rely on enum casting?

Collaborator Author

We have some places using string and some places using enum. We have to do casting in either option.
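
Whichever side holds the String, the conversion can be kept in one place. A minimal sketch of such a conversion follows; the enum is a hypothetical subset of states, and parse is not a method of the plugin's TaskState.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical subset of states used only to show String and enum conversion. */
enum StateNameSketch {
    INACTIVE,
    INIT_TEST,
    TEST_COMPLETE,
    RUNNING,
    INIT_TEST_FAILED;

    /** Parses a stored state string; empty when the text is null, blank, or not a known constant. */
    static Optional<StateNameSketch> parse(String text) {
        if (text == null || text.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(valueOf(text.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```

Going the other way is just name(), as the diff does with TaskState.INACTIVE.name().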

*/
public ForecastRunOnceProfileRequest(String configId, DiscoveryNode... nodes) {
super(nodes);
/*Important to have this constructor. Otherwise, OS silently ignore the broadcast request.*/
Member

confused here, can you expand on this at least in response, doesn't have to be a new comment

Collaborator Author

I meant that without this constructor, the broadcast call will be silently dropped by OpenSearch.

// user is the one who triggered the caller of this function
user,
client,
AnalysisType.AD,
Member

is this method just used by AD?

Collaborator Author

Good catch. Changed to

clientUtil
            .<SearchRequest, SearchResponse>asyncRequestWithInjectedSecurity(
                request,
                client::search,
                // user is the one who triggered the caller of this function
                user,
                client,
                config instanceof AnomalyDetector ? AnalysisType.AD : AnalysisType.FORECAST,
                searchResponseListener
            );

Member

I think this occurs in a few methods across SearchFeatureDao, so it would be good to double check those.

Collaborator Author

double checked other places. Seems fine.

*
* @param response the {@link SearchResponse} returned by OpenSearch
* @param config configuration
* @param includesEmptyBucket if {@code true}, a bucket with {@code doc_count == 0}
Member

I think that because we changed minDocCount to 0 in another part of the code, this is always true? Additionally, we don't seem to actually use this variable anywhere in the method.

Collaborator Author

yes, removed


NodeState state = states.computeIfAbsent(configID, configId -> new NodeState(configId, clock));
state.setConfigDef(config);
if (cache) {
Member

at this point didn't we already make the request to get the config, so fetching from cache should happen earlier?

Collaborator Author

cache here indicates whether we should put the get-config response into the cache. Fetching from the cache happens in the caller of this method.
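
The division of labour described in this reply (the caller reads from the cache, the fetch path only decides whether to store) can be sketched as follows. This is not NodeStateManager: the class, the String config type, and the loader parameter are hypothetical stand-ins.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a config store with optional in-memory caching. */
final class ConfigCacheSketch {
    private final Map<String, String> cached = new ConcurrentHashMap<>();

    /** Caller-side lookup: consult the in-memory copy first. */
    Optional<String> cachedConfig(String configId) {
        return Optional.ofNullable(cached.get(configId));
    }

    /** Always loads a fresh config (loader must not return null); stores it only when cache is true. */
    String fetchConfig(String configId, Function<String, String> loader, boolean cache) {
        String config = loader.apply(configId);
        if (cache) {
            cached.put(configId, config);
        }
        return config;
    }
}
```

A run-once path would call fetchConfig with cache set to false, so an edited configuration is picked up on the next click, while a real-time path could pass true and reuse the stored copy.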

String taskId = coldStartRequest.getTaskId();
if (taskId != null) {
Map<String, Object> updatedFields = new HashMap<>();
updatedFields.put(TimeSeriesTask.STATE_FIELD, TaskState.INACTIVE.name());
Member

why is it always being set to inactive here, a little confused on the run once scenario here

Collaborator Author

This branch is for the cold start failure where we don't have enough data. Since it is run once, we won't retry as we did in real time.

Member

I see, could either INACTIVE or INIT_TEST_FAILED work here?

Collaborator Author

According to the state transition graph from the overview section, we need to use INACTIVE.

kaituo added 2 commits June 6, 2025 11:43
…ate APIs, and persist cold-start results for run-once visualization


Signed-off-by: Kaituo Li <kaituo@amazon.com>
Signed-off-by: Kaituo Li <kaituo@amazon.com>
@kaituo kaituo force-pushed the forecasting-frontend5 branch from 7ac259e to 5b81557 Compare June 6, 2025 20:09
@kaituo kaituo merged commit 70eb22d into opensearch-project:main Jun 6, 2025
26 checks passed
jackiehanyang pushed a commit to jackiehanyang/anomaly-detection that referenced this pull request Aug 4, 2025
opensearch-project#1479)

* Introduce state machine, separate config index, improve suggest/validate APIs, and persist cold-start results for run-once visualization

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* address comments

Signed-off-by: Kaituo Li <kaituo@amazon.com>

---------

Signed-off-by: Kaituo Li <kaituo@amazon.com>
jackiehanyang added a commit that referenced this pull request Aug 7, 2025
* Upgrade gradle 8.10.2 and JDK 23 (#1428)

* Upgrade gradle 8.10.2 and JDK 23

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

* Update ospackage to fix dirmode error

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

---------

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

* dependabot: bump com.netflix.nebula.ospackage from 11.5.0 to 11.11.1 (#1422)

Bumps com.netflix.nebula.ospackage from 11.5.0 to 11.11.1.

---
updated-dependencies:
- dependency-name: com.netflix.nebula.ospackage
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adding release notes for 3.0-alpha1 (#1432)

Signed-off-by: Junwei Dai <junweid@amazon.com>
Co-authored-by: Junwei Dai <junweid@amazon.com>

* Use testclusters when testing with security (#1414)

* Use testclusters when testing with security

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Add download plugin

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Get js and security plugin

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Add opensearchPlugin

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Remove duplicate

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Wait for yellow

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Fix tests

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Fix bwc test

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Add prepareBwcTests

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Add to developer guide

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Add to CHANGELOG

Signed-off-by: Craig Perkins <cwperx@amazon.com>

---------

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* adding ability to run AD with 2 local clusters (#1441)

Signed-off-by: Amit Galitzky <amgalitz@amazon.com>

* distinguish local cluster when name is same as remote (#1446)

Signed-off-by: Amit Galitzky <amgalitz@amazon.com>

* Adding release notes for 3.0.0.0-beta1 (#1447)

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Add integtest.sh to specifically run integTestRemote task (#1456)

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

* Add AWS SAM template for WAF log analysis and anomaly detection (#1460)

* Add AWS SAM template for WAF log analysis and anomaly detection

This commit adds an AWS SAM template required by the blog post: Analyze AWS WAF logs using Amazon OpenSearch Service anomaly detection built on Random Cut Forests. The template provisions all necessary resources to set up anomaly detection for WAF logs using Amazon OpenSearch Service. Detailed instructions and further context can be found in the linked blog post. A README file is included to outline the structure and contents of the template.

Testing Performed:
* deployed the template and verified that processed documents appeared correctly in the WAF index.

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* address comments

Signed-off-by: Kaituo Li <kaituo@amazon.com>

---------

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* adding release notes for 3.0.0 (#1464)

* adding release notes for 3.0.0.0

Signed-off-by: Sai Medhini Reddy Maryada <saimedhi@amazon.com>

* Getting changelog ready for next release

Signed-off-by: Sai Medhini Reddy Maryada <saimedhi@amazon.com>

---------

Signed-off-by: Sai Medhini Reddy Maryada <saimedhi@amazon.com>

* Allow maven to publish to all versions (#1470)

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

* Switch guava deps from compileOnly to implementation (#1473)

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Introduce state machine, separate config index, improve suggest/valid… (#1479)

* Introduce state machine, separate config index, improve suggest/validate APIs, and persist cold-start results for run-once visualization

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* address comments

Signed-off-by: Kaituo Li <kaituo@amazon.com>

---------

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* fix compile error

Signed-off-by: Jackie <jkhanjob@gmail.com>

* adding release notes for 3.1.0 (#1488)

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* Fix incorrect task state handling in ForecastRunOnceTransportAction (#1489)

Previously, the task state was not updated to a failure state if it was non-empty, even when the state represented an incomplete task (e.g., INIT_TEST). This fix ensures the task state is updated to INIT_TEST_FAILED unless it is already in an ended state (e.g., INACTIVE).
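
A minimal sketch of the corrected decision, assuming a small set of ended states. The class and method names are hypothetical, and apart from INACTIVE (named above as an example) the membership of the ended set is an assumption rather than the actual TaskState definition.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the fixed rule: on a run-once failure, move to
// INIT_TEST_FAILED unless the task is already in an ended state.
final class RunOnceFailureSketch {
    enum State { INACTIVE, INIT_TEST, TEST_COMPLETE, INIT_TEST_FAILED }

    // Assumption: which states count as ended; only INACTIVE is named in the description.
    static final Set<State> ENDED_STATES =
        EnumSet.of(State.INACTIVE, State.TEST_COMPLETE, State.INIT_TEST_FAILED);

    static State stateAfterFailure(State current) {
        if (current != null && ENDED_STATES.contains(current)) {
            return current; // already ended: leave untouched
        }
        // Empty state or an incomplete state such as INIT_TEST: record the failure.
        return State.INIT_TEST_FAILED;
    }

    public static void main(String[] args) {
        System.out.println(stateAfterFailure(State.INIT_TEST));
        System.out.println(stateAfterFailure(State.INACTIVE));
    }
}
```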

Testing:
- Manual testing verified correct state transitions.

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* Fix LatestTimeRetriever range query failing on non-epoch date mappings (#1493)

When the user’s time field mapping didn’t include `epoch_millis`, the numeric bounds we pass to `RangeQueryBuilder` were parsed with the field’s default format (`yyyy-MM-dd HH:mm:ss`), triggering a `SearchPhaseExecutionException: all shards failed`. This PR imports `CommonName` and calls `.format(CommonName.EPOCH_MILLIS_FORMAT)` to explicitly tell OpenSearch that the `from/to` values are epoch-millis.

Testing done:
1. added cypress IT: https://tinyurl.com/5n98z3ue

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* Refine cold-start, window delay, and task updates (#1496)

- Skip checkpoint writes during historical/run once
- Add retryOnConflict to task updates to dodge version clashes
- Switch window-delay calculation from 20% padding (gap × 1.2) to a bucket-based approach. Motivation: the multiplicative cushion scaled with the absolute lag, so a multi-hour ingest gap could inflate the delay into days and cause cold-start failures. Tying the delay to config intervals (plus one safety bucket) keeps it proportional and restores prompt results.
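
The difference between the two window-delay rules can be sketched as follows. This assumes the bucket-based delay is the ingest gap rounded up to whole config intervals plus one extra interval; the class and method names are hypothetical and the exact rounding in the plugin may differ.

```java
import java.time.Duration;

// Hypothetical sketch comparing the old multiplicative padding with a
// bucket-based window delay (gap rounded up to intervals, plus one safety bucket).
final class WindowDelaySketch {
    // Old rule: the padding grows with the size of the gap itself.
    static Duration paddedDelay(Duration gap) {
        return gap.multipliedBy(12).dividedBy(10); // gap x 1.2
    }

    // New rule (assumed form): the padding is a single config interval.
    static Duration bucketDelay(Duration gap, Duration interval) {
        long intervalMs = interval.toMillis();
        long buckets = (gap.toMillis() + intervalMs - 1) / intervalMs; // ceiling division
        return interval.multipliedBy(buckets + 1);
    }

    public static void main(String[] args) {
        Duration gap = Duration.ofMinutes(300);
        Duration interval = Duration.ofMinutes(10);
        System.out.println("padded: " + paddedDelay(gap).toMinutes() + " min");
        System.out.println("bucket: " + bucketDelay(gap, interval).toMinutes() + " min");
    }
}
```

With a 300-minute gap and a 10-minute interval, the old rule adds 60 minutes of padding (360 minutes total), while the bucket rule adds one interval (310 minutes total).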

Testing done:
* added IT: https://tinyurl.com/5n98z3ue

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* Fix stopping issue when forecaster is in FORECAST_FAILURE state (#1502)

Mark FORECAST_FAILURE as a non-ending state so TaskManager recognizes the task as stoppable. Previously, TaskManager.stopLatestRealtimeTask failed to stop the task because `isDone()` returned true for FORECAST_FAILURE, causing a "job is already stopped" error.

Testing:
- Manually verified forecaster can now be stopped successfully from FORECAST_FAILURE state.

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* Support >1 hr intervals (#1513)

* Support daily and multi-hour intervals

This commit adds support for configuration intervals exceeding one hour.

Key changes:

* For intervals greater than 1 hour, models are no longer loaded directly into cache. Instead, they are sent to the cold entity queue and checkpoints are reloaded at each interval. Additionally, cold entity processing priority for long-interval configs is elevated from 'LOW' to 'MEDIUM' to ensure timely processing.
* Improved Suggest and Validate APIs: Replaced the previous median-based interval detection method with a robust adaptive "zoom-in/zoom-out" algorithm. The new method employs progressively refined date histograms to accurately determine optimal intervals. This enhancement enables validation and suggestions for intervals longer than 1 hour.
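
The routing rule for long intervals can be sketched as below. The class, enum, and method names are hypothetical; only the one-hour threshold and the LOW to MEDIUM priority change come from the description above.

```java
import java.time.Duration;

// Hypothetical sketch: configs with intervals longer than one hour skip the cache
// and go through the cold entity queue at MEDIUM priority.
final class LongIntervalRoutingSketch {
    enum Priority { LOW, MEDIUM, HIGH }

    static final Duration ONE_HOUR = Duration.ofHours(1);

    static boolean isLongInterval(Duration interval) {
        return interval.compareTo(ONE_HOUR) > 0;
    }

    // Long-interval models are not loaded directly into cache; their checkpoints
    // are reloaded at each interval instead.
    static boolean loadIntoCache(Duration interval) {
        return !isLongInterval(interval);
    }

    static Priority coldEntityPriority(Duration interval) {
        return isLongInterval(interval) ? Priority.MEDIUM : Priority.LOW;
    }

    public static void main(String[] args) {
        Duration daily = Duration.ofDays(1);
        System.out.println("cache=" + loadIntoCache(daily) + ", priority=" + coldEntityPriority(daily));
    }
}
```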

Testing:

* Conducted multi-day manual tests to verify daily interval functionality.
* Added ForecastRestApiIT.testDailyInterval integration test to validate the full forecasting workflow (interval suggestion, forecaster creation, execution, and stats verification) for daily interval data.

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* address comments

Signed-off-by: Kaituo Li <kaituo@amazon.com>

---------

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* fix compile error

Signed-off-by: Jackie <jkhanjob@gmail.com>

* Fixing concurrency bug on writer (#1508)

* fix concurrency bug

Signed-off-by: Amit Galitzky <amgalitz@amazon.com>

* fix concurrency bug

Signed-off-by: Amit Galitzky <amgalitz@amazon.com>

---------

Signed-off-by: Amit Galitzky <amgalitz@amazon.com>

* Remove instantiation of LockService from JS and use Mock instead (#1523)

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* fix: advance past current interval & anchor on now (#1528)

Problem
--------
* `nextNiceInterval()` used a “≥” check, so when the next “nice” value
  equalled `currentMin` it returned the **same** interval.
  The interval‑explorer treats an unchanged return as a terminal
  condition, so exploration stopped and `suggestForecast` failed on
  sample‑log data.
* Interval calculation anchored on the first **future** timestamp if one
  existed, whereas run‑once / real‑time forecasting anchors on the current
  time—causing the two paths to disagree on data sufficiency.

Fix
---
* Change comparison in `nextNiceInterval()` from `>=` to `>` so it always
  returns the next larger interval, letting the explorer continue.
* Anchor interval calculation on the current time (`now`) instead of any
  future date, making all forecast modes consistent.
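
The comparison change can be illustrated with a small sketch. The ladder of "nice" minute values and the class name are hypothetical; only the `>=` versus `>` comparison reflects the fix described here.

```java
// Hypothetical sketch of the comparison bug: with ">=" a value that is already
// "nice" is returned unchanged, which the explorer reads as a terminal condition.
final class NiceIntervalSketch {
    // Assumed ladder of nice interval lengths in minutes (illustrative only).
    static final int[] NICE_MINUTES = { 1, 2, 5, 10, 15, 30, 60, 120, 240, 480, 720, 1440 };

    static int nextNiceIntervalBuggy(int currentMin) {
        for (int m : NICE_MINUTES) {
            if (m >= currentMin) {
                return m; // returns currentMin itself when it is already on the ladder
            }
        }
        return currentMin;
    }

    static int nextNiceInterval(int currentMin) {
        for (int m : NICE_MINUTES) {
            if (m > currentMin) {
                return m; // always strictly larger, so exploration keeps advancing
            }
        }
        return currentMin; // no larger value left on the ladder
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + nextNiceIntervalBuggy(10) + ", fixed: " + nextNiceInterval(10));
    }
}
```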

Tests
-----
* Added IT

Signed-off-by: Kaituo Li <kaituo@amazon.com>

* fix compile error

Signed-off-by: Jackie <jkhanjob@gmail.com>

* using asyncrequest instead of direct search (#1535)

Signed-off-by: Amit Galitzky <amgalitz@amazon.com>

* fix spotless check

Signed-off-by: Jackie <jkhanjob@gmail.com>

* Updates build.gradle to conditionally download certificates (#1517)

Signed-off-by: Darshit Chanpura <dchanp@amazon.com>

* migrating from lang2 to lang3 (#1525)

Signed-off-by: Amit Galitzky <amgalitz@amazon.com>

* fix test compile

Signed-off-by: Jackie <jkhanjob@gmail.com>

---------

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Junwei Dai <junweid@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Amit Galitzky <amgalitz@amazon.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Kaituo Li <kaituo@amazon.com>
Signed-off-by: Sai Medhini Reddy Maryada <saimedhi@amazon.com>
Signed-off-by: Jackie <jkhanjob@gmail.com>
Signed-off-by: Darshit Chanpura <dchanp@amazon.com>
Co-authored-by: Peter Zhu <zhujiaxi@amazon.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Junwei Dai <59641585+junweid62@users.noreply.github.com>
Co-authored-by: Junwei Dai <junweid@amazon.com>
Co-authored-by: Craig Perkins <cwperx@amazon.com>
Co-authored-by: Amit Galitzky <amgalitz@amazon.com>
Co-authored-by: Rishikesh <62345295+Rishikesh1159@users.noreply.github.com>
Co-authored-by: Kaituo Li <kaituo@amazon.com>
Co-authored-by: Sai Medhini Reddy Maryada <117196660+saimedhi@users.noreply.github.com>
Co-authored-by: Darshit Chanpura <dchanp@amazon.com>
Labels

enhancement New feature or request
