
[Proposal] Publish an event when a long-running operation completes #5479

@xluo-aws

Description


Is your feature request related to a problem? Please describe.
Some index operations, for example reindex, split, and shrink, can take tens of minutes or even hours. We want to send notifications (configured in the ISM dashboard plugin) to users when these operations complete, regardless of whether the operation was submitted from the ISM dashboard plugin or from the command line.

Describe the solution you'd like
We brainstormed a few options. The preferred one is to enhance OpenSearch core to publish an event when an operation completes; we can then create a listener in the ISM plugin that listens for the event and sends a notification. We checked the existing events that plugins listen to. ClusterChangedEvent is one (there may be others), and it is published when we split or shrink an index. However, this event does not carry the information required to send a notification, for example, who submitted the operation request. Another drawback of relying on this event is that reindex does not trigger a ClusterChangedEvent, so it is not a general solution.
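
A minimal sketch of the event-and-listener shape the preferred option implies, in plain Java. It assumes a hypothetical OperationCompletedEvent that carries the fields ClusterChangedEvent lacks (operation type, target index, submitting user). None of these types exist in OpenSearch today; OperationEventBus and NotificationListener are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical: the long-running operations discussed in this proposal.
enum OperationType { REINDEX, SPLIT, SHRINK }

// Hypothetical event that core would publish when an operation finishes.
// Unlike ClusterChangedEvent, it records who submitted the request.
record OperationCompletedEvent(OperationType type, String index, String submittedBy, boolean succeeded) {}

// Hypothetical listener interface that a plugin such as ISM would implement.
interface OperationCompletedListener {
    void onOperationCompleted(OperationCompletedEvent event);
}

// Hypothetical registry in core: plugins register listeners, core publishes events to all of them.
final class OperationEventBus {
    // Copy-on-write so registration and publishing can happen on different threads.
    private final List<OperationCompletedListener> listeners = new CopyOnWriteArrayList<>();

    void register(OperationCompletedListener listener) {
        listeners.add(listener);
    }

    void publish(OperationCompletedEvent event) {
        for (OperationCompletedListener listener : listeners) {
            listener.onOperationCompleted(event);
        }
    }
}

// Hypothetical ISM-side listener that turns events into notification messages.
final class NotificationListener implements OperationCompletedListener {
    final List<String> sent = new ArrayList<>();

    @Override
    public void onOperationCompleted(OperationCompletedEvent event) {
        String status = event.succeeded() ? "completed" : "failed";
        // A real plugin would send this to a configured notification channel; here we only record it.
        sent.add(event.type() + " on " + event.index() + " " + status
                + " (requested by " + event.submittedBy() + ")");
    }
}
```

In this shape the ISM listener would be registered once at plugin startup, and core would call publish from wherever the operation actually finishes. Where exactly that hook lives is the open question of this proposal.
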
Another possible solution is to publish a new event when a long-running operation is triggered. The listener in the plugin would then create a scheduled job that checks the operation status every x minutes and sends a notification once the operation completes. The extension point could be to extend RestToXContentListener in RestResizeHandler and RestReindexAction to publish an event, or to extend TransportResizeAction/TransportReindexAction to publish the event. This is similar to alternative 2 below but has less impact, because it only affects a few long-running operations.
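
A rough sketch of the listener side of this option, assuming the plugin receives a hypothetical "operation started" callback and can check status through some BooleanSupplier (for example, a task or cluster-health lookup). It uses java.util.concurrent.ScheduledExecutorService to poll at a fixed interval and stops polling once the operation reports completion. CompletionPoller and onOperationStarted are illustrative names, not existing OpenSearch or ISM APIs.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical: polls an operation's status on a fixed interval and notifies exactly once on completion.
final class CompletionPoller {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void onOperationStarted(String operationId, BooleanSupplier isComplete,
                            Consumer<String> notifier, long interval, TimeUnit unit) {
        AtomicReference<ScheduledFuture<?>> handle = new AtomicReference<>();
        AtomicBoolean notified = new AtomicBoolean(false);

        // Note: if isComplete throws, scheduleAtFixedRate suppresses all later runs of this task.
        Runnable check = () -> {
            if (!notified.get() && isComplete.getAsBoolean()) {
                notified.set(true);
                notifier.accept(operationId);
            }
            // The handle may still be null on a very early first run; a later run cancels instead.
            ScheduledFuture<?> future = handle.get();
            if (notified.get() && future != null) {
                future.cancel(false); // stop polling once completion has been reported
            }
        };
        handle.set(scheduler.scheduleAtFixedRate(check, interval, interval, unit));
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

An in-memory executor like this does not survive a node restart. For operations that run for hours, a persisted scheduled job is probably a better fit, and that is presumably what "scheduled job" means in the ISM context.
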

Describe alternatives you've considered
1. Create a wrapper API in the ISM plugin. It calls the existing index operation API first, then creates a scheduled job that checks the operation status every x minutes and sends a notification once the operation completes. This requires users to switch to the new wrapper API.
2. Create an actionFilter in the ISM plugin that inspects all requests and creates a scheduled job when a request is a long-running operation. The major concern is the performance impact. However, ISM already has an actionFilter that intercepts all requests. We assume that filter has already passed performance review, so this is not an entirely new performance risk. We can run performance tests if this becomes a candidate solution.
3. For reindex, we can use IndexingOperationListener to monitor the .tasks index. Reindex writes to this index upon completion, at which point we can send a notification. For shrink and split, we can use ClusterChangedEvent to find out which index was created and whether it was created by a resize. If it was, we compare its shard count with the source index's shard count to determine whether it is a split or a shrink. We then wait for the active shards to be ready (the same logic used to tell that a create index operation is done) and send a notification. All code changes would be in the ISM plugin.
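
For alternative 3, here is a sketch of the classification step in plain Java. Given the source and target primary shard counts of a resize, it decides between split and shrink. It treats the operation as done once the target's active shard count meets a wait-for-active-shards style threshold. ResizeClassifier, ResizeKind, and the threshold parameter are hypothetical stand-ins. A real listener would read these values from the cluster state carried by ClusterChangedEvent, and detecting that an index was created by a resize is not shown.

```java
import java.util.Optional;

// Hypothetical result of classifying a newly created index.
enum ResizeKind { SPLIT, SHRINK }

final class ResizeClassifier {
    private ResizeClassifier() {}

    /**
     * More primary shards than the source means split; fewer means shrink.
     * Equal counts are neither (e.g. a clone), so the result is empty.
     */
    static Optional<ResizeKind> classify(int sourcePrimaryShards, int targetPrimaryShards) {
        if (targetPrimaryShards > sourcePrimaryShards) {
            return Optional.of(ResizeKind.SPLIT);
        }
        if (targetPrimaryShards < sourcePrimaryShards) {
            return Optional.of(ResizeKind.SHRINK);
        }
        return Optional.empty();
    }

    /**
     * Hypothetical completion check mirroring "wait for active shards": the resize counts as
     * done once at least requiredActiveShards shard copies of the target index are active.
     */
    static boolean isComplete(int activeShards, int requiredActiveShards) {
        return activeShards >= requiredActiveShards;
    }
}
```

The listener would run this only for indices that appear in the new cluster state but not in the previous one, so each resize is classified once.
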


Labels: discuss, enhancement, extensions
