Description
Summary of problem: CS-IndexLifecycleMetadata, indexMetaData-current_step info, and policyRegistry can get out of sync.
The PolicyStepsRegistry for an IndexLifecycleService can apparently be updated and read at the same time. It seems possible for the execution side of the service to read from the registry while updates to it are still in progress.
Updates to the registry happen when cluster-state-appliers are called.
Reads from the registry happen in two cases.
Nothing prevents the system from performing an update while it is also in case (1) or (2).
It is also not clear from the existing tests whether this is actually a problem. From a quick look, an IllegalStateException can be thrown in the following scenarios:
A. The API deletes or changes an action in a policy such that a step is removed.
B. The API adds a policy to an index, and runPolicy is executed from the scheduled job before the cluster-state applier populates the registry.
C. The API changes a policy such that the index is recorded as being on a specific step that the applier has not yet populated (similar to B).
D. runPolicy is triggered after the node has stopped being the elected master, so its registry is stale relative to the up-to-date cluster state, in which policy changes have moved indices to new steps that do not exist in the stale registry.
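The ordering problem behind scenarios B and C can be illustrated with a minimal single-threaded sketch. Everything in it is an assumption made for illustration: `StaleRegistrySketch`, `applyPolicy`, and `getStep` are hypothetical stand-ins, not the actual PolicyStepsRegistry API, and the real registry stores richer step objects than plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in; NOT the real Elasticsearch PolicyStepsRegistry.
public class StaleRegistrySketch {
    // policy name -> (step name -> step description)
    private final Map<String, Map<String, String>> stepsByPolicy = new HashMap<>();

    // Stands in for the cluster-state applier populating the registry.
    public void applyPolicy(String policy, Map<String, String> steps) {
        stepsByPolicy.put(policy, new HashMap<>(steps));
    }

    // Stands in for runPolicy resolving the index's current step.
    public String getStep(String policy, String currentStep) {
        Map<String, String> steps = stepsByPolicy.get(policy);
        if (steps == null || !steps.containsKey(currentStep)) {
            throw new IllegalStateException(
                "step [" + currentStep + "] not found for policy [" + policy + "]");
        }
        return steps.get(currentStep);
    }

    public static void main(String[] args) {
        StaleRegistrySketch registry = new StaleRegistrySketch();
        try {
            // The scheduled job runs before the applier has populated the registry.
            registry.getStep("my_policy", "rollover");
        } catch (IllegalStateException e) {
            System.out.println("before applier: " + e.getMessage());
        }
        // Once the applier has run, the same lookup succeeds.
        registry.applyPolicy("my_policy", Map.of("rollover", "roll the index over"));
        System.out.println("after applier: " + registry.getStep("my_policy", "rollover"));
    }
}
```

The sketch only shows the read-before-populate ordering. It does not model the concurrent read/update interleaving, which would additionally require a thread-safe map or an atomically swapped snapshot.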
I do not think this is an issue in case (2), since it is fine if the execution task fails and tries again in a later iteration. One costly way to avoid the IllegalStateException is to re-compute the steps on every run. Other ways may exist, but further investigation is warranted.
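As a rough sketch of the re-compute option, the lookup could derive the steps from the policy definition it is handed on each run instead of consulting a shared cache. The names here (`RecomputeStepsSketch`, `computeSteps`, `resolveStep`) and the representation of a policy as a list of action names are assumptions for illustration only, not how ILM actually models policies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: steps are rebuilt from the policy on every run, so there
// is no separately maintained registry that can lag behind the cluster state.
public class RecomputeStepsSketch {

    // Derive step name -> step description from the policy's ordered actions.
    public static Map<String, String> computeSteps(String policyName, List<String> actions) {
        Map<String, String> steps = new LinkedHashMap<>();
        for (String action : actions) {
            steps.put(action, policyName + "/" + action);
        }
        return steps;
    }

    // Resolve the current step; an empty result lets the caller skip this
    // iteration and retry later instead of throwing.
    public static Optional<String> resolveStep(String policyName, List<String> actions,
                                               String currentStep) {
        return Optional.ofNullable(computeSteps(policyName, actions).get(currentStep));
    }

    public static void main(String[] args) {
        List<String> actions = List.of("rollover", "delete");
        System.out.println(resolveStep("my_policy", actions, "delete"));
        System.out.println(resolveStep("my_policy", actions, "shrink"));
    }
}
```

The cost is that the steps are rebuilt on every invocation. This also does not help in scenario A, where the current step has genuinely been removed from the policy; the lookup then comes back empty and the caller still has to decide what to do.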