Closed
Description
Tasks
- Write LifecycleAction for each type of action
- delete
- forcemerge
- rollover
- allocate
- shrink
- replica
- snapshot (will implement snapshotting as a separate solution)
- Create concept of a lifecycle type which will:
- Constrain the available phase names
- Set the order in which the phases are executed
- Create concept of Phase types which will:
- Set the actions that are available in each phase
- Set the order in which the actions are executed within each phase
- Remove shuffled fields exception for phases field in unit tests (IndexLifecycleMetadataTests, LifecyclePolicyTests, PutLifecycleRequestTests)
- Create the first lifecycle type `timeseries`, which will allow the following phases (in order):
- Hot - Actions:
- rollover
- Warm - Actions:
- allocate
- shrink
- forcemerge
- replicas
- Cold - Actions:
- allocate
- replicas
- Delete - Actions:
- delete
- Verify Master election re-initialization strategy. Once a Master with an existing in-memory schedule is dropped, the new master needs to be able to re-initialize all the state and relaunch any tasks that were still waiting to launch. It helps that all times are relative to the index.creation.date
- Add ability to change the poll interval through cluster settings
- Stop using `IndexMetaData.getCreationDate` and use a custom setting so that it can be inherited across shrink and other operations
- Clean up logging
- Allow the scheduled job to be added and removed while the node is still running when it is elected and un-elected as master.
- Introduce `index.lifecycle.phase_time` and `index.lifecycle.action_time` to help track
- Update Shrink Action to properly support self-allocation to specific node from specified attributes
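One way to picture the `timeseries` type from the task list above is as an ordered table of phases, each with an ordered list of permitted actions. The following is a minimal Python sketch under that assumption, not the plugin's Java implementation; `TIMESERIES_TYPE`, `validate_policy` and `ordered_actions` are hypothetical names introduced only for illustration.

```python
# Illustrative model only: these names are hypothetical, not Elasticsearch classes.
# Phase order and per-phase action order follow the timeseries task list.
TIMESERIES_TYPE = {
    "hot": ["rollover"],
    "warm": ["allocate", "shrink", "forcemerge", "replicas"],
    "cold": ["allocate", "replicas"],
    "delete": ["delete"],
}


def validate_policy(policy):
    """Raise ValueError if the policy uses a phase or action the type forbids."""
    for phase, actions in policy.items():
        if phase not in TIMESERIES_TYPE:
            raise ValueError(f"phase [{phase}] is not allowed by type [timeseries]")
        for action in actions:
            if action not in TIMESERIES_TYPE[phase]:
                raise ValueError(f"action [{action}] is not allowed in phase [{phase}]")


def ordered_actions(policy):
    """Return (phase, action) pairs in the order the type dictates,
    regardless of the order in which the policy lists them."""
    validate_policy(policy)
    return [
        (phase, action)
        for phase, allowed in TIMESERIES_TYPE.items()
        if phase in policy
        for action in allowed
        if action in policy[phase]
    ]
```

Because the type, not the policy document, fixes the order, a policy whose phases or actions arrive shuffled still resolves to the same execution sequence.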
tracking Steps progress
- PhaseAfterStep
- InitializationPolicyContextStep
- TerminalPolicyStep
- AllocateAction
- EnoughShardsWaitStep
- UpdateAllocationSettingsStep
- AllocationRoutedStep
- DeleteAction
- DeleteStep
- ForceMergeAction
- UpdateBestCompressionSettingsStep
- ForceMergeStep (upgrade?)
- SegmentCountStep
- ReadOnlyAction
- ReadOnlyStep
- ReplicasAction
- UpdateReplicaSettingsStep
- EnoughShardsWaitStep
- RolloverAction
- RolloverStep
- ShrinkAction
- ShrinkStep
- ShrunkShardsAllocatedStep
- AliasStep
- ShrunkenIndexCheckStep
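The checklist above breaks each action into named steps. As a rough illustration of how such a breakdown could drive execution, the Python sketch below maps each action to its steps and looks up the step that follows the current one. It assumes the steps run in the order they are listed, which the checklist does not state explicitly; `ACTION_STEPS` and `next_step` are hypothetical names, and the real steps are Java classes.

```python
# Hypothetical lookup table built from the checklist; the within-action order
# is assumed to match the order in which the steps are listed.
ACTION_STEPS = {
    "allocate": ["EnoughShardsWaitStep", "UpdateAllocationSettingsStep", "AllocationRoutedStep"],
    "delete": ["DeleteStep"],
    "forcemerge": ["UpdateBestCompressionSettingsStep", "ForceMergeStep", "SegmentCountStep"],
    "readonly": ["ReadOnlyStep"],
    "replicas": ["UpdateReplicaSettingsStep", "EnoughShardsWaitStep"],
    "rollover": ["RolloverStep"],
    "shrink": ["ShrinkStep", "ShrunkShardsAllocatedStep", "AliasStep", "ShrunkenIndexCheckStep"],
}


def next_step(action, current_step):
    """Return the step after current_step within the action,
    or None when the action has no further steps."""
    steps = ACTION_STEPS[action]
    position = steps.index(current_step)  # raises ValueError for an unknown step
    if position + 1 < len(steps):
        return steps[position + 1]
    return None
```

Returning `None` at the end of an action leaves it to the caller to decide whether to move to the next action or to the end of the phase.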
Remaining Tasks
Completed
- Make phase after step have the phase name of the current phase not the next phase (i.e. when the warm actions are done the phase after step should have a phase of `warm`) @colings86 Changes PhaseAfterStep to take the name of the previous phase #30756
- Add `index.lifecycle.skip` setting to allow indexes to be put into a "maintenance" mode where index lifecycle will not touch them @talevy add index.lifecycle.skip setting for skipping policy execution #30766
- Get Index lifecycle working with security (at the moment if security is disabled setting the phase and action fails because we are trying to modify settings with a system user which is not allowed) @colings86 Stores security headers with the LifecyclePolicy and uses them for AsyncSteps #30657
- Add Lifecycle Explain API @colings86
- internal state management
- write REST/transport actions
- Update `index.lifecycle.date` in Rollover for rolled over indices @talevy inherit [index.lifecycle.date] from rolled-over time #30853
- Handle policy updates and changes to `index.lifecycle.name`: @colings86
- Remove shard and replica setting checks from wait steps @talevy remove requirement for shards/replicas in allocation check steps #30855
- Add new setting property to prevent updating `index.lifecycle.name` using the Update Settings API @jasontedor Add notion of internal index settings #31286
- Do not allow deleting lifecycle policies that are in-use by one or more indexes
- Allow PUT index lifecycle API to update an existing policy @colings86 Adds ability to update a policy #31361
- Only succeeds if there is no change to the shrink action OR no affected indices are in the shrink action @colings86 Adds a check to only fail policy update if unsafe action is changed #32002
- Add assign policy to index API (REST endpoint `PUT {index}/_lifecycle/{policy}` and `POST {index}/_lifecycle/{policy}`) @colings86 Adds API to assign or change the policy for an index #31277
- Add remove policy from index API (REST endpoint `DELETE {index}/_lifecycle`) @colings86 Adds an API to remove ILM from an index completely #31358
- Add change policy for index API (REST endpoint `PUT {index}/_lifecycle/{policy}`) @colings86 Adds API to assign or change the policy for an index #31277
- Will not move the policy for an index (and will report this rejection) if there is a change to the shrink action AND the index is in the shrink action @colings86 Adds a check to only fail policy update if unsafe action is changed #32002
- Will succeed on other indexes even if some indexes fail the condition @colings86 Adds API to assign or change the policy for an index #31277
- Changes to Rollover Action to support `is_write_index`
- Add API to force set to an explicit step
- Add API that can be run when the index is in an error state and will force execution to retry the step that caused the error @talevy add _retry API to index lifecycle policies #30769
- Add ability to put ILM in maintenance mode which will stop processing any indexes which are not in the shrink action and wait for all other indices to complete the shrink action before stopping processing them (@talevy)
- Fix Exception wrapping code in Explain API to properly render exception's messages @talevy render non-ElasticsearchException in ILM #31284
- Update ForceMergeAction to update max segment size before merging due to: Add ways to force-merge down to 1 segment #31742 (for 6.x there is nothing to do here as we will maintain the current behaviour in the ForceMerge API)
- Remove best_compression option from ForceMerge Action @dakrone Remove "best_compression" option from the ForceMergeAction #32373
- make lifecycle name, phase, action, step settings Property.INTERNAL @dakrone Make various LifecycleSettings Settings internal #32381 Make index.lifecycle.name setting internal #32518
- Fix remaining NOCOMMITS
- Get rid of ObjectParserUtils @colings86 Removes redundent NORELEASES and ObjectParserUtils #32427 *
- Change default poll interval setting (need to decide what it should be) change default indices.lifecycle.poll_interval to something sane #32521
- Investigate changing core to get rid of UpdateSettingsHelper and RolloverIndexTestHelper ** @dakrone Remove UpdateSettingsTestHelper class #32557 Remove RolloverIndexTestHelper #32559
- Remove NORELEASE on randomising policy and phase names in tests as no longer relevant @colings86 Removes redundent NORELEASES and ObjectParserUtils #32427 *
- Remove replica action and add ability to set replicas to allocate action @talevy move replicas action functionality into AllocateAction #32523
- Leverage `is_write_index` for managing indices/aliases in ILM: [meta] Alias is_write_index feature tracking #31959 @talevy ***
- Check concurrency safety
- Check MoveToStepAction/RetryAction and its guards against invalid steps and metadata states (need to check if there is anything to do here) @talevy *
- Clear up the thread-safety of updating/bootstrapping the local policyStepsRegistry Test and Verify whether PolicyStepsRegistry is used safely in ILM #32181 @talevy *
- Handle policy updates and changes to `index.lifecycle.name`: @colings86
- Allow PUT index lifecycle API to update an existing policy
- If the current step no longer exists in the policy, move instead to the next action that exists (or the next phase after step) @colings86 ***
- Add change policy for index API (REST endpoint `PUT {index}/_lifecycle/{policy}`)
- If the current step no longer exists in the policy, move instead to the next action that exists (or the next phase after step) Skips to next available action on missing step #32283 @colings86 ***
- Security tests
- Create "ilm-tests-with-security" qa module and run YAML tests we have add qa project for running ILM tests against security #32218 *
- Create tests which have two users, one who creates the policy and another who assigns the policy to an index. ** add user authentication test for ILM #32826
- Then check the policy works if the user who created the policy has all the permissions needed **
- Check that if the user who created the policy doesn't have the correct permissions for an action we move to the error step ** add user authentication test for ILM #32826
- Master failover tests
- Test failover of master while on a cluster state wait step re-enable ILM integration tests and fix policyRegistry update bug #32108 @talevy **
- Create Client-Side copies of classes LifecyclePolicy so Client can put/get policies to/from json @talevy
- LifecycleActions
- Allocate Action migrate allocate action pojo/xcontent to xpack.protocol #32853
- Delete, ForceMerge, ReadOnly, Rollover, Shrink migrate more actions to protocol.xpack #32892
- LifecyclePolicy/Phase copy LifecyclePolicy to protocol.xpack #32915
- Backport to 6.x * index-lifecycle-6.x
- Change lifecycle name to ILM? Should the index lifecycle plugin name be changed to "ilm" #33265
- Add support in transport client for ILM APIs @colings86 Adds ILMClient for use with transport client #33357
- fix rendering of `after` @colings86 ILM: fix `after` rendering to xcontent #33282
- add LifecyclePolicy version and creation_date metadata @talevy add notion of version and creation_date to LifecyclePolicyMetadata #33450
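The policy-update rule in the completed list above (an update only succeeds if the shrink action is unchanged OR no affected index is currently in the shrink action) can be sketched as a small predicate. The Python below is a hedged illustration, not the check merged in #32002: the policy shape (`{phase: {action: config}}`), the `current_actions` map of index name to current action, and both function names are assumptions made for the example.

```python
def shrink_definitions(policy):
    """Collect the shrink action config of every phase that defines one.
    `policy` is assumed to look like {phase_name: {action_name: config}}."""
    return {
        phase: actions["shrink"]
        for phase, actions in policy.items()
        if "shrink" in actions
    }


def can_update_policy(old_policy, new_policy, current_actions):
    """Allow the update unless the shrink action changed while at least one
    index using the policy is in the shrink action.
    `current_actions` is assumed to map index name -> current action name."""
    shrink_changed = shrink_definitions(old_policy) != shrink_definitions(new_policy)
    any_index_shrinking = any(action == "shrink" for action in current_actions.values())
    return not shrink_changed or not any_index_shrinking
```

The per-index variant described for the change-policy API would apply the same test to one index at a time, so that one rejected index does not block the others.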
Blockers to merging into master in priority order from most to least (items are marked in difficulty using *, **, ***)
- Store phase JSON in index metadata when an index moves to a new phase and use that version to drive execution until the end of the phase regardless of changes to the underlying policy @dakrone
- PolicyStepsRegistry will need to store the JSON for the current phase as it is in the lifecycle policy per index (for the current phase the index is on) (i.e. `Phase.toXContent()`). Instead of holding on to all of the compiled steps PolicyStepsRegistry should have a map of index to the steps of the current phase. Store phase steps for index in PolicyStepsRegistry #32926
- Remove the PhaseAfterStep in favour of having the execution flow itself determine when the index should move to the next phase (based on the after parameter in the next phase from the IndexLifecycleMetadata) @dakrone Remove PhaseAfterStep #33140 Replace PhaseAfterStep with PhaseCompleteStep #33398 ***
- When a new index appears or when an index moves to a new phase the registry will need to compile the steps for the new phase and store them in memory in the PolicyStepsRegistry
- Store the JSON for the current phase in the index settings (to be metadata later on), the policy steps registry should use this as the source of truth for how to execute the current phase for that index @talevy add new phase definition setting used for retrieving phase to execute #33289
- Update lifecycle Explain API to return the information for the phase that is stored in the index metadata (including the policy name for that phase and the version of the policy) @talevy add version, policyname, and modifiedDate to ILM Explain API #33488 *
- Move client to be a class variable in PolicyStepsRegistry so don't have to pass it in @dakrone
- When getStep is called compile steps from phase definition and get the step from that list @dakrone
- Make phase transition and phase definition storing synchronous ?
- use Custom Index MetaData to store lifecycle state of action/phase/step info for a managed index @gwbrown ** (Use custom index metadata for ILM state #33783)
- 4-node all-actions in all phases test @talevy ILM integration test with full policy #33402 **
- Make read only step only added if the warm phase contains forcemerge or shrink ILM policy with empty "warm" phase still enforces read only indices #33485 @gwbrown
- Step locking: Only one step executes at a time per index. prevent step execution pile-up from occurring. @dakrone Change step execution flow to be deliberate about type #34126
- Tests IT:
- timeseries async race condition: verify that rollover/shrink/delete can be called twice before the first action's onResponse is called @talevy ILM integration test with full policy #33402
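The first blocker above asks that an index keep executing the phase definition it saw when it entered the phase, even if the policy is edited afterwards. A minimal Python sketch of that idea follows; it assumes the snapshot is stored as a JSON string next to the index's other lifecycle state. The keys `phase`, `phase_definition` and `policy_version`, and both function names, are hypothetical and do not reflect the actual setting or metadata names.

```python
import json


def enter_phase(index_metadata, policy, phase_name):
    """Snapshot the phase definition into the index's own metadata at the
    moment the index moves into the phase."""
    index_metadata["phase"] = phase_name
    index_metadata["phase_definition"] = json.dumps(
        policy["phases"][phase_name], sort_keys=True
    )
    index_metadata["policy_version"] = policy["version"]


def phase_to_execute(index_metadata):
    """Return the definition that drives execution: the stored snapshot,
    not whatever the live policy currently says."""
    return json.loads(index_metadata["phase_definition"])
```

Serialising the phase rather than keeping a reference to the policy object is what isolates an in-flight phase from later edits; the next call to `enter_phase` picks up the updated policy.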
Blockers to first release in priority order from most to least (items are marked in difficulty using *, **, ***)
- Write Documentation **** @colings86 @talevy
- High level REST client support Java high-level REST client completeness for ILM APIs #33100 **** (Remaining task @gwbrown)
- clean up experience when invalid policy is used by index ILM: IndexLifecycleRunner should not spam logs with warnings when policy is misconfigured #33074 * @gwbrown
- Rolling Upgrade Tests @talevy add ILM rolling upgrade tests #32828 **
- Rename "after" field to `minimum_age` @dakrone Rename "after" field #32624 *
- Issues labelled `:Core/Features/ILM` and `blocker`
- Manual Testing @dakrone
Optional (but would be really good to have)
- ILM usage stats (using xpack usage API) @colings86 Adds usage data for ILM #33377 **