Rework SLM to avoid concurrent snapshots by logging-and-skipping

Related to some of the discussion that came out of https://github.com/elastic/elasticsearch/issues/64035.

Consider, for example, a cluster with an SLM policy configured to take a snapshot every 30 minutes, where each snapshot actually takes 35 minutes to complete. Currently, this results in overlapping snapshots being scheduled (less of an issue now that snapshots can be executed in parallel, see #56911).

However, it might be better to log-and-skip in those cases rather than scheduling an overlapping snapshot: if the policy's previous snapshot is still running when the next one is due, log that fact and skip that run. In essence, this would change an SLM policy from `scheduleSnapshot` to `maybeScheduleSnapshot` semantics. In the 30/35-minute example, we'd end up snapshotting once an hour rather than once every 30 minutes, logging and skipping every other snapshot. That is, when a policy is over-scheduled, SLM degrades gracefully to a less frequent schedule.
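
To make the difference concrete, here is a small, hypothetical Java simulation. It is not Elasticsearch's SLM code: the class `SlmScheduleSimulation` and the method `simulate` are made up for illustration. It assumes a single policy, a fixed trigger interval, a fixed snapshot duration, and that "still running" simply means the previous snapshot's end time has not yet been reached. The "log" is a plain `System.out` line rather than a real logger.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation (not Elasticsearch code) contrasting two scheduling
 * behaviours for a single SLM-like policy. All times are in minutes.
 *  - skipIfRunning == false: start a snapshot on every trigger (scheduleSnapshot)
 *  - skipIfRunning == true:  log and skip a trigger while the previous snapshot
 *                            is still running (maybeScheduleSnapshot)
 */
public class SlmScheduleSimulation {

    /** Returns the trigger times at which a snapshot was actually started. */
    static List<Integer> simulate(int intervalMinutes, int durationMinutes,
                                  int horizonMinutes, boolean skipIfRunning) {
        List<Integer> started = new ArrayList<>();
        // End time of the most recently started snapshot; none has started yet.
        int runningUntil = Integer.MIN_VALUE;
        for (int t = 0; t < horizonMinutes; t += intervalMinutes) {
            boolean inProgress = t < runningUntil;
            if (skipIfRunning && inProgress) {
                // Log-and-skip instead of starting an overlapping snapshot.
                System.out.printf("t=%d: snapshot still running until t=%d, skipping%n",
                        t, runningUntil);
                continue;
            }
            started.add(t);
            runningUntil = t + durationMinutes;
        }
        return started;
    }

    public static void main(String[] args) {
        // 30-minute schedule, 35-minute snapshots, simulated over 4 hours.
        System.out.println("scheduleSnapshot:      " + simulate(30, 35, 240, false));
        System.out.println("maybeScheduleSnapshot: " + simulate(30, 35, 240, true));
    }
}
```

Running `main` reports snapshots started at all eight trigger times for the first behaviour, where each one overlaps its predecessor by 5 minutes. For the second it reports `[0, 60, 120, 180]`, with skip messages at t=30, 90, 150 and 210, which is the once-an-hour cadence from the 30/35-minute example.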

/cc @original-brownbear 

Rework SLM to avoid concurrent snapshots by logging-and-skipping #65318