Can we avoid force-merging all shard copies? #75478

Open
@jpountz

@ywelsch mentioned to me that one could easily run a force-merge on a read-only index on a single copy by doing the following sequence of operations:

  • call the index clone API and set the number of replicas to 0 on the clone
  • force-merge the clone
  • set the number of replicas to the original value on the clone
  • flip the alias so that the clone takes the place of the original index in the data stream
  • delete the original index
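The sequence above can be sketched as a list of REST calls. This is only an illustration, not how ILM would implement it: the function name, the `-merged` suffix for the clone, and `max_num_segments=1` are assumptions made for the example, and the alias flip uses the `_aliases` API, so an index that is a backing index of a data stream would likely need a different swap mechanism. The clone API also requires the source index to be write-blocked, which the read-only precondition already covers.

```python
from typing import Any, Dict, List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def single_copy_force_merge_plan(index: str, alias: str, replicas: int,
                                 clone: Optional[str] = None) -> List[Request]:
    """Build the REST calls for the clone-based force-merge (hypothetical helper).

    Nothing is sent over the network; the caller decides how to execute the plan.
    """
    clone = clone or f"{index}-merged"  # hypothetical naming scheme
    return [
        # 1. Clone the read-only index with zero replicas.
        ("POST", f"/{index}/_clone/{clone}",
         {"settings": {"index.number_of_replicas": 0}}),
        # 2. Force-merge the single copy (max_num_segments=1 is an assumption).
        ("POST", f"/{clone}/_forcemerge?max_num_segments=1", None),
        # 3. Restore the original replica count on the clone.
        ("PUT", f"/{clone}/_settings",
         {"index": {"number_of_replicas": replicas}}),
        # 4. Swap the alias from the original to the clone in one atomic request.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": index, "alias": alias}},
            {"add": {"index": clone, "alias": alias}},
        ]}),
        # 5. Delete the original index.
        ("DELETE", f"/{index}", None),
    ]


if __name__ == "__main__":
    for method, path, body in single_copy_force_merge_plan("logs-000001", "logs", 1):
        print(method, path, body or "")
```

In practice one would probably also wait for the clone's replicas to be allocated (for example with the cluster health API and `wait_for_status=green`) before deleting the original, so that redundancy is never lost.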

With one replica, this would require only half the CPU of a plain force-merge, since one shard copy is merged instead of two, which is a significant saving. Should we move the force-merge ILM action to this sequence of operations instead of just calling the _forcemerge API?

Interestingly, it wouldn't require more temporary storage. Since the index clone API uses hard links, it needs no additional storage initially, so we would only need 2x the size of a shard of temporary storage right after increasing the number of replicas of the clone to 1. This is the same as running the _forcemerge API, since merging needs temporary storage equal to the size of a shard copy, and this is needed for both shard copies.

For the searchable-snapshots ILM action, which optionally performs a force-merge, we could do even better by adding an option to _forcemerge to only merge primaries, and using it with the searchable-snapshots action. This would work since only primaries are used to take snapshots, and it would require only half the temporary storage that the action needs today.
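The CPU and storage claims above can be checked with a back-of-the-envelope model. It is a rough sketch under simplifying assumptions that are not from any Elasticsearch API: each merged shard copy counts as one unit of CPU, a merge needs up to one shard copy's size of scratch space, a fresh clone shares its files with the original and so costs nothing, and the original index stays on disk until the clone is complete. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    merges: int          # shard copies that run a merge (proxy for CPU)
    temp_storage: float  # peak extra disk, in the same unit as shard_size


def plain_force_merge(shard_size: float, replicas: int = 1) -> Cost:
    # Every copy merges independently and needs its own scratch space.
    copies = 1 + replicas
    return Cost(merges=copies, temp_storage=copies * shard_size)


def clone_then_merge(shard_size: float, replicas: int = 1) -> Cost:
    # One merge on the clone's single copy; the merged shard is then copied
    # to each replica while the original index is still on disk.
    return Cost(merges=1, temp_storage=(1 + replicas) * shard_size)


def primaries_only_merge(shard_size: float, replicas: int = 1) -> Cost:
    # Proposed option for searchable snapshots: only the primary merges.
    return Cost(merges=1, temp_storage=shard_size)


if __name__ == "__main__":
    size_gb = 50.0  # example shard size
    for strategy in (plain_force_merge, clone_then_merge, primaries_only_merge):
        print(strategy.__name__, strategy(size_gb))
```

With one replica, the model gives two merges and 2x the shard size for the plain force-merge, one merge and 2x the shard size for the clone sequence, and one merge and 1x the shard size for a primaries-only merge, matching the halving of CPU and of temporary storage described above.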
