Add Package Level Javadoc for Snapshot Clones #63217

Merged
@@ -324,6 +324,8 @@ private static Map<String, IndexId> getInFlightIndexIds(List<SnapshotsInProgress
.collect(Collectors.toMap(IndexId::getName, Function.identity()));
}

// TODO: It is worth revisiting the design choice of creating a placeholder entry in snapshots-in-progress here once we have a cache
Contributor Author

Just a thought I had while writing this up. The placeholder logic might be questionable, but we can easily remove it in a BwC manner, like we did for snapshot create, if we want to in a follow-up, so I'm not too worried here either.

// for repository metadata and loading it has predictable performance
public void cloneSnapshot(CloneSnapshotRequest request, ActionListener<Void> listener) {
final String repositoryName = request.repository();
Repository repository = repositoriesService.repository(repositoryName);
@@ -100,7 +100,39 @@
* </ol>
*
* <h2>Cloning a Snapshot</h2>
* TODO: write up the steps in a snapshot clone properly
*
* <p>Cloning part of a snapshot is a process executed entirely on the master node. At a high level, the process of cloning a snapshot is
* analogous to that of creating a snapshot from data in the cluster except that the source of data files is the snapshot repository
* instead of the data nodes. It begins with cloning all shards and then finalizes the cloned snapshot the same way a normal snapshot would
* be finalized. Concretely, it is executed as follows:</p>
*
* <ol>
* <li>First, {@link org.elasticsearch.snapshots.SnapshotsService#cloneSnapshot} is invoked, which places a placeholder entry into
* {@code SnapshotsInProgress} that does not yet contain any shard clone assignments. Note that unlike in the case of snapshot
* creation, the shard level clone tasks in {@link org.elasticsearch.cluster.SnapshotsInProgress.Entry#clones} are not created in the
* initial cluster state update as is done for shard snapshot assignments in
* {@link org.elasticsearch.cluster.SnapshotsInProgress.Entry#shards}. This is due to the fact that shard snapshot assignments are
* computed purely from information in the current cluster state while shard clone assignments require information to be read from the
* repository, which is too slow to be done inside a cluster state update. Loading this information before creating a task in the
* cluster state would risk a race condition in which the source snapshot is deleted before the clone task is enqueued in the
* cluster state.</li>
* <li>Once a placeholder task for the clone operation is put into the cluster state, we must determine the number of shards in each
* index that is to be cloned as well as ensure the health of the index snapshots in the source snapshot. In order to determine the
* shard count for each index that is to be cloned, we load the index metadata for each such index using the repository's
* {@link org.elasticsearch.repositories.Repository#getSnapshotIndexMetaData} method. In order to ensure the health of the source index
* snapshots, we load the {@link org.elasticsearch.snapshots.SnapshotInfo} for the source snapshot and check for shard snapshot
* failures of the relevant indices.</li>
* <li>Once all shard counts are known and the health of all source index data has been verified, we populate the
* {@code SnapshotsInProgress.Entry#clones} map for the clone operation with the relevant shard clone tasks.</li>
* <li>After the clone tasks have been added to the {@code SnapshotsInProgress.Entry}, the master executes them on its snapshot thread-pool
* by invoking {@link org.elasticsearch.repositories.Repository#cloneShardSnapshot} for each shard that is to be cloned. Each completed
* shard snapshot triggers a call to the {@link org.elasticsearch.snapshots.SnapshotsService#SHARD_STATE_EXECUTOR} which updates the
* clone's {@code SnapshotsInProgress.Entry} to mark the shard clone operation completed.</li>
* <li>Once all the entries in {@code SnapshotsInProgress.Entry#clones} have completed, the clone is finalized just like any other
* snapshot through {@link org.elasticsearch.snapshots.SnapshotsService#endSnapshot}. The only difference is that the index metadata and
* the global metadata that are written out are read from the source snapshot in the repository instead of the cluster state.
* </li>
* </ol>
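The ordering of the steps above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `CloneSketch`, `CloneEntry`, `SourceSnapshot`, `fixedSource` and the method names are hypothetical stand-ins invented for illustration, not the Elasticsearch classes, and the real implementation runs these steps asynchronously through cluster state updates rather than as direct method calls.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the clone steps; not the real Elasticsearch types. */
public class CloneSketch {

    enum ShardState { INIT, SUCCESS }

    /** Stand-in for the repository reads of step 2 (index metadata and snapshot info). */
    interface SourceSnapshot {
        int shardCount(String index);
        Set<String> failedIndices();
    }

    /** Stand-in for a clone entry; starts with an empty clones map (the placeholder). */
    static final class CloneEntry {
        final Map<String, ShardState> clones = new LinkedHashMap<>();
        boolean finalized = false;

        boolean allShardsDone() {
            return !clones.isEmpty() && clones.values().stream().allMatch(s -> s == ShardState.SUCCESS);
        }
    }

    /** In-memory source used only for this example. */
    static SourceSnapshot fixedSource(Map<String, Integer> shardCounts, Set<String> failed) {
        return new SourceSnapshot() {
            @Override public int shardCount(String index) { return shardCounts.get(index); }
            @Override public Set<String> failedIndices() { return failed; }
        };
    }

    /** Step 1: register a placeholder without any shard clone assignments. */
    static CloneEntry startClone() {
        return new CloneEntry();
    }

    /** Steps 2 and 3: verify source health, then add one clone task per shard. */
    static void populateClones(CloneEntry entry, SourceSnapshot source, List<String> indices) {
        for (String index : indices) {
            if (source.failedIndices().contains(index)) {
                throw new IllegalStateException("source index snapshot has failed shards: " + index);
            }
        }
        for (String index : indices) {
            int shards = source.shardCount(index);
            for (int shard = 0; shard < shards; shard++) {
                entry.clones.put(index + "[" + shard + "]", ShardState.INIT);
            }
        }
    }

    /** Step 4: run every shard clone and mark it completed (synchronously in this sketch). */
    static void executeShardClones(CloneEntry entry) {
        entry.clones.replaceAll((shard, state) -> ShardState.SUCCESS);
    }

    /** Step 5: finalize only once every shard clone has completed. */
    static void finalizeClone(CloneEntry entry) {
        if (!entry.allShardsDone()) {
            throw new IllegalStateException("cannot finalize: shard clones still pending");
        }
        entry.finalized = true;
    }

    public static void main(String[] args) {
        CloneEntry entry = startClone();
        populateClones(entry, fixedSource(Map.of("logs", 2), Set.of()), List.of("logs"));
        executeShardClones(entry);
        finalizeClone(entry);
        System.out.println("finalized=" + entry.finalized + " shards=" + entry.clones.keySet());
    }
}
```

The point of the model is the separation between `startClone` and `populateClones`: the placeholder exists before any repository read happens, which mirrors the reasoning given in the first list item about avoiding slow work inside a cluster state update.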
*
* <h2>Concurrent Snapshot Operations</h2>
*