Description
We do the following when snapshotting a shard in https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java#L1177:
```java
public void snapshot(final IndexCommit snapshotIndexCommit) {
    logger.debug("[{}] [{}] snapshot to [{}] ...", shardId, snapshotId, metadata.name());
    final Map<String, BlobMetaData> blobs;
    try {
        blobs = blobContainer.listBlobs();
```
to find the latest `index-{N}` file at the root of each shard folder.
There is no guarantee that this file is actually going to be listed if two snapshots happen in rapid succession, and such an inconsistency becomes ever more likely the more snapshots one has.
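To make the race concrete, here is a minimal sketch (the helper name and parsing are hypothetical, not the actual Elasticsearch code) of deriving the latest generation from a blob listing. If `listBlobs()` returns a stale view that is missing the newest `index-{N}` blob, the computed `N` is simply wrong:

```java
import java.util.List;

public class LatestGeneration {

    // Parse "index-{N}" blob names from a listing and return the highest N seen.
    // Illustrative stand-in for resolving the latest index-{N}; not the real implementation.
    static long latestGeneration(List<String> listedBlobNames) {
        long latest = -1;
        for (String name : listedBlobNames) {
            if (name.startsWith("index-")) {
                try {
                    latest = Math.max(latest, Long.parseLong(name.substring("index-".length())));
                } catch (NumberFormatException e) {
                    // ignore blobs that merely start with "index-" but carry no generation
                }
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // Consistent listing: index-2 is visible, so we correctly resolve N = 2.
        System.out.println(latestGeneration(List.of("index-0", "index-1", "index-2")));
        // Stale listing after a concurrent snapshot already wrote index-2:
        // the newest blob is missing from the listing, so we wrongly resolve N = 1.
        System.out.println(latestGeneration(List.of("index-0", "index-1")));
    }
}
```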
If we hit the wrong `N` here, the subsequent logic of:
```java
if (existingFileInfo == null) {
    indexIncrementalFileCount++;
    indexIncrementalSize += md.length();
    // create a new FileInfo
    BlobStoreIndexShardSnapshot.FileInfo snapshotFileInfo =
        new BlobStoreIndexShardSnapshot.FileInfo(fileNameFromGeneration(++generation), md, chunkSize());
    indexCommitPointFiles.add(snapshotFileInfo);
    filesToSnapshot.add(snapshotFileInfo);
} else {
    indexCommitPointFiles.add(existingFileInfo);
}
}
snapshotStatus.moveToStarted(startTime, indexIncrementalFileCount,
    indexTotalNumberOfFiles, indexIncrementalSize, indexTotalFileCount);
for (BlobStoreIndexShardSnapshot.FileInfo snapshotFileInfo : filesToSnapshot) {
    try {
        snapshotFile(snapshotFileInfo);
```
could produce incorrect values for `generation`, causing shard data files to collide in naming.
Add to that the fact that the S3 repository has no `failIfAlreadyExists` logic in place, unlike the other snapshot repositories (see #36927 for details), and one could overwrite shard data files in this scenario and corrupt the repository as far as I can see.
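The collision scenario can be sketched as follows (the blob-naming scheme below is an illustrative stand-in for `fileNameFromGeneration`, not necessarily the exact format used): two snapshots that each resolve the same stale generation will derive identical data blob names, and without a `failIfAlreadyExists` check the second write silently replaces the first.

```java
public class GenerationCollision {

    // Illustrative stand-in for fileNameFromGeneration: data blob names are
    // derived from the generation counter (the real scheme is an implementation detail).
    static String fileNameFromGeneration(long generation) {
        return "__" + Long.toString(generation, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // Snapshot A resolves generation 2 from a consistent listing.
        long generationA = 2;
        // Snapshot B's listing is stale and misses A's newly written index-{N},
        // so it resolves the same generation 2.
        long generationB = 2;

        String blobA = fileNameFromGeneration(++generationA);
        String blobB = fileNameFromGeneration(++generationB);

        // Both snapshots pick the same data blob name; on a repository whose
        // writes lack failIfAlreadyExists semantics (e.g. S3), the second
        // write silently overwrites the first.
        System.out.println(blobA + " vs " + blobB + " collide: " + blobA.equals(blobB));
    }
}
```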
@ywelsch @tlrx maybe I'm missing some hidden step here that prevents getting the wrong `N` and potentially overwriting shard data files?