
[BUG] [Snapshot Interop] Optimize batch async blob cleanup during snapshot deletion for remote store enabled indices. #12302

Closed
@harishbhakuni

Description

Describe the bug

Currently, during snapshot deletion, we asynchronously try to clean up shard blobs in batches of 1000 blobs at a time. If the index is remote store enabled, we also release the lock for each shard blob, followed by remote store cleanup if the index has already been deleted from the cluster. If either the lock release or the remote store cleanup fails for even one shard, we skip cleanup of the entire batch.

```java
// Release the lock this snapshot (acquirer ID = snapshotUUID) holds on the
// shard's remote store metadata. If this throws, the whole batch is skipped.
RemoteStoreLockManager remoteStoreMetadataLockManager = remoteStoreLockManagerFactory.newLockManager(
    remoteStoreRepoForIndex,
    indexUUID,
    shardId
);
remoteStoreMetadataLockManager.release(
    FileLockInfo.getLockInfoBuilder().withAcquirerId(snapshotUUID).build()
);
if (!isIndexPresent(clusterService, indexUUID)) {
    // this is a temporary solution where snapshot deletion triggers remote store side
    // cleanup if index is already deleted. We will add a poller in future to take
    // care of remote store side cleanup.
    // see https://github.com/opensearch-project/OpenSearch/issues/8469
    new RemoteSegmentStoreDirectoryFactory(
        remoteStoreLockManagerFactory.getRepositoriesService(),
        threadPool
    ).newDirectory(
        remoteStoreRepoForIndex,
        indexUUID,
        new ShardId(Index.UNKNOWN_INDEX_NAME, indexUUID, Integer.valueOf(shardId))
    ).close();
}
```
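To make the failure mode concrete, here is a minimal, self-contained Java model of the current all-or-nothing batch behavior. It is a hypothetical sketch, not OpenSearch code: `AllOrNothingBatchCleanup`, `ShardBlob`, and `ShardCleanup` are illustrative names, `shardKey` stands in for the index UUID plus shard ID, `ShardCleanup.run` stands in for the lock release and optional remote store cleanup in the snippet above, and "deleting" a blob only means adding it to the returned list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; not OpenSearch code.
class AllOrNothingBatchCleanup {

    /** A blob owned by one shard; shardKey stands in for indexUUID + shardId. */
    public record ShardBlob(String shardKey, String blobName) {}

    /** Stand-in for "release lock, then remote store cleanup" for one shard. */
    @FunctionalInterface
    public interface ShardCleanup {
        void run(String shardKey) throws Exception;
    }

    static final int BATCH_SIZE = 1000;

    /** Returns the blobs that would be deleted across all batches. */
    public static List<ShardBlob> cleanup(List<ShardBlob> blobs, ShardCleanup shardCleanup) {
        List<ShardBlob> deleted = new ArrayList<>();
        for (int start = 0; start < blobs.size(); start += BATCH_SIZE) {
            List<ShardBlob> batch = blobs.subList(start, Math.min(start + BATCH_SIZE, blobs.size()));
            // Distinct shards referenced by this batch, in encounter order.
            Set<String> shards = new LinkedHashSet<>();
            for (ShardBlob blob : batch) {
                shards.add(blob.shardKey());
            }
            try {
                for (String shard : shards) {
                    shardCleanup.run(shard); // the first failure aborts the loop
                }
            } catch (Exception e) {
                // One failing shard skips the whole batch, so every shard in it
                // goes through lock release again on the next run.
                continue;
            }
            deleted.addAll(batch); // stand-in for the actual blob delete call
        }
        return deleted;
    }
}
```

With two shards in one batch where only one shard's cleanup throws, `cleanup` deletes nothing from that batch.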

As a result, the next run calls release lock again for the entire batch. This can be optimized by skipping shard blob cleanup only for those shards whose lock release or remote store cleanup failed.

Related component

Storage:Snapshots

Expected behavior

During batch shard blob deletion, if the lock release or remote store cleanup fails, we should skip deletion only for the blobs of the shards that failed.
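A minimal sketch of this expected behavior in Java. It is a hypothetical model, not OpenSearch code: `PerShardBatchCleanup`, `ShardBlob`, and `ShardCleanup` are illustrative names, `shardKey` stands in for the index UUID plus shard ID, `ShardCleanup.run` stands in for the lock release and optional remote store cleanup, and "deleting" a blob only means adding it to the returned list. Cleanup runs per shard, failures are collected, and only the blobs of shards whose cleanup succeeded are deleted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; not OpenSearch code.
class PerShardBatchCleanup {

    /** A blob owned by one shard; shardKey stands in for indexUUID + shardId. */
    public record ShardBlob(String shardKey, String blobName) {}

    /** Stand-in for "release lock, then remote store cleanup" for one shard. */
    @FunctionalInterface
    public interface ShardCleanup {
        void run(String shardKey) throws Exception;
    }

    static final int BATCH_SIZE = 1000;

    /** Returns the blobs that would be deleted across all batches. */
    public static List<ShardBlob> cleanup(List<ShardBlob> blobs, ShardCleanup shardCleanup) {
        List<ShardBlob> deleted = new ArrayList<>();
        for (int start = 0; start < blobs.size(); start += BATCH_SIZE) {
            List<ShardBlob> batch = blobs.subList(start, Math.min(start + BATCH_SIZE, blobs.size()));
            Set<String> shards = new LinkedHashSet<>();
            for (ShardBlob blob : batch) {
                shards.add(blob.shardKey());
            }
            // Catch per shard so one failure does not block the other shards.
            Set<String> failedShards = new HashSet<>();
            for (String shard : shards) {
                try {
                    shardCleanup.run(shard);
                } catch (Exception e) {
                    failedShards.add(shard); // its blobs stay and are retried later
                }
            }
            // Delete only blobs whose shard cleanup succeeded.
            for (ShardBlob blob : batch) {
                if (!failedShards.contains(blob.shardKey())) {
                    deleted.add(blob); // stand-in for the actual blob delete call
                }
            }
        }
        return deleted;
    }
}
```

Moving the `try`/`catch` inside the per-shard loop is the key change: a failing shard's blobs are left in place for a later run, while the rest of the batch is deleted and its locks are not released a second time. A real implementation would presumably also log each failure.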

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    • Status

      ✅ Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests