Skip to content

SourceOnlySnapshotRepository is Leaking Files #50231

Open
@original-brownbear

Description

@original-brownbear

When using SourceOnlySnapshotRepository a _snapshot directory is created in each snapshotted shard's data path. The contents of this path directory are never cleaned up it seems.

I tried fixing this via a trivial fix:

diff --git a/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java b/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java
index af9d95a519a..5b46d786f7f 100644
--- a/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java
+++ b/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java
@@ -134,10 +134,11 @@ public final class SourceOnlySnapshotRepository extends FilterRepository {
         Path dataPath = ((FSDirectory) unwrap).getDirectory().getParent();
         // TODO should we have a snapshot tmp directory per shard that is maintained by the system?
         Path snapPath = dataPath.resolve(SNAPSHOT_DIR_NAME);
-        final List<Closeable> toClose = new ArrayList<>(3);
+        final List<Closeable> toClose = new ArrayList<>(4);
         try {
             FSDirectory directory = new SimpleFSDirectory(snapPath);
             toClose.add(directory);
+            toClose.add(() -> IOUtils.rm(snapPath));
             Store tempStore = new Store(store.shardId(), store.indexSettings(), directory, new ShardLock(store.shardId()) {
                 @Override
                 protected void closeInternal() {

but this seems to severely alter the incrementality properties of the source only snapshotting (tests fails because the expected file counts change).
This is kind of obvious from the way it works, but shouldn't we at least delete these _snapshot folders when removing a source only repository? Also, shouldn't we document the fact that using a source only repository may require significant disk space to track the source_only on disk index used to make things incremental?

Originally reported in https://discuss.elastic.co/t/large-snapshot-directories-on-disk/211988

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions