-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Reduce Heap Use during Shard Snapshot #60370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce Heap Use during Shard Snapshot #60370
Conversation
Instances of `BlobStoreIndexShardSnapshots` can be of non-trivial size. In case of snapshotting a larger number of shards the previous execution order would lead to memory use proportional to the number of shards for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot pool (except for in the BwC format path). This PR makes it so that they are written to the repository at the earliest possible point in time so that they can be garbage collected. If shard generations are used, we can safely write these right at the beginning of the shard snapshot. If shard generations are not used we can only write them at the end of the shard snapshot after all other blobs have been written. Closes elastic#60173
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
We wanted to bound the number of concurrent snapshots a data node can take anyway (typically bounded by the max size of the threadpool), so I wonder if that more general change would be sufficient to address this, and not require introduce more complexity in this part of the code. |
It would help as well but this change helps more and even if we bound the number of snapshots would be helpful in reducing memory use on data nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only see this as a short-term fix, with the long-term fix being to reduce the level of parallel snapshot operations on a node.
Thanks Yannick
++ I think I have an idea on how to achieve that as well with little changes, will look into it soon. |
Instances of `BlobStoreIndexShardSnapshots` can be of non-trivial size. In case of snapshotting a larger number of shards the previous execution order would lead to memory use proportional to the number of shards for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot pool (except for in the BwC format path). This PR makes it so that they are written to the repository at the earliest possible point in time so that they can be garbage collected. If shard generations are used, we can safely write these right at the beginning of the shard snapshot. If shard generations are not used we can only write them at the end of the shard snapshot after all other blobs have been written. Closes elastic#60173
Instances of `BlobStoreIndexShardSnapshots` can be of non-trivial size. In case of snapshotting a larger number of shards the previous execution order would lead to memory use proportional to the number of shards for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot pool (except for in the BwC format path). This PR makes it so that they are written to the repository at the earliest possible point in time so that they can be garbage collected. If shard generations are used, we can safely write these right at the beginning of the shard snapshot. If shard generations are not used we can only write them at the end of the shard snapshot after all other blobs have been written. Closes #60173
Instances of
BlobStoreIndexShardSnapshots
can be of non-trivial size (because they store some segment files directly as raw bytes). In case of snapshotting a larger number of shards the previous execution order would lead to memory use proportional to the number of shards for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot pool (except for in the BwC format path).This PR makes it so that they are written to the repository at the earliest possible point in time so that they can be garbage collected.
If shard generations are used, we can safely write these right at the beginning of the shard snapshot.
If shard generations are not used we can only write them at the end of the shard snapshot after all
other blobs have been written but with this change at least the initial version of
BlobStoreIndexShardSnapshots
read from the repository goes out of scope right away which should improve this case a little as well.NOTE: in other spots where we are handling collections of these blobs the issue of their size is less relevant because we always map them to something cheaper (like a list of blob names during deletes) right after loading them. This is the only spot where we could have very long lived instances of these objects.
Closes #60173