-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Defer repo ops in searchable snapshot restore #53961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer repo ops in searchable snapshot restore #53961
Conversation
A searchable snapshots `Directory` requires a `BlobContainer` and a `BlobStoreIndexShardSnapshot`, which require us to read some data from the repository. Today we construct these objects on the cluster applier thread, blocking that thread on remote operations. This commit defers their construction until the restore process starts, so that they can happen on a more appropriate thread. It also reinstates the assertion that snapshot/restore operations are all on the snapshot or generic threadpool, but weakens it to allow them to occur on the search threadpool too.
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks David
When I was working on this last week the assertion added in this PR was failing due to a call to |
protected void assertUsingPermittedThreadPool() { | ||
assert Thread.currentThread().getName().contains('[' + ThreadPool.Names.SNAPSHOT + ']') | ||
|| Thread.currentThread().getName().contains('[' + ThreadPool.Names.GENERIC + ']') | ||
|| Thread.currentThread().getName().contains('[' + ThreadPool.Names.SEARCH + ']') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I don't think this works, we should be seeing this fail due to access on the search_throttled
threadpool and also (TBC) the can-match phase runs elsewhere too.
Closing this in favour of #54211, there have been too many conflicting changes to keep this one alive. |
…covery hook (#54729) In #53961 we defer the construction of Searchable Snapshot Directory's BlobContainer and a BlobStoreIndexShardSnapshot instances so that these objects are created when they are first accessed, which we expect to be during the recovery process. At the same time, assertions were added to ensure that the construction is effectively executed under a generic or snapshot thread. Sadly, this is not always the case because there is always a short window of time between the creation of the IndexShard and the time the objects are created during the recovery process. It is also possible that other components of Elasticsearch trigger the creation of the blob container and snapshot. We identified the following places: - computing avg shard size of index shards in IndexService.createShard() (this is triggered when relocating a primary shard under the cluster state applier thread) - computing indices stats in TransportIndicesStatsAction which calls indexShard.storeStats() which calls estimateSizeInBytes() (this is triggered by InternalClusterInfoService under the management thread pool) - computing shard store metadata in IndexShard.snapshotStoreMetadata while calls failIfCorrupted(Directory) (this is triggered by TransportNodesListShardStoreMetadata, executed under the fetch_shard_store thread pool) - TransportNodesListGatewayStartedShards should also use failIfCorrupted(Directory) at some point (triggered by the GatewayAllocator and executed under the fetch_shard_started thread pool) This commit changes the way BlobContainer and a BlobStoreIndexShardSnapshot instances are created so that it does not happen on first access anymore but the objects are now created using a specific loadSnapshot() method. This method is explicitly called during the pre-recovery phase. Until this method is called, the SearchableSnapshotDirectory acts as if it was empty: the listAll() method returns an empty array. Having this behavior allows the identified access points to not fail and not trigger the snapshot loading before we explicitly load it in the pre-recovery hook.
A searchable snapshots
Directory
requires aBlobContainer
and aBlobStoreIndexShardSnapshot
, which require us to read some data from therepository. Today we construct these objects on the cluster applier thread,
blocking that thread on remote operations.
This commit defers their construction until the restore process starts, so that
they can happen on a more appropriate thread. It also reinstates the assertion
that snapshot/restore operations are all on the snapshot or generic threadpool,
but weakens it to allow them to occur on the search threadpool too.