Defer repo ops in searchable snapshot restore #53961

DaveCTurner · 2020-03-23T09:53:16Z

A searchable snapshots Directory requires a BlobContainer and a
BlobStoreIndexShardSnapshot, which require us to read some data from the
repository. Today we construct these objects on the cluster applier thread,
blocking that thread on remote operations.

This commit defers their construction until the restore process starts, so that
they can happen on a more appropriate thread. It also reinstates the assertion
that snapshot/restore operations are all on the snapshot or generic threadpool,
but weakens it to allow them to occur on the search threadpool too.

A searchable snapshots `Directory` requires a `BlobContainer` and a `BlobStoreIndexShardSnapshot`, which require us to read some data from the repository. Today we construct these objects on the cluster applier thread, blocking that thread on remote operations. This commit defers their construction until the restore process starts, so that they can happen on a more appropriate thread. It also reinstates the assertion that snapshot/restore operations are all on the snapshot or generic threadpool, but weakens it to allow them to occur on the search threadpool too.

elasticmachine · 2020-03-23T09:53:17Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

tlrx

LGTM, thanks David

DaveCTurner · 2020-03-23T10:03:09Z

When I was working on this last week the assertion added in this PR was failing due to a call to ByteSizeCachingDirectory#estimateSizeInBytes somewhere on an applier thread. I was called onto other things before I could investigate and now that's not happening. Possibly this change was incomplete, or possibly something has changed in the meantime.

DaveCTurner · 2020-03-23T13:56:08Z

server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

+    protected void assertUsingPermittedThreadPool() {
+        assert Thread.currentThread().getName().contains('[' + ThreadPool.Names.SNAPSHOT + ']')
+            || Thread.currentThread().getName().contains('[' + ThreadPool.Names.GENERIC + ']')
+            || Thread.currentThread().getName().contains('[' + ThreadPool.Names.SEARCH + ']')


Hmm I don't think this works, we should be seeing this fail due to access on the search_throttled threadpool and also (TBC) the can-match phase runs elsewhere too.

DaveCTurner · 2020-03-26T09:27:54Z

Closing this in favour of #54211, there have been too many conflicting changes to keep this one alive.

…covery hook (#54729) In #53961 we defer the construction of Searchable Snapshot Directory's BlobContainer and a BlobStoreIndexShardSnapshot instances so that these objects are created when they are first accessed, which we expect to be during the recovery process. At the same time, assertions were added to ensure that the construction is effectively executed under a generic or snapshot thread. Sadly, this is not always the case because there is always a short window of time between the creation of the IndexShard and the time the objects are created during the recovery process. It is also possible that other components of Elasticsearch trigger the creation of the blob container and snapshot. We identified the following places: - computing avg shard size of index shards in IndexService.createShard() (this is triggered when relocating a primary shard under the cluster state applier thread) - computing indices stats in TransportIndicesStatsAction which calls indexShard.storeStats() which calls estimateSizeInBytes() (this is triggered by InternalClusterInfoService under the management thread pool) - computing shard store metadata in IndexShard.snapshotStoreMetadata while calls failIfCorrupted(Directory) (this is triggered by TransportNodesListShardStoreMetadata, executed under the fetch_shard_store thread pool) - TransportNodesListGatewayStartedShards should also use failIfCorrupted(Directory) at some point (triggered by the GatewayAllocator and executed under the fetch_shard_started thread pool) This commit changes the way BlobContainer and a BlobStoreIndexShardSnapshot instances are created so that it does not happen on first access anymore but the objects are now created using a specific loadSnapshot() method. This method is explicitly called during the pre-recovery phase. Until this method is called, the SearchableSnapshotDirectory acts as if it was empty: the listAll() method returns an empty array. Having this behavior allows the identified access points to not fail and not trigger the snapshot loading before we explicitly load it in the pre-recovery hook.

DaveCTurner added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Mar 23, 2020

DaveCTurner requested a review from tlrx March 23, 2020 09:53

tlrx approved these changes Mar 23, 2020

View reviewed changes

Imports

95d9127

DaveCTurner commented Mar 23, 2020

View reviewed changes

DaveCTurner closed this Mar 26, 2020

DaveCTurner deleted the 2020-03-17-reinstate-threadpool-assertion branch March 26, 2020 09:28

tlrx mentioned this pull request Apr 3, 2020

Trigger explicit loading of blob container and snapshot in the pre-recovery hook #54729

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Defer repo ops in searchable snapshot restore #53961

Defer repo ops in searchable snapshot restore #53961

Uh oh!

DaveCTurner commented Mar 23, 2020

Uh oh!

elasticmachine commented Mar 23, 2020

Uh oh!

tlrx left a comment

Uh oh!

DaveCTurner commented Mar 23, 2020

Uh oh!

DaveCTurner Mar 23, 2020

Uh oh!

DaveCTurner commented Mar 26, 2020

Uh oh!

Uh oh!

Defer repo ops in searchable snapshot restore #53961

Defer repo ops in searchable snapshot restore #53961

Uh oh!

Conversation

DaveCTurner commented Mar 23, 2020

Uh oh!

elasticmachine commented Mar 23, 2020

Uh oh!

tlrx left a comment

Choose a reason for hiding this comment

Uh oh!

DaveCTurner commented Mar 23, 2020

Uh oh!

DaveCTurner Mar 23, 2020

Choose a reason for hiding this comment

Uh oh!

DaveCTurner commented Mar 26, 2020

Uh oh!

Uh oh!