Skip to content

Defer repo ops in searchable snapshot restore #53961

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

DaveCTurner
Copy link
Contributor

A searchable snapshots Directory requires a BlobContainer and a
BlobStoreIndexShardSnapshot, which require us to read some data from the
repository. Today we construct these objects on the cluster applier thread,
blocking that thread on remote operations.

This commit defers their construction until the restore process starts, so that
they can happen on a more appropriate thread. It also reinstates the assertion
that snapshot/restore operations are all on the snapshot or generic threadpool,
but weakens it to allow them to occur on the search threadpool too.

A searchable snapshots `Directory` requires a `BlobContainer` and a
`BlobStoreIndexShardSnapshot`, which require us to read some data from the
repository. Today we construct these objects on the cluster applier thread,
blocking that thread on remote operations.

This commit defers their construction until the restore process starts, so that
they can happen on a more appropriate thread. It also reinstates the assertion
that snapshot/restore operations are all on the snapshot or generic threadpool,
but weakens it to allow them to occur on the search threadpool too.
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Mar 23, 2020
@DaveCTurner DaveCTurner requested a review from tlrx March 23, 2020 09:53
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks David

@DaveCTurner
Copy link
Contributor Author

When I was working on this last week the assertion added in this PR was failing due to a call to ByteSizeCachingDirectory#estimateSizeInBytes somewhere on an applier thread. I was called onto other things before I could investigate and now that's not happening. Possibly this change was incomplete, or possibly something has changed in the meantime.

protected void assertUsingPermittedThreadPool() {
assert Thread.currentThread().getName().contains('[' + ThreadPool.Names.SNAPSHOT + ']')
|| Thread.currentThread().getName().contains('[' + ThreadPool.Names.GENERIC + ']')
|| Thread.currentThread().getName().contains('[' + ThreadPool.Names.SEARCH + ']')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I don't think this works, we should be seeing this fail due to access on the search_throttled threadpool and also (TBC) the can-match phase runs elsewhere too.

@DaveCTurner
Copy link
Contributor Author

Closing this in favour of #54211, there have been too many conflicting changes to keep this one alive.

@DaveCTurner DaveCTurner deleted the 2020-03-17-reinstate-threadpool-assertion branch March 26, 2020 09:28
tlrx added a commit that referenced this pull request Apr 6, 2020
…covery hook (#54729)

In #53961 we defer the construction of Searchable Snapshot
Directory's BlobContainer and a BlobStoreIndexShardSnapshot
instances so that these objects are created when they are first
accessed, which we expect to be during the recovery process. At
the same time, assertions were added to ensure that the
construction is effectively executed under a generic or snapshot
thread.

Sadly, this is not always the case because there is always a
short window of time between the creation of the IndexShard and
the time the objects are created during the recovery process. It
is also possible that other components of Elasticsearch trigger
the creation of the blob container and snapshot.

We identified the following places:

- computing avg shard size of index shards in
    IndexService.createShard() (this is triggered when relocating
    a primary shard under the cluster state applier thread)
- computing indices stats in TransportIndicesStatsAction which
    calls indexShard.storeStats() which calls
    estimateSizeInBytes() (this is triggered by
    InternalClusterInfoService under the management thread pool)
- computing shard store metadata in
    IndexShard.snapshotStoreMetadata while calls
    failIfCorrupted(Directory) (this is triggered by
    TransportNodesListShardStoreMetadata, executed under the
    fetch_shard_store thread pool)
- TransportNodesListGatewayStartedShards should also use
    failIfCorrupted(Directory) at some point (triggered by the
    GatewayAllocator and executed under the fetch_shard_started
    thread pool)

This commit changes the way BlobContainer and a
BlobStoreIndexShardSnapshot instances are created so that it does
not happen on first access anymore but the objects are now
created using a specific loadSnapshot() method.

This method is explicitly called during the pre-recovery phase.

Until this method is called, the SearchableSnapshotDirectory acts
as if it was empty: the listAll() method returns an empty
array. Having this behavior allows the identified access points
to not fail and not trigger the snapshot loading before we
explicitly load it in the pre-recovery hook.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants