Introduce a caching mechanism for files in Searchable Snapshot Directory #49934

tlrx
Member

@tlrx tlrx commented Dec 6, 2019

Note: this draft pull request targets the feature/searchable-snapshots branch

This pull request introduces a simple caching mechanism that operates at the Lucene files level of searchable snapshot directories.

Several classes have been introduced or changed since #49651: the searchable snapshot directory (SearchableSnapshotDirectory) now contains a representation of the snapshotted shard files (SearchableSnapshotShard), which makes it possible to list the files of a specific snapshot or to read one of them.

A basic implementation of a searchable snapshot shard is BlobStoreSearchableSnapshotShard, which directly accesses a remote blob store repository to list or read files. This implementation takes care of converting the names of Lucene files into blob names in the repository and of loading the appropriate chunks of blobs (the implementation is still very raw and error-prone and must be consolidated).
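To illustrate what "loading the appropriate chunks of blobs" involves, here is a minimal sketch, not the code from this pull request. It assumes a file is stored as fixed-size parts and maps a byte range of the file onto per-part ranges. The class `ChunkLocatorSketch`, the `locate` method, and the `.part<N>` naming used here are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkLocatorSketch {

    /** One contiguous range to read from a single blob part (hypothetical type). */
    public static final class ChunkRange {
        public final String blobName;
        public final long offset;
        public final long length;

        public ChunkRange(String blobName, long offset, long length) {
            this.blobName = blobName;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Splits the file range [position, position + length) into ranges
     * expressed relative to fixed-size parts of partSize bytes each.
     * The ".part<N>" suffix is an assumed naming scheme for this sketch.
     */
    public static List<ChunkRange> locate(String blobPrefix, long partSize, long position, long length) {
        if (partSize <= 0 || position < 0 || length < 0) {
            throw new IllegalArgumentException("partSize must be > 0, position and length must be >= 0");
        }
        List<ChunkRange> ranges = new ArrayList<>();
        long pos = position;
        long remaining = length;
        while (remaining > 0) {
            long part = pos / partSize;          // index of the part containing pos
            long offsetInPart = pos % partSize;  // where to start inside that part
            long toRead = Math.min(remaining, partSize - offsetInPart);
            ranges.add(new ChunkRange(blobPrefix + ".part" + part, offsetInPart, toRead));
            pos += toRead;
            remaining -= toRead;
        }
        return ranges;
    }
}
```

With a part size of 10 bytes, reading 5 bytes at position 8 spans two parts: 2 bytes from the end of part 0 and 3 bytes from the start of part 1.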

Another implementation of a searchable snapshot shard is CachedSearchableSnapshotShard, which caches segments (or portions) of files using a CacheService. This cache service uses the existing LRU org.elasticsearch.common.cache.Cache to cache file segments in memory. This cache is also very raw and should evolve into something more complex that caches file segments on disk. The CachedSearchableSnapshotShard acts as a FilterSearchableSnapshotShard: it delegates the listing or reading of files to another searchable snapshot shard when the requested file segment is not present in the cache (i.e., a cache miss). When the file segment requested by the searchable snapshot directory's index input is present in the cache, it is served directly.
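The read-through behaviour described above can be sketched roughly as follows. This is an illustration only, not the classes from this pull request: `ShardReader` and `CachingShardReader` are hypothetical names, and a `LinkedHashMap` in access order stands in for the LRU `org.elasticsearch.common.cache.Cache`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CachedShardSketch {

    /** Hypothetical stand-in for the searchable snapshot shard abstraction. */
    public interface ShardReader {
        byte[] readSegment(String fileName, long position, int length);
    }

    /** Filter-style wrapper: serves cache hits from memory, delegates misses. */
    public static final class CachingShardReader implements ShardReader {
        private final ShardReader delegate;
        private final Map<String, byte[]> cache;

        public CachingShardReader(ShardReader delegate, final int maxEntries) {
            this.delegate = delegate;
            // Access-ordered map that evicts the least recently used entry
            // once more than maxEntries segments are held.
            this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        @Override
        public synchronized byte[] readSegment(String fileName, long position, int length) {
            // A segment is identified by file name, start position and length.
            String key = fileName + "@" + position + "+" + length;
            byte[] segment = cache.get(key);
            if (segment == null) {
                // Cache miss: fetch from the wrapped reader and remember the result.
                segment = delegate.readSegment(fileName, position, length);
                cache.put(key, segment);
            }
            // Return a copy so callers cannot modify the cached bytes.
            return segment.clone();
        }
    }
}
```

Wrapping a reader rather than subclassing it keeps the cache independent of where the bytes come from, which is the same reason the description gives for building on a filter shard.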

Finally, this pull request reuses the tests added in #49651 to test the searchable snapshot directory implementation, randomly enabling or disabling the cache.

@tlrx tlrx added the :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Dec 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@tlrx
Member Author

tlrx commented Jan 27, 2020

The cache system has been implemented in #50693.

@tlrx tlrx closed this Jan 27, 2020