Add searchable snapshots cache directory #50693

Conversation

tlrx
Member

@tlrx tlrx commented Jan 7, 2020

Note: this pull request targets the feature/searchable-snapshots branch

This pull request adds a new CacheDirectory implementation which allows caching ranges of bytes of Lucene files. This directory implementation uses a CacheService to retrieve and read ranges of bytes from cached files stored on disk. The cached files are created as sparse files on disk, and the availability of byte ranges is tracked through a SparseFileTracker instance (thanks David!).
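As a rough illustration of the range-tracking idea, here is a simplified, hypothetical sketch (the real SparseFileTracker also coordinates concurrent writers and notifies listeners when pending ranges complete):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified illustration of tracking which byte ranges of a
// cached file are already present on disk.
class RangeTracker {
    private final long fileLength;
    // start of range -> end of range (exclusive), non-overlapping, sorted by start
    private final TreeMap<Long, Long> completedRanges = new TreeMap<>();

    RangeTracker(long fileLength) {
        this.fileLength = fileLength;
    }

    // Marks [start, end) as available in the cache file, merging with neighbouring ranges.
    synchronized void markAvailable(long start, long end) {
        assert 0 <= start && start <= end && end <= fileLength;
        Map.Entry<Long, Long> floor = completedRanges.floorEntry(end);
        while (floor != null && floor.getValue() >= start) {
            start = Math.min(start, floor.getKey());
            end = Math.max(end, floor.getValue());
            completedRanges.remove(floor.getKey());
            floor = completedRanges.floorEntry(end);
        }
        completedRanges.put(start, end);
    }

    // True if every byte in [start, end) has already been written to the cache file.
    synchronized boolean isAvailable(long start, long end) {
        Map.Entry<Long, Long> floor = completedRanges.floorEntry(start);
        return floor != null && floor.getValue() >= end;
    }
}
```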

@tlrx tlrx added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jan 7, 2020
@tlrx tlrx requested a review from DaveCTurner January 7, 2020 08:26
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@tlrx
Member Author

tlrx commented Jan 7, 2020

@elasticmachine run elasticsearch-ci/1

(Unrelated test failure)

Contributor

@DaveCTurner DaveCTurner left a comment

I've done a first pass and left some comments and questions.

Setting.Property.NodeScope);

public static final Setting<Boolean> SNAPSHOT_CACHE_INVALIDATE_ON_SHUTDOWN =
Setting.boolSetting("searchable.snapshot.cache.invalidate_on_shutdown", true, Setting.Property.NodeScope, Setting.Property.Dynamic);
Contributor

Could we remove this and always invalidate the cache on shutdown? I see that this might be useful for some early experiments, but I think if we want to assert things about the content of the cache we can do so in an ESIntegTestCase while the node(s) are running.

Member Author

Sure - I removed the setting and it now always invalidates on shutdown.

this.evicted = false;
}

private long estimateWeight() {
Contributor

I'm not very familiar with Cache, but it looks like the weight is not expected to change during the lifetime of an entry, which isn't the case with tracker.getLengthOfRanges(). I think for now it might be safer to use the length of the file.

Member Author

I think for now it might be safer to use the length of the file.

I agree, I pushed this change
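For illustration, a minimal sketch of what weighing entries by file length could look like (assuming the internal org.elasticsearch.common.cache.CacheBuilder API; `maxCacheSizeInBytes` and `getLength()` are illustrative names, not the actual ones from this PR):

```java
// Sketch only: weigh each entry by the full file length, which stays constant for
// the lifetime of the entry, rather than by tracker.getLengthOfRanges(), which
// grows as more ranges are written to the cache file.
private Cache<String, CacheEntry> createCache(long maxCacheSizeInBytes) {
    return CacheBuilder.<String, CacheEntry>builder()
        .setMaximumWeight(maxCacheSizeInBytes)          // total cache budget in bytes
        .weigher((path, entry) -> entry.getLength())    // hypothetical getLength() accessor
        .build();
}
```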

boolean success = false;
try {
logger.trace(() -> new ParameterizedMessage("creating new cache file for [{}] at [{}]", file.getFileName(), path));
final FileChannel channel = FileChannel.open(path, CACHE_FILE_OPEN_OPTIONS);
Contributor

If I understand correctly, this means every CacheEntry corresponds with an open file, even if the corresponding IndexInput is closed. I think this might cause problems at scale. Could we close the FileChannel whenever the IndexInput is closed?

while (remaining > 0) {
int len = (remaining < copyBufferSize) ? (int) remaining : copyBufferSize;
logger.trace(() -> new ParameterizedMessage("reading {} bytes from [{}]", len, input));
input.readBytes(copyBuffer, 0, len);
Contributor

Will this work ok with non-filesystem blob stores? Ideally we'd send a single 32MB GET request and then write it locally in 8k chunks, but it's not obvious how we will tell input to prepare for a large sequential read like that.

Member Author

I agree, this is defeating the purpose of requesting large chunks of data from the underlying SearchableSnapshotIndexInput, which is today implemented as a BufferedIndexInput. With the current implementation, only chunks of BufferedIndexInput's internal buffer size (i.e. 1024 bytes) are requested.

I think the right thing to do is to have SearchableSnapshotIndexInput extend IndexInput directly and maintain an internal InputStream which is opened from the BlobContainer's readBlob(name, position, len) method where len is 32 MB.

The SearchableSnapshotIndexInput should take care of opening the InputStream at the required position (using byte-range S3 requests), and of closing and reopening it when the IndexInput seeks. It should also take care of draining/aborting the InputStream in case not all bytes were read.
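As a rough sketch of that stream-management idea (a hypothetical helper, not the actual SearchableSnapshotIndexInput; it assumes the readBlob(name, position, length) method mentioned above and ignores reads that cross the opened window):

```java
import java.io.IOException;
import java.io.InputStream;
import org.elasticsearch.common.blobstore.BlobContainer;

// Hypothetical sketch: keep one InputStream opened at a given position with a
// large length (e.g. 32 MB), serve sequential reads from it, and reopen it only
// when a seek moves away from the current stream position.
class BlobStreamReader {
    private final BlobContainer blobContainer;
    private final String name;
    private final long readAheadLength; // e.g. 32 * 1024 * 1024

    private InputStream stream;   // null when no stream is open
    private long streamPosition;  // absolute position of the next byte in `stream`

    BlobStreamReader(BlobContainer blobContainer, String name, long readAheadLength) {
        this.blobContainer = blobContainer;
        this.name = name;
        this.readAheadLength = readAheadLength;
    }

    // Reads len bytes starting at the absolute position pos into b[off...].
    void read(long pos, byte[] b, int off, int len) throws IOException {
        if (stream == null || pos != streamPosition) {
            closeStream(); // on a seek, drop the stream (a real impl would drain or abort it)
            stream = blobContainer.readBlob(name, pos, readAheadLength);
            streamPosition = pos;
        }
        while (len > 0) {
            final int read = stream.read(b, off, len);
            if (read == -1) {
                throw new IOException("unexpected end of stream for [" + name + "]");
            }
            off += read;
            len -= read;
            streamPosition += read;
        }
    }

    void closeStream() throws IOException {
        if (stream != null) {
            stream.close();
            stream = null;
        }
    }
}
```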

public void readFromCache(final Path filePath, final long fileLength, final CheckedSupplier<IndexInput, IOException> fileSource,
final long position, final byte[] b, int off, int len) throws IOException {
long pos = position;
while (len > 0 && lifecycleState() == Lifecycle.State.STARTED) {
Contributor

We might exit this loop with len > 0; if we do so I think we have to throw an exception.

Member Author

Done in this change

while (len > 0 && lifecycleState() == Lifecycle.State.STARTED) {
final CacheEntry cacheEntry = getOrAddToCache(filePath, fileLength, fileSource);
if (cacheEntry.tryIncRef() == false) {
continue;
Contributor

I'm concerned about thrashing here. We're trying to handle the case where the entry is evicted between getOrAddToCache and tryIncRef, but if we have multiple threads all adding entries that get immediately evicted (might happen if the cache is too small and/or almost full) I think we could end up stuck here. Maybe it'd be better to give up on the cache in this case and read directly from the fileSource?

Member Author

You're right, and this is a good suggestion.

I changed the test so that it randomly uses a smaller cache, and changed the readFromCache() method so that it serves the read operation directly from the source if the picked-up cache file is already evicted. It does a similar thing later in the process if the cache file has been evicted just before writing the new range on disk.
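A sketch of that fallback, reusing the names from the snippet above (simplified; the surrounding read loop and error handling are omitted):

```java
// Sketch of the fallback: if the entry was evicted between the cache lookup and
// tryIncRef, stop retrying the cache and serve this read directly from the source.
final CacheEntry cacheEntry = getOrAddToCache(filePath, fileLength, fileSource);
if (cacheEntry.tryIncRef() == false) {
    try (IndexInput input = fileSource.get()) {   // direct read from the snapshot source
        input.seek(pos);
        input.readBytes(b, off, len);
    }
    return;
}
try {
    // ... read the range from the cached file as before ...
} finally {
    cacheEntry.decRef();
}
```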

@tlrx
Member Author

tlrx commented Jan 14, 2020

@DaveCTurner Thanks for your feedback. As said via another channel, I addressed the concern about open file handles by extracting the previous CacheEntry inner class into a CacheFile (no better name) outer class that implements RefCounted and opens/closes the file channel when needed. When a non-evicted CacheFile is used to read data, it is returned to the CacheBufferedIndexInput as a Releasable. It is then the responsibility of the IndexInput to notify that it is done with the last cache entry by releasing it when the input is closed.

Sadly, there is a file leak reported by the test framework and I haven't found the culprit yet:

```
java.lang.RuntimeException: file handle leaks: [FileChannel(/home/tanguy/Dev/elastic/elasticsearch-searchable-snapshots/x-pack/plugin/searchable-snapshots/build/testrun/test/temp/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests_5C32FE860A9D9738-001/tempDir-002/node_sd3/d0/indices/bRsLBbR5Q1-6TK2P2eDbtA/0/snapshots/2USI9vDEQau9K0LQ3abr7g/MF2_TsUVRxq1vcMHB_Mn5w), FileChannel(/home/tanguy/Dev/elastic/elasticsearch-searchable-snapshots/x-pack/plugin/searchable-snapshots/build/testrun/test/temp/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests_5C32FE860A9D9738-001/tempDir-002/node_sd4/d0/indices/WoK6PayuQL6_V9Q4yvWVzA/0/snapshots/v6rc2UOlSyOHq9TaCnUWcw/LA7wlmXZT0y4Rowdb3LplA), FileChannel(/home/tanguy/Dev/elastic/elasticsearch-searchable-snapshots/x-pack/plugin/searchable-snapshots/build/testrun/test/temp/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests_5C32FE860A9D9738-001/tempDir-002/node_sd3/d0/indices/hn_aBXC2TLaZnWHRP4hadg/0/snapshots/G7k5qu0UQYWd3__tc9e7AQ/OezzZN78SXe-g6tH064zkw)]
	at __randomizedtesting.SeedInfo.seed([5C32FE860A9D9738]:0)
	at org.apache.lucene.mockfile.LeakFS.onClose(LeakFS.java:63)
	at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:77)
	at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:78)
	at org.apache.lucene.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:228)
	at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.Exception
	at org.apache.lucene.mockfile.LeakFS.onOpen(LeakFS.java:46)
	at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
	at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:197)
	at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:166)
	at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
	at org.elasticsearch.xpack.searchablesnapshots.cache.CacheFile.tryIncRef(CacheFile.java:78)
	at org.elasticsearch.xpack.searchablesnapshots.cache.CacheService.readFromCache(CacheService.java:151)
	at org.elasticsearch.xpack.searchablesnapshots.cache.CacheDirectory$CacheBufferedIndexInput.readInternal(CacheDirectory.java:91)
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
	at org.apache.lucene.store.DataInput.readInt(DataInput.java:102)
	at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:183)
	at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
	at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.(Lucene50CompoundReader.java:81)
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:70)
	at org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:101)
	at org.apache.lucene.index.SegmentReader.(SegmentReader.java:84)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:69)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:61)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:640)
	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:84)
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:137)
	at org.elasticsearch.index.engine.ReadOnlyEngine.open(ReadOnlyEngine.java:176)
	at org.elasticsearch.index.engine.ReadOnlyEngine.(ReadOnlyEngine.java:118)
	at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots.lambda$getEngineFactory$0(SearchableSnapshots.java:99)
	at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1592)
	at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1558)
	at org.elasticsearch.index.shard.StoreRecovery.lambda$restore$7(StoreRecovery.java:464)
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
	at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:89)
	at org.elasticsearch.repositories.blobstore.FileRestoreContext.lambda$restore$0(FileRestoreContext.java:159)
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
	at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:146)
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
	at org.elasticsearch.action.support.GroupedActionListener.onResponse(GroupedActionListener.java:66)
	at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:89)
	at org.elasticsearch.action.ActionRunnable$1.doRun(ActionRunnable.java:46)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:688)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	... 1 more
```

Any help, comments and feedback from you will be greatly appreciated.
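For illustration, a rough sketch of the ref-counting idea described above (hypothetical and simplified; the actual CacheFile also tracks ranges and eviction listeners):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: the file channel is opened when the first reference is
// acquired and closed when the last one is released, so an idle cache file does
// not pin an open file handle.
class RefCountedCacheFile {
    private final Path file;
    private int refCount;           // guarded by `this`
    private FileChannel channel;    // non-null only while refCount > 0

    RefCountedCacheFile(Path file) {
        this.file = file;
    }

    synchronized FileChannel acquire() throws IOException {
        if (refCount++ == 0) {
            channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
        }
        return channel;
    }

    synchronized void release() {
        assert refCount > 0;
        if (--refCount == 0) {
            try {
                channel.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                channel = null;
            }
        }
    }
}
```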

Contributor

@DaveCTurner DaveCTurner left a comment

As we discussed, I think there's a subtle issue with thread-safety in CacheBufferedIndexInput - see comments below.

this.ioContext = ioContext;
this.offset = offset;
this.end = offset + length;
this.clones = new ArrayList<>();
Contributor

I think we clone and/or slice from different threads so we should make this threadsafe:

Suggested change
this.clones = new ArrayList<>();
this.clones = new CopyOnWriteArrayList<>();

@Override
public CacheBufferedIndexInput clone() {
final CacheBufferedIndexInput clone = (CacheBufferedIndexInput) super.clone();
clone.clones = new ArrayList<>();
Contributor

I think we clone and/or slice from different threads so we should make this threadsafe:

Suggested change
clone.clones = new ArrayList<>();
clone.clones = new CopyOnWriteArrayList<>();


@Override
public void close() {
if (closed == false) {
Contributor

Maybe use an AtomicBoolean for thread-safety, and then

Suggested change
if (closed == false) {
if (closed.compareAndSet(false, true)) {

Probably wise to assert closed.get() == false in the other methods too.
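A small sketch of the suggestion (illustrative only; field and method names are hypothetical):

```java
// Illustrative only: make close() idempotent and thread-safe with an AtomicBoolean,
// and assert in other methods that the input has not been closed.
private final AtomicBoolean closed = new AtomicBoolean(false);

@Override
public void close() {
    if (closed.compareAndSet(false, true)) {
        // release the cache file reference and close the clones exactly once
    }
}

private void ensureOpen() {
    assert closed.get() == false : "input is already closed";
}
```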

public void close() {
if (closed == false) {
closed = true;
if (releasable != null) {
Contributor

We can avoid this null check:

Suggested change
if (releasable != null) {
Releasables.close(releasable);

if (releasable != null) {
Releasables.close(releasable);
}
if (clones != null) {
Contributor

I think this is always true.

Releasables.close(releasable);
}
if (clones != null) {
for (IndexInput clone : clones) {
Contributor

Maybe IOUtils.close(clones)?

final CacheBufferedIndexInput clone = (CacheBufferedIndexInput) super.clone();
clone.clones = new ArrayList<>();
clone.releasable = null;
clones.add(clone);
Contributor

I'm concerned that if we keep an IndexInput open for an extended period of time then we will accumulate a lot of clones here, all of which have been closed (or maybe just discarded). I think that e.g. with a compound segment we might keep it open for the life of the shard, and add at least one clone or slice for each search.

@DaveCTurner DaveCTurner mentioned this pull request Jan 14, 2020
@tlrx
Member Author

tlrx commented Jan 20, 2020

@elasticmachine update branch

@tlrx
Member Author

tlrx commented Jan 20, 2020

@elasticmachine update branch

@tlrx
Member Author

tlrx commented Jan 20, 2020

@DaveCTurner I updated this branch according to our latest discussions. This is not yet perfect, but I think it is much closer to how we think this cache system should work. I'm running more tests on my side; in the meantime it would be great if you could have a look at the changes.

Contributor

@DaveCTurner DaveCTurner left a comment

Great work. I will need to think about this more carefully tomorrow as the interlocking ref counting is pretty subtle, but I have done a superficial scan and think the structure looks about right.

CacheFile cacheFile = null;
try {
cacheFile = getOrAcquire();
final CacheFile.FileChannelRefCounted channelRef = cacheFile.getChannelRefCounter();
Contributor

cacheFile may be null here if there's a concurrent eviction I think, but in that case we don't read directly.

Member Author

Yes, this.cacheFile may be null, but in that case the local variable is not assigned because an exception is thrown. I added an assertion to check the non-nullity of the local cacheFile.

refCounter.decRef();
}
}
if (evictionListeners != null) {
Contributor

nit: always true.

Member Author

Right, this is a leftover.

try {
ensureOpen();
final Set<EvictionListener> newListeners = new HashSet<>(listeners);
if (newListeners.add(Objects.requireNonNull(listener)) == false) {
Contributor

Can we assert this (and also that listener != null)?

Member Author

👍

}
if (evictionListeners != null) {
evictionListeners.forEach(listener -> listener.onEviction(this));
}
Contributor

Can we assert here that listeners.isEmpty() and that refCounter.refCount() == 0?

Member Author

I don't think we can? The close() method does not clean up the list of listeners but instead captures the registered listeners at closing time (and sets the closed flag so that no new listeners can be registered). It is the duty of each listener to release and remove itself once it is notified of the eviction (and once the listener is done with any cache file resource).
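For illustration, a hypothetical sketch of that contract (the acquire/release methods are illustrative names, not necessarily the ones in this PR):

```java
// Hypothetical sketch: the cache file does not clear its listeners on eviction;
// each listener is responsible for releasing its reference and deregistering
// itself once it has been notified.
class CacheFileReferenceListener implements CacheFile.EvictionListener {
    private final CacheFile cacheFile;

    CacheFileReferenceListener(CacheFile cacheFile) {
        this.cacheFile = cacheFile;
        cacheFile.acquire(this); // registers the listener and increments the ref count
    }

    @Override
    public void onEviction(CacheFile evictedCacheFile) {
        assert evictedCacheFile == cacheFile;
        // done with the cache file: release our reference and remove ourselves
        cacheFile.release(this);
    }
}
```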

channelRefCounter.deleteAfterClose();
} else {
try {
Files.deleteIfExists(file);
Contributor

I got a test failure here due to WindowsFS rejecting deletes of files that are still open. I'm not quite sure why yet.

Member Author

Raaaah. I ran the tests many times over the weekend and didn't get such a failure. I'll try to reproduce it and let you know my findings.

@tlrx tlrx requested a review from DaveCTurner January 24, 2020 13:02
@tlrx
Member Author

tlrx commented Jan 24, 2020

@DaveCTurner As said via another channel, this is ready for your review. I'm still running SearchableSnapshotsIntegTests in a loop and have 500+ successful test executions so far.

@tlrx
Member Author

tlrx commented Jan 25, 2020

@elasticmachine update branch

Contributor

@DaveCTurner DaveCTurner left a comment

Looks great. I left only minor comments.

I think there might be value in extracting the parts of CacheBufferedIndexInput that only apply with parent == null since they're kind of a different concept from the actual IndexInput.

I have studied the locking in detail and see that we always obtain locks in the order CacheService -> CacheBufferedIndexInput -> CacheFile, so the deadlock situation looks good.


private @Nullable AtomicReference<CacheFile> cacheFile;
private @Nullable CacheBufferedIndexInput parent;
private ReleasableLock cacheEvictionLock;
Contributor

I think we don't need a special kind of lock here any more, synchronized (this) should be ok here?

Member Author

@tlrx tlrx Jan 27, 2020

Right, synchronized blocks should be enough. I pushed 0aece73


private volatile Set<EvictionListener> listeners;
private volatile FileChannel channel;
private volatile boolean closed;
Contributor

Suggest renaming this to evicted now.

Member Author

Makes sense, I pushed 48fa923

'}';
}

private int copySource(long start, long end, byte[] buffer, int offset) throws IOException {
Contributor

Suggest renaming to something like readDirectly

Member Author

This is a good suggestion, thanks. I pushed 391633f

@SuppressForbidden(reason = "Use positional writes on purpose")
void writeCacheFile(FileChannel fc, long start, long end) throws IOException {
assert assertFileChannelOpen(fc);
final byte[] copyBuffer = new byte[Math.toIntExact(Math.min(8192L, end - start))];
Contributor

I think we should extract 8192L as a named constant (used here and in copySource).

Member Author

Sure, I added COPY_BUFFER_SIZE in 99a9968

}

@Override
public IndexInput slice(String sliceDescription, long offset, long length) throws IOException {
Contributor

Apparently this doesn't throw IOException.

Member Author

Right -> 348fac8


// Prevent new CacheFile objects to be added to the cache while the cache is being fully or partially invalidated
// This can happen because CacheFile objects might execute listeners at eviction time, potentially forcing more
// objects (like CacheDirectory's index inputs) to get new CacheFile.
Contributor

I don't follow this. We invalidate the cache when stopping, and I expected that in that case any attempt to get things from the cache would result in an exception. Can you clarify?

Member Author

Hum, I'm confused too 😕 I remember I added this because of some kind of deadlock in integration tests (not in manual tests, though) where the full invalidation ran concurrently with some file accesses in the Lucene directory (executed by the nodes list shard stores action). The invalidation blocked the cache, but the file accesses tried to get/compute new entries in the cache, resulting in deadlocks. I've run more tests without these locks and everything is fine, so I now think that the deadlocks were just a manifestation of some previously buggy code.

void onEviction(CacheFile evictedCacheFile);
}

static final int RANGE_SIZE = 1 << 15;
Contributor

I think we should run tests with a smaller (and non-power-of-two) range size here just to be certain we have good coverage, and use a larger range size in production. Can we make this something that gets passed in?

(I thought I'd already made this comment but it seems to have been lost, sorry for the duplicate if so)

Member Author

I thought I'd already made this comment but it seems to have been lost,

Yes you did, and no, it's not lost :) I replied in #50693 (comment), but basically I agree with you and I think it should be another setting (we might want to play with the cache range size in tests). It requires a bit of plumbing in tests, so I propose to do it in a follow-up PR (but I can do it in this one if you prefer).

@tlrx
Member Author

tlrx commented Jan 27, 2020

Thanks @DaveCTurner. I've addressed and/or replied to your comments.

I think there might be value in extracting the parts of CacheBufferedIndexInput that only apply with parent == null since they're kind of a different concept from the actual IndexInput.

That sounds like a good suggestion, but the part that only applies with parent == null is getOrAcquire(), which is tightly coupled with how the root IndexInput's cacheFile instance is updated, so I don't think we should extract it. Unless you were thinking of something different?

@DaveCTurner
Contributor

Perhaps it's clearer to show what I mean by the extraction:

tlrx/elasticsearch@add-searchable-snapshots-cache-directory...DaveCTurner:2020-01-27-cachefilereference-wip

WDYT?

@tlrx
Member Author

tlrx commented Jan 27, 2020

@DaveCTurner Thanks, I better understand what you meant now. It's indeed clearer with the CacheFileReference. If that's OK with you, I borrowed your suggestion and included the comments you added that I find valuable. Let me know if we can merge this now or if you have more comments, thanks!

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM, nice work @tlrx

@tlrx tlrx merged commit 714c480 into elastic:feature/searchable-snapshots Jan 27, 2020
@tlrx
Member Author

tlrx commented Jan 27, 2020

Thanks David!

tlrx added a commit that referenced this pull request Jan 28, 2020
This commit changes the current cache range size from 32 KB to 32 MB
and makes it configurable through a globally defined
xpack.searchable.snapshot.cache.range_size setting. This setting is set to
low values in unit tests (see #50693) but to a bit higher values in
integration tests so that they don't take too much time to complete.
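For illustration, a sketch of what such a setting could look like (assuming Setting.byteSizeSetting; the default and bounds shown are illustrative only):

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

// Sketch only: a byte-size setting for the cache range size, with a small lower
// bound so that tests can pick small, odd range sizes.
public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
    "xpack.searchable.snapshot.cache.range_size",
    new ByteSizeValue(32, ByteSizeUnit.MB),                 // production default
    new ByteSizeValue(4, ByteSizeUnit.KB),                  // minimum
    new ByteSizeValue(Long.MAX_VALUE, ByteSizeUnit.BYTES),  // maximum
    Setting.Property.NodeScope);
```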
tlrx added a commit that referenced this pull request Jan 31, 2020
Today cache files are identified in the cache using a string representing
an absolute path to a file on disk. This path is a subdirectory of the
current shard data path and as such already contains identification
bits like the current index id and the shard id. It also contains the
snapshot id that is passed at CacheDirectory creation time.

While this was done for quick prototyping and has already been
improved in #51520, it feels wrong to rely on a path converted to
a string as a cache key. Instead we should have a distinct CacheKey
object to identify a CacheFile in the cache.

Relates #50693
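For illustration, a hypothetical sketch of such a key (the field choice mirrors the identification bits mentioned above: snapshot id, index/shard id and file name):

```java
import java.util.Objects;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.snapshots.SnapshotId;

// Hypothetical sketch of a dedicated cache key, instead of a path converted to a string.
public final class CacheKey {
    private final SnapshotId snapshotId;
    private final ShardId shardId;
    private final String fileName;

    public CacheKey(SnapshotId snapshotId, ShardId shardId, String fileName) {
        this.snapshotId = Objects.requireNonNull(snapshotId);
        this.shardId = Objects.requireNonNull(shardId);
        this.fileName = Objects.requireNonNull(fileName);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        CacheKey other = (CacheKey) o;
        return snapshotId.equals(other.snapshotId)
            && shardId.equals(other.shardId)
            && fileName.equals(other.fileName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(snapshotId, shardId, fileName);
    }
}
```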
tlrx added a commit that referenced this pull request Jan 31, 2020
This commit is a first step towards adding instrumentation to the CacheDirectory
added in #50693. It adds a new mutable IndexInputStats object that allows
tracking various information about how CacheBufferedIndexInput interacts
with the underlying cache file to satisfy read operations. It keeps track
of small/large forward/backward seeks as well as the total number of
bytes read from the IndexInput and the total number of bytes read from
and written to the CacheFile.

Note that the stats do not reflect the exact usage that Lucene makes of the
IndexInputs opened through a CacheDirectory: IndexInputStats is not aware
of the read operations that are served directly at a higher level by the
internal BufferedIndexInput's buffer. Instead it tracks what really hits the disk.
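For illustration, a hypothetical sketch of what such per-input counters could look like (names and the seek threshold logic are illustrative only):

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of mutable per-input statistics: counters for seeks and for
// the bytes read from the input and read from / written to the cache file.
public class IndexInputStats {
    private final LongAdder smallForwardSeeks = new LongAdder();
    private final LongAdder largeForwardSeeks = new LongAdder();
    private final LongAdder smallBackwardSeeks = new LongAdder();
    private final LongAdder largeBackwardSeeks = new LongAdder();
    private final LongAdder bytesRead = new LongAdder();
    private final LongAdder cacheBytesRead = new LongAdder();
    private final LongAdder cacheBytesWritten = new LongAdder();

    public void incrementBytesRead(long bytes) {
        bytesRead.add(bytes);
    }

    public void incrementCacheBytesRead(long bytes) {
        cacheBytesRead.add(bytes);
    }

    public void incrementCacheBytesWritten(long bytes) {
        cacheBytesWritten.add(bytes);
    }

    // Classifies a seek as small/large and forward/backward relative to a threshold.
    public void incrementSeeks(long currentPosition, long newPosition, long threshold) {
        final long delta = newPosition - currentPosition;
        if (delta == 0) {
            return;
        } else if (delta > 0) {
            (delta <= threshold ? smallForwardSeeks : largeForwardSeeks).increment();
        } else {
            (-delta <= threshold ? smallBackwardSeeks : largeBackwardSeeks).increment();
        }
    }
}
```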