Add searchable snapshots cache directory #50693
Conversation
Co-Authored-By: David Turner <david.turner@elastic.co>
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
@elasticmachine run elasticsearch-ci/1 (Unrelated test failure)
I've done a first pass and left some comments and questions.
```
    Setting.Property.NodeScope);

public static final Setting<Boolean> SNAPSHOT_CACHE_INVALIDATE_ON_SHUTDOWN =
    Setting.boolSetting("searchable.snapshot.cache.invalidate_on_shutdown", true, Setting.Property.NodeScope, Setting.Property.Dynamic);
```
Could we remove this and always invalidate the cache on shutdown? I see that this might be useful for some early experiments, but I think if we want to assert things about the content of the cache we can do so in an `ESIntegTestCase` while the node(s) are running.
Sure - I removed the setting and it now always invalidates on shutdown.
```
    this.evicted = false;
}

private long estimateWeight() {
```
I'm not very familiar with `Cache`, but it looks like the weight is not expected to change during the lifetime of an entry, which isn't the case with `tracker.getLengthOfRanges()`. I think for now it might be safer to use the length of the file.
> I think for now it might be safer to use the length of the file.
I agree, I pushed this change
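For illustration, using the length of the file makes the entry's weight constant for its whole lifetime. A minimal sketch, assuming the entry captures a `fileLength` value once at creation time (this is not the PR's actual code):

```
// Illustrative sketch only: weigh the entry by the immutable file length instead of
// tracker.getLengthOfRanges(), which changes as more ranges get cached.
private long estimateWeight() {
    return fileLength; // assumed to be set in the constructor and never updated
}
```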
...-snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheService.java (resolved)
```
boolean success = false;
try {
    logger.trace(() -> new ParameterizedMessage("creating new cache file for [{}] at [{}]", file.getFileName(), path));
    final FileChannel channel = FileChannel.open(path, CACHE_FILE_OPEN_OPTIONS);
```
If I understand correctly, this means every `CacheEntry` corresponds with an open file, even if the corresponding `IndexInput` is closed. I think this might cause problems at scale. Could we close the `FileChannel` whenever the `IndexInput` is closed?
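One common way to avoid pinning a file handle per entry is to reference-count the channel and only keep it open while someone is actually reading. A rough sketch under that assumption (class and method names are illustrative, not the PR's actual code):

```
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: open the FileChannel lazily on the first acquire() and close it
// again when the last reference is released, so idle cache entries don't hold a handle.
class RefCountedChannel {
    private final Path path;
    private FileChannel channel;
    private int refCount;

    RefCountedChannel(Path path) {
        this.path = path;
    }

    synchronized FileChannel acquire() throws IOException {
        if (refCount == 0) {
            channel = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE);
        }
        refCount++;
        return channel;
    }

    synchronized void release() {
        assert refCount > 0;
        if (--refCount == 0) {
            try {
                channel.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                channel = null;
            }
        }
    }
}
```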
```
while (remaining > 0) {
    int len = (remaining < copyBufferSize) ? (int) remaining : copyBufferSize;
    logger.trace(() -> new ParameterizedMessage("reading {} bytes from [{}]", len, input));
    input.readBytes(copyBuffer, 0, len);
```
Will this work ok with non-filesystem blob stores? Ideally we'd send a single 32MB GET request and then write it locally in 8k chunks, but it's not obvious how we will tell `input` to prepare for a large sequential read like that.
I agree this is defeating the purpose of requesting large chunks of data from the underlying `SearchableSnapshotIndexInput`, which is today implemented as a `BufferedIndexInput`. With the current implementation only chunks of `BufferedIndexInput`'s internal buffer size are requested (ie 1024 bytes).

I think that the right thing to do is to have `SearchableSnapshotIndexInput` extend `IndexInput` directly and maintain an internal `InputStream` which is opened from the `BlobContainer`'s `readBlob(name, position, len)` method where `len` is 32Mb.

The `SearchableSnapshotIndexInput` should take care of opening the `InputStream` from the required position (by using byte-range S3 requests), closing it and reopening it on `IndexInput` seeks. It should also take care of draining/aborting the `InputStream` in case not all bytes were read.
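To make the stream handling described above concrete, here is a rough sketch; the `BlobReader` interface stands in for `BlobContainer#readBlob(name, position, length)`, and everything else is an assumption rather than the PR's implementation:

```
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch (not the PR's code): keep a single InputStream opened at a known
// position and reopen it only when a read is requested at a different position, so that
// one large ranged readBlob() call can serve many small sequential reads.
class SequentialBlobReader {

    // Stand-in for BlobContainer#readBlob(name, position, length)
    interface BlobReader {
        InputStream readBlob(String name, long position, long length) throws IOException;
    }

    private final BlobReader blobReader;
    private final String name;
    private final long rangeLength; // e.g. 32 MB per ranged request

    private InputStream stream;
    private long streamPosition = -1;

    SequentialBlobReader(BlobReader blobReader, String name, long rangeLength) {
        this.blobReader = blobReader;
        this.name = name;
        this.rangeLength = rangeLength;
    }

    int read(long position, byte[] buffer, int offset, int length) throws IOException {
        if (stream == null || position != streamPosition) {
            close(); // a real implementation would drain or abort the old stream here
            stream = blobReader.readBlob(name, position, rangeLength);
            streamPosition = position;
        }
        final int read = stream.read(buffer, offset, length);
        if (read > 0) {
            streamPosition += read;
        }
        return read;
    }

    void close() throws IOException {
        if (stream != null) {
            stream.close();
            stream = null;
        }
    }
}
```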
```
public void readFromCache(final Path filePath, final long fileLength, final CheckedSupplier<IndexInput, IOException> fileSource,
                          final long position, final byte[] b, int off, int len) throws IOException {
    long pos = position;
    while (len > 0 && lifecycleState() == Lifecycle.State.STARTED) {
```
We might exit this loop with `len > 0`; if we do so I think we have to throw an exception.
Done in this change
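For illustration, the kind of guard being referred to could look like this; the exception type and message are placeholders, not the actual change:

```
// Illustrative sketch: fail loudly instead of returning a partially filled buffer if the
// loop stopped (e.g. because the cache service was shut down) before all bytes were read.
if (len > 0) {
    throw new IOException("cache service stopped with " + len + " bytes left to read");
}
```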
```
while (len > 0 && lifecycleState() == Lifecycle.State.STARTED) {
    final CacheEntry cacheEntry = getOrAddToCache(filePath, fileLength, fileSource);
    if (cacheEntry.tryIncRef() == false) {
        continue;
```
I'm concerned about thrashing here. We're trying to handle the case where the entry is evicted between `getOrAddToCache` and `tryIncRef`, but if we have multiple threads all adding entries that get immediately evicted (which might happen if the cache is too small and/or almost full) I think we could end up stuck here. Maybe it'd be better to give up on the cache in this case and read directly from the `fileSource`?
You're right, and this is a good suggestion.
I changed the test so that it randomly uses a smaller cache and changed the readFromCache() method so that it serves the read operation directly from the source if the picked up cache file is already evicted. It does a similar thing later in the process, if the cache file has been evicted just before writing the new range on disk.
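A minimal sketch of the fallback described above, reusing the names from the snippet under review; the `readDirectlyFromSource` helper is hypothetical and the body is abbreviated:

```
// Illustrative sketch: try the cache first and, if the entry was evicted between the
// lookup and the ref-count increment, serve the read straight from the source instead
// of retrying indefinitely.
final CacheEntry cacheEntry = getOrAddToCache(filePath, fileLength, fileSource);
if (cacheEntry.tryIncRef() == false) {
    readDirectlyFromSource(fileSource, pos, b, off, len); // hypothetical helper
    return;
}
try {
    // ... read the requested range through the cache entry ...
} finally {
    cacheEntry.decRef();
}
```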
...-snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheService.java (resolved)
…arch/xpack/searchablesnapshots/cache/CacheService.java Co-Authored-By: David Turner <david.turner@elastic.co>
@DaveCTurner Thanks for your feedback. As said via another channel, I addressed the concern about open file handles by extracting the previous … Sadly, there is a file leak reported by the test framework and I haven't found the culprit yet:
```
java.lang.RuntimeException: file handle leaks: [FileChannel(/home/tanguy/Dev/elastic/elasticsearch-searchable-snapshots/x-pack/plugin/searchable-snapshots/build/testrun/test/temp/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests_5C32FE860A9D9738-001/tempDir-002/node_sd3/d0/indices/bRsLBbR5Q1-6TK2P2eDbtA/0/snapshots/2USI9vDEQau9K0LQ3abr7g/MF2_TsUVRxq1vcMHB_Mn5w), FileChannel(/home/tanguy/Dev/elastic/elasticsearch-searchable-snapshots/x-pack/plugin/searchable-snapshots/build/testrun/test/temp/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests_5C32FE860A9D9738-001/tempDir-002/node_sd4/d0/indices/WoK6PayuQL6_V9Q4yvWVzA/0/snapshots/v6rc2UOlSyOHq9TaCnUWcw/LA7wlmXZT0y4Rowdb3LplA), FileChannel(/home/tanguy/Dev/elastic/elasticsearch-searchable-snapshots/x-pack/plugin/searchable-snapshots/build/testrun/test/temp/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsIntegTests_5C32FE860A9D9738-001/tempDir-002/node_sd3/d0/indices/hn_aBXC2TLaZnWHRP4hadg/0/snapshots/G7k5qu0UQYWd3__tc9e7AQ/OezzZN78SXe-g6tH064zkw)]
at __randomizedtesting.SeedInfo.seed([5C32FE860A9D9738]:0)
at org.apache.lucene.mockfile.LeakFS.onClose(LeakFS.java:63)
at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:77)
at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:78)
at org.apache.lucene.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:228)
at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.Exception
at org.apache.lucene.mockfile.LeakFS.onOpen(LeakFS.java:46)
at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:197)
at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:166)
at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
at org.elasticsearch.xpack.searchablesnapshots.cache.CacheFile.tryIncRef(CacheFile.java:78)
at org.elasticsearch.xpack.searchablesnapshots.cache.CacheService.readFromCache(CacheService.java:151)
at org.elasticsearch.xpack.searchablesnapshots.cache.CacheDirectory$CacheBufferedIndexInput.readInternal(CacheDirectory.java:91)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:102)
at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:183)
at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:81)
at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:70)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:101)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:84)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:69)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:61)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:640)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:84)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:137)
at org.elasticsearch.index.engine.ReadOnlyEngine.open(ReadOnlyEngine.java:176)
at org.elasticsearch.index.engine.ReadOnlyEngine.<init>(ReadOnlyEngine.java:118)
at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots.lambda$getEngineFactory$0(SearchableSnapshots.java:99)
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1592)
at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1558)
at org.elasticsearch.index.shard.StoreRecovery.lambda$restore$7(StoreRecovery.java:464)
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:89)
at org.elasticsearch.repositories.blobstore.FileRestoreContext.lambda$restore$0(FileRestoreContext.java:159)
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:146)
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
at org.elasticsearch.action.support.GroupedActionListener.onResponse(GroupedActionListener.java:66)
at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:89)
at org.elasticsearch.action.ActionRunnable$1.doRun(ActionRunnable.java:46)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:688)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
... 1 more
```
Any help, comments and feedback from you will be greatly appreciated.
As we discussed, I think there's a subtle issue with thread-safety in `CacheBufferedIndexInput` - see comments below.
```
this.ioContext = ioContext;
this.offset = offset;
this.end = offset + length;
this.clones = new ArrayList<>();
```
I think we clone and/or slice from different threads so we should make this threadsafe:

```
- this.clones = new ArrayList<>();
+ this.clones = new CopyOnWriteArrayList<>();
```
```
@Override
public CacheBufferedIndexInput clone() {
    final CacheBufferedIndexInput clone = (CacheBufferedIndexInput) super.clone();
    clone.clones = new ArrayList<>();
```
I think we clone and/or slice from different threads so we should make this threadsafe:

```
- clone.clones = new ArrayList<>();
+ clone.clones = new CopyOnWriteArrayList<>();
```
```
@Override
public void close() {
    if (closed == false) {
```
Maybe use an `AtomicBoolean` for thread-safety, and then

```
- if (closed == false) {
+ if (closed.compareAndSet(false, true)) {
```

Probably wise to assert `closed.get() == false` in the other methods too.
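For illustration, the suggestion amounts to something like this, assuming `closed` becomes a `java.util.concurrent.atomic.AtomicBoolean` field (field and helper names are assumptions):

```
// Illustrative sketch: an AtomicBoolean makes close() idempotent and thread-safe,
// and lets other methods assert that the input has not been closed yet.
private final AtomicBoolean closed = new AtomicBoolean(false);

@Override
public void close() {
    if (closed.compareAndSet(false, true)) {
        // release the cache file reference and close any clones, exactly once
    }
}

private void ensureNotClosed() {
    assert closed.get() == false : "index input is already closed";
}
```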
```
public void close() {
    if (closed == false) {
        closed = true;
        if (releasable != null) {
```
We can avoid this null check:

```
- if (releasable != null) {
+ Releasables.close(releasable);
```
```
if (releasable != null) {
    Releasables.close(releasable);
}
if (clones != null) {
```
I think this is always true.
```
    Releasables.close(releasable);
}
if (clones != null) {
    for (IndexInput clone : clones) {
```
Maybe `IOUtils.close(clones)`?
```
final CacheBufferedIndexInput clone = (CacheBufferedIndexInput) super.clone();
clone.clones = new ArrayList<>();
clone.releasable = null;
clones.add(clone);
```
I'm concerned that if we keep an `IndexInput` open for an extended period of time then we will accumulate a lot of clones here, all of which have been closed (or maybe just discarded). I think that e.g. with a compound segment we might keep it open for the life of the shard, and add at least one clone or slice for each search.
@elasticmachine update branch
…hots-cache-directory
@elasticmachine update branch
…hots-cache-directory
@DaveCTurner I updated this branch according to our latest discussions. This is not yet perfect, but I think it is much closer to how we think this cache system should be. I'm running more tests on my side; in the meantime it would be great if you could have a look at the changes.
Great work. I will need to think about this more carefully tomorrow as the interlocking ref counting is pretty subtle, but I have done a superficial scan and think the structure looks about right.
```
CacheFile cacheFile = null;
try {
    cacheFile = getOrAcquire();
    final CacheFile.FileChannelRefCounted channelRef = cacheFile.getChannelRefCounter();
```
`cacheFile` may be `null` here if there's a concurrent eviction I think, but in that case we don't read directly.
Yes, `this.cacheFile` may be null and the local variable not assigned, as an exception should be thrown. I added an assertion to check the non-nullity of the local `cacheFile`.
```
        refCounter.decRef();
    }
}
if (evictionListeners != null) {
```
nit: always `true`.
Right, this is a leftover.
```
try {
    ensureOpen();
    final Set<EvictionListener> newListeners = new HashSet<>(listeners);
    if (newListeners.add(Objects.requireNonNull(listener)) == false) {
```
Can we `assert` this (and also that `listener != null`)?
👍
```
}
if (evictionListeners != null) {
    evictionListeners.forEach(listener -> listener.onEviction(this));
}
```
Can we assert here that `listeners.isEmpty()` and that `refCounter.refCount() == 0`?
I don't think we can? The `close()` method does not clean up the list of listeners but instead captures the registered listeners at closing time (and sets the `closed` flag so that no new listeners can be registered). It is the duty of each listener to release and remove itself once it is notified of the eviction (and once the listener is done with any cache file resource).
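A small sketch of the listener contract being described; `removeListener` and `decRef` are hypothetical names for the deregistration and release steps, not the PR's actual methods:

```
// Illustrative sketch: the CacheFile does not clear its listener set when it is evicted;
// each listener is expected to release its own resources and deregister itself once notified.
class SelfReleasingListener implements CacheFile.EvictionListener {
    @Override
    public void onEviction(CacheFile evictedCacheFile) {
        try {
            // stop using the evicted file, e.g. fall back to reading from the source
        } finally {
            evictedCacheFile.removeListener(this); // hypothetical deregistration step
            evictedCacheFile.decRef();             // hypothetical reference release
        }
    }
}
```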
...-snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheService.java (resolved)
...ble-snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheFile.java (resolved)
```
    channelRefCounter.deleteAfterClose();
} else {
    try {
        Files.deleteIfExists(file);
```
I got a test failure here due to `WindowsFS` rejecting deletes of files that are still open. I'm not quite sure why yet.
Raaaah. I ran the tests quite a few times over the weekend and didn't get such a failure. I'll try to reproduce and let you know my findings.
...src/test/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshotsIntegTests.java (resolved)
@DaveCTurner As said via another channel, this is ready for your review. I'm still running
@elasticmachine update branch
…hots-cache-directory
Looks great. I left only minor comments.
I think there might be value in extracting the parts of `CacheBufferedIndexInput` that only apply with `parent == null`, since they're kind of a different concept from the actual `IndexInput`.
I have studied the locking in detail and see that we always obtain locks in the order CacheService -> CacheBufferedIndexInput -> CacheFile, so the deadlock situation looks good.
```
private @Nullable AtomicReference<CacheFile> cacheFile;
private @Nullable CacheBufferedIndexInput parent;
private ReleasableLock cacheEvictionLock;
```
I think we don't need a special kind of lock here any more, `synchronized (this)` should be ok here?
Right, `synchronized` blocks should be enough. I pushed 0aece73
```
private volatile Set<EvictionListener> listeners;
private volatile FileChannel channel;
private volatile boolean closed;
```
Suggest renaming this to `evicted` now.
Makes sense, I pushed 48fa923
```
        '}';
}

private int copySource(long start, long end, byte[] buffer, int offset) throws IOException {
```
Suggest renaming to something like readDirectly
This is a good suggestion, thanks. I pushed 391633f
```
@SuppressForbidden(reason = "Use positional writes on purpose")
void writeCacheFile(FileChannel fc, long start, long end) throws IOException {
    assert assertFileChannelOpen(fc);
    final byte[] copyBuffer = new byte[Math.toIntExact(Math.min(8192L, end - start))];
```
I think we should extract `8192L` as a named constant (used here and in `copySource`).
Sure, I added `COPY_BUFFER_SIZE` in 99a9968
```
}

@Override
public IndexInput slice(String sliceDescription, long offset, long length) throws IOException {
```
Apparently this doesn't throw `IOException`.
Right -> 348fac8
```
// Prevent new CacheFile objects to be added to the cache while the cache is being fully or partially invalidated
// This can happen because CacheFile objects might execute listeners at eviction time, potentially forcing more
// objects (like CacheDirectory's index inputs) to get new CacheFile.
```
I don't follow this. We invalidate the cache when stopping and expected that in that case any attempts to get things from the cache would result in exceptions. Can you clarify?
Hum, I'm confused too 😕 I remember I added this because of some kind of deadlock in integration tests (not in manual tests, though) where the full invalidation ran concurrently with some file accesses in the Lucene directory (executed by nodes list shard stores actions). The invalidation blocked the cache but the file accesses tried to get/compute new entries in cache, resulting in deadlocks. I've run more tests without these locks and everything is fine, so I now think that the deadlocks were just a manifestation of some previous buggy code.
```
    void onEviction(CacheFile evictedCacheFile);
}

static final int RANGE_SIZE = 1 << 15;
```
I think we should run tests with a smaller (and non-power-of-two) value here just to be certain we have good coverage, and a larger range size in production. Can we make this something that gets passed in?
(I thought I'd already made this comment but it seems to have been lost, sorry for the duplicate if so)
> I thought I'd already made this comment but it seems to have been lost
Yes you did, and no, it's not lost :) I replied in #50693 (comment) but basically I agree with you and I think it should be another setting (we might want to play with the cache range size in tests). It requires a bit of plumbing in tests so I proposed to do it in a follow-up PR (but I can do it in this one if you prefer).
...napshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheDirectory.java (resolved)
Thanks @DaveCTurner. I've addressed and/or replied to your comments.
That sounds like a good suggestion, but the part that only applies with `parent == null` is
Perhaps it's clearer to show what I mean by the extraction: WDYT?
@DaveCTurner Thanks, I better understand what you meant. It's indeed clearer with the
LGTM, nice work @tlrx
Thanks David!
This commit changes the current cache range size from 32kb to 32mb and makes it configurable through a globally defined `xpack.searchable.snapshot.cache.range_size` setting. This setting is set to low values in unit tests (see #50693) but to somewhat higher values in integration tests so that they don't take too much time to complete.
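As a sketch of what such a byte-size setting declaration typically looks like in Elasticsearch (the default and properties shown here are assumptions, not the actual values from that commit):

```
// Illustrative sketch: a node-scoped byte-size setting for the cache range size.
public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
    "xpack.searchable.snapshot.cache.range_size",
    new ByteSizeValue(32, ByteSizeUnit.MB),    // assumed default of 32 MB
    Setting.Property.NodeScope
);
```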
Today cache files are identified in cache using a string representing an absolute path to a file on disk. This path is a sub-directory of the current shard data path and as such already contains identification bits like the current index id and the shard id. It also contains the snapshot id that is passed at CacheDirectory creation time. While this was done for quick prototyping and has already been improved in #51520, it feels wrong to rely on a path converted to a string as a cache key. Instead we should have a distinct CacheKey object to identify a CacheFile in the cache. Relates #50693
This commit is a first step to add instrumentation to the CacheDirectory added in #50693. It adds a new mutable IndexInputStats object that allows tracking various information about how CacheBufferedIndexInput interacts with the underlying cache file to satisfy read operations. It keeps track of small/large forward/backward seeks as well as the total number of bytes read from the IndexInput and the total number of bytes read/written from/to the CacheFile. Note that the stats do not reflect the exact usage that Lucene makes of the IndexInputs opened through a CacheDirectory: IndexInputStats is not aware of the read operations that are directly served at a higher level by the internal BufferedIndexInput buffer. Instead it tracks what really hits the disk.
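A minimal sketch of what such a stats holder could look like; the field and method names are illustrative, and the real IndexInputStats tracks more detail (e.g. distinguishing small from large seeks):

```
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch of a stats holder fed by the cache-backed IndexInput. The counters
// only reflect reads that actually hit the cache file or the blob store, not the reads
// served from BufferedIndexInput's internal buffer.
class IndexInputStatsSketch {
    private final LongAdder bytesRead = new LongAdder();         // bytes returned to Lucene
    private final LongAdder cacheBytesRead = new LongAdder();    // bytes read from the cache file
    private final LongAdder cacheBytesWritten = new LongAdder(); // bytes written to the cache file
    private final LongAdder forwardSeeks = new LongAdder();
    private final LongAdder backwardSeeks = new LongAdder();

    void onRead(long bytes) { bytesRead.add(bytes); }
    void onCacheRead(long bytes) { cacheBytesRead.add(bytes); }
    void onCacheWrite(long bytes) { cacheBytesWritten.add(bytes); }

    void onSeek(long currentPosition, long targetPosition) {
        if (targetPosition >= currentPosition) {
            forwardSeeks.increment();
        } else {
            backwardSeeks.increment();
        }
    }
}
```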
Note: this pull request targets the `feature/searchable-snapshots` branch.

This pull request adds a new `CacheDirectory` implementation which allows caching ranges of bytes of Lucene files. This directory implementation uses a `CacheService` to retrieve and to read ranges of bytes from cached files stored on disk. The cached files are created as sparse files on disk and the availability of byte ranges is tracked through a `SparseFileTracker` instance (thanks David!).
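To illustrate what tracking the availability of byte ranges in a sparse file means, here is a much-simplified sketch; the real SparseFileTracker additionally coordinates pending gaps and completion listeners, which is omitted here:

```
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration of tracking which byte ranges of a sparse file are present.
// Not the actual SparseFileTracker, which is considerably more involved.
class RangeTrackerSketch {
    private final TreeMap<Long, Long> ranges = new TreeMap<>(); // start -> end (exclusive)

    // Record that [start, end) has been written to the cache file, merging overlapping ranges.
    synchronized void addRange(long start, long end) {
        Map.Entry<Long, Long> prev = ranges.floorEntry(start);
        if (prev != null && prev.getValue() >= start) {
            start = prev.getKey();
            end = Math.max(end, prev.getValue());
            ranges.remove(prev.getKey());
        }
        Map.Entry<Long, Long> next;
        while ((next = ranges.ceilingEntry(start)) != null && next.getKey() <= end) {
            end = Math.max(end, next.getValue());
            ranges.remove(next.getKey());
        }
        ranges.put(start, end);
    }

    // True if every byte of [start, end) is already available in the cache file.
    synchronized boolean isRangeAvailable(long start, long end) {
        Map.Entry<Long, Long> floor = ranges.floorEntry(start);
        return floor != null && floor.getValue() >= end;
    }
}
```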