Skip to content

Commit bb76dde

Browse files
committed
Force niofs for fdt tmp file read access when flushing stored fields (elastic#129538)
Due to the way how stored fields get flushed when index sorting is active, it is possible that we encounter significant page cache faults when memory is scarce. In order to mitigate some of the slowness around this, we're planning to no longer mmap the fdt temp file. Initially behind a feature flag, to check for unforeseen side effects. Typically using always mmap directory is better compared to noifs directory given there is a sufficient memory available to the OS for filesystem caching. However when that isn't the case, then indexing performance can vary a lot (often very slow). This is more true for files tmp files that stored fields create during flushing. These files exist for only a brief moment to sort stored fields in the order of the configured index sorting and are then removed. If these tmp files are mmapped there is risk to trash file system cache. This change only avoids using mmap for the fdt tmp file. This the file that actually contains the data and can large compared to other files that get flushed. The fdm (metadata) and fdi (stored field index) remain being mmapped.
1 parent 5f43af5 commit bb76dde

File tree

3 files changed

+42
-2
lines changed

3 files changed

+42
-2
lines changed

docs/changelog/130308.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
pr: 130308
2+
summary: Force niofs for fdt tmp file read access when flushing stored fields
3+
area: Logs
4+
type: bug
5+
issues: []

server/src/main/java/org/elasticsearch/index/store/FsDirectoryFactory.java

Lines changed: 33 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -177,7 +177,7 @@ static boolean useDelegate(String name, IOContext ioContext) {
177177
}
178178

179179
final LuceneFilesExtensions extension = LuceneFilesExtensions.fromExtension(getExtension(name));
180-
if (extension == null || extension.shouldMmap() == false) {
180+
if (extension == null || extension.shouldMmap() == false || avoidDelegateForFdtTempFiles(name, extension)) {
181181
// Other files are either less performance-sensitive (e.g. stored field index, norms metadata)
182182
// or are large and have a random access pattern and mmap leads to page cache trashing
183183
// (e.g. stored fields and term vectors).
@@ -186,6 +186,38 @@ static boolean useDelegate(String name, IOContext ioContext) {
186186
return true;
187187
}
188188

189+
/**
190+
* Force not using mmap if file is tmp fdt file.
191+
* The tmp fdt file only gets created when flushing stored
192+
* fields to disk and index sorting is active.
193+
* <p>
194+
* In Lucene, the <code>SortingStoredFieldsConsumer</code> first
195+
* flushes stored fields to disk in tmp files in unsorted order and
196+
* uncompressed format. Then the tmp file gets a full integrity check,
197+
* then the stored values are read from the tmp in the order of
198+
* the index sorting in the segment, the order in which this happens
199+
* from the perspective of tmp fdt file is random. After that,
200+
* the tmp files are removed.
201+
* <p>
202+
* If the machine Elasticsearch runs on has sufficient memory the i/o pattern
203+
* that <code>SortingStoredFieldsConsumer</code> actually benefits from using mmap.
204+
* However, in cases when memory scarce, this pattern can cause page faults often.
205+
* Doing more harm than not using mmap.
206+
* <p>
207+
* As part of flushing stored disk when indexing sorting is active,
208+
* three tmp files are created, fdm (metadata), fdx (index) and
209+
* fdt (contains stored field data). The first two files are small and
210+
* mmap-ing that should still be ok even is memory is scarce.
211+
* The fdt file is large and tends to cause more page faults when memory is scarce.
212+
*
213+
* @param name The name of the file in Lucene index
214+
* @param extension The extension of the in Lucene index
215+
* @return whether to avoid using delegate if the file is a tmp fdt file.
216+
*/
217+
static boolean avoidDelegateForFdtTempFiles(String name, LuceneFilesExtensions extension) {
218+
return extension == LuceneFilesExtensions.TMP && name.contains("fdt");
219+
}
220+
189221
MMapDirectory getDelegate() {
190222
return delegate;
191223
}

server/src/test/java/org/elasticsearch/index/store/FsDirectoryFactoryTests.java

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,10 @@ public void testPreload() throws IOException {
6161
assertTrue(FsDirectoryFactory.HybridDirectory.useDelegate("foo.kdi", newIOContext(random())));
6262
assertFalse(FsDirectoryFactory.HybridDirectory.useDelegate("foo.kdi", Store.READONCE_CHECKSUM));
6363
assertTrue(FsDirectoryFactory.HybridDirectory.useDelegate("foo.tmp", newIOContext(random())));
64-
assertTrue(FsDirectoryFactory.HybridDirectory.useDelegate("foo.fdt__0.tmp", newIOContext(random())));
64+
assertFalse(FsDirectoryFactory.HybridDirectory.useDelegate("foo.fdt__0.tmp", newIOContext(random())));
65+
assertFalse(FsDirectoryFactory.HybridDirectory.useDelegate("_0.fdt__1.tmp", newIOContext(random())));
66+
assertTrue(FsDirectoryFactory.HybridDirectory.useDelegate("_0.fdm__0.tmp", newIOContext(random())));
67+
assertTrue(FsDirectoryFactory.HybridDirectory.useDelegate("_0.fdx__4.tmp", newIOContext(random())));
6568
MMapDirectory delegate = hybridDirectory.getDelegate();
6669
assertThat(delegate, Matchers.instanceOf(FsDirectoryFactory.PreLoadMMapDirectory.class));
6770
FsDirectoryFactory.PreLoadMMapDirectory preLoadMMapDirectory = (FsDirectoryFactory.PreLoadMMapDirectory) delegate;

0 commit comments

Comments
 (0)