Force niofs for fdt tmp file read access when flushing stored fields #129538

martijnvg · 2025-06-17T09:55:14Z

Because of how stored fields are flushed when index sorting is active, we can encounter significant page cache faults when memory is scarce. To mitigate some of the resulting slowness, we're planning to no longer mmap the fdt temp file. This is initially behind a feature flag, to check for unforeseen side effects.

Typically, always using the mmap directory is better than using the niofs directory, provided there is sufficient memory available to the OS for filesystem caching. When that isn't the case, indexing performance can vary a lot (and is often very slow). This is especially true for the tmp files that stored fields create during flushing. These files exist only briefly, to sort stored fields in the order of the configured index sorting, and are then removed. If these tmp files are mmapped, there is a risk of thrashing the file system cache.

This change only avoids using mmap for the fdt tmp file. This is the file that actually contains the data, and it can be large compared to the other files that get flushed. The fdm (metadata) and fdx (stored fields index) tmp files remain mmapped.
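To illustrate the idea, here is a minimal, self-contained sketch of a name-based routing decision. It is not the code from this PR: the class, enum, and method names, the exact file-name rule, and the example file names are all assumptions made for illustration. The only thing taken from the discussion is that the decision is keyed on the file name and is gated by a feature flag.

```java
import java.util.Locale;

// Hypothetical sketch: the class, enum, and method names are illustrative,
// not the identifiers used in the actual change.
final class TmpFdtRouting {

    /** The two file access strategies discussed in this PR. */
    enum Access { MMAP, NIOFS }

    /**
     * Assumed rule: a stored fields data tmp file ends in ".tmp" and has
     * "fdt" somewhere in its name. The real rule may differ.
     */
    static boolean isFdtTmpFile(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        return name.endsWith(".tmp") && name.contains("fdt");
    }

    /** Only the fdt tmp file is routed away from mmap, and only when the flag is on. */
    static Access chooseAccess(String fileName, boolean flagEnabled) {
        return flagEnabled && isFdtTmpFile(fileName) ? Access.NIOFS : Access.MMAP;
    }

    public static void main(String[] args) {
        // These file names are made up for illustration.
        String[] names = { "_0.fdt", "_0.fdt_tmp_0.tmp", "_0.fdm_tmp_0.tmp", "_0.fdx_tmp_0.tmp" };
        for (String name : names) {
            System.out.println(name + " -> " + chooseAccess(name, true));
        }
    }
}
```

With the flag off, every file in this sketch keeps using mmap, which mirrors the intent of shipping the change behind a feature flag first.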

(labelling as non-issue, until feature flag has been removed)

elasticsearchmachine · 2025-06-20T12:22:33Z

Hi @martijnvg, I've created a changelog YAML for you.

elasticsearchmachine · 2025-06-20T12:39:04Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

ChrisHegarty

Makes sense to me @martijnvg. LGTM

Longer term we should consider how to migrate to using the new IOContext Hints in 10.3, so as to avoid the fragile dependency on the file name.

martijnvg · 2025-06-20T13:52:59Z

> Longer term we should consider how to migrate to using the new IOContext Hints in 10.3, so as to avoid the fragile dependency on the file name.

I will add an item to the roadmap for this.

martijnvg · 2025-06-23T05:45:19Z

Running the elastic/logsdb track in logsdb mode with stored source, using a build without this change as the baseline and a build with the change as the contender, shows very good results overall:

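| Metric | Task | Baseline | Contender | Diff | Unit | Diff % |
|---:|---:|---:|---:|---:|---:|---:|
|                                                Min Throughput |                             bulk-index |    972.745       |  1173.39        |     200.643       | docs/s |  +20.63% |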
|                                               Mean Throughput |                             bulk-index |  18176.4         | 29977.2         |   11800.8         | docs/s |  +64.92% |
|                                             Median Throughput |                             bulk-index |  16603.4         | 30020.8         |   13417.4         | docs/s |  +80.81% |
|                                                Max Throughput |                             bulk-index |  30518.8         | 33777.2         |    3258.42        | docs/s |  +10.68% |
|                                       50th percentile latency |                             bulk-index |   1669.65        |  1798.24        |     128.593       |     ms |   +7.70% |
|                                       90th percentile latency |                             bulk-index |   3553.13        |  3048.81        |    -504.314       |     ms |  -14.19% |
|                                       99th percentile latency |                             bulk-index |  32530.3         |  5231.3         |  -27299           |     ms |  -83.92% |
|                                     99.9th percentile latency |                             bulk-index | 338651           | 10009.8         | -328641           |     ms |  -97.04% |
|                                    99.99th percentile latency |                             bulk-index |      1.39557e+06 | 14470.3         |      -1.3811e+06  |     ms |  -98.96% |
|                                      100th percentile latency |                             bulk-index |      2.57922e+06 | 21935.1         |      -2.55728e+06 |     ms |  -99.15% |
|                                  50th percentile service time |                             bulk-index |   1672.05        |  1788.15        |     116.102       |     ms |   +6.94% |
|                                  90th percentile service time |                             bulk-index |   3501.96        |  3052.43        |    -449.524       |     ms |  -12.84% |
|                                  99th percentile service time |                             bulk-index |  32819.8         |  5273.51        |  -27546.3         |     ms |  -83.93% |
|                                99.9th percentile service time |                             bulk-index | 338084           |  9954.16        | -328130           |     ms |  -97.06% |
|                               99.99th percentile service time |                             bulk-index |      1.35413e+06 | 14453.1         |      -1.33968e+06 |     ms |  -98.93% |
|                                 100th percentile service time |                             bulk-index |      2.57922e+06 | 21935.1         |      -2.55728e+06 |     ms |  -99.15% |
|                                                    error rate |                             bulk-index |      0           |     0           |       0           |      % |    0.00% |

In particular, median indexing throughput improves by ~80%, and some tail latencies improve by ~99%.
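For readers checking the numbers, the last column of the table is the relative change of the contender against the baseline. The snippet below is a small sketch of that arithmetic (the class and method names are made up for illustration), using the median throughput and 100th percentile latency rows as inputs.

```java
import java.util.Locale;

// Illustrative helper, not part of the benchmark tooling.
final class RelativeChange {

    /** Relative change of contender versus baseline, in percent. */
    static double percentChange(double baseline, double contender) {
        return (contender - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Median throughput row: baseline 16603.4 docs/s, contender 30020.8 docs/s.
        System.out.println(String.format(Locale.ROOT, "%+.2f%%", percentChange(16603.4, 30020.8)));
        // 100th percentile latency row: baseline 2.57922e+06 ms, contender 21935.1 ms.
        System.out.println(String.format(Locale.ROOT, "%+.2f%%", percentChange(2.57922e6, 21935.1)));
    }
}
```

A positive value is an improvement for throughput rows, while a negative value is an improvement for latency and service time rows.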

Force normal read advice for stored field temp fdt files

9c3f586

elasticsearchmachine added the v9.1.0 label Jun 17, 2025

Force normal niofs for fdt tmp file read access when flushing stored …

9bbca7e

…fields and force direct io for checksuming fdt tmp file.

martijnvg changed the title ~~Force normal read advice for stored field temp fdt files~~ Force normal niofs for fdt tmp file read access when flushing stored fields Jun 17, 2025

martijnvg changed the title ~~Force normal niofs for fdt tmp file read access when flushing stored fields~~ Force niofs for fdt tmp file read access when flushing stored fields Jun 18, 2025

martijnvg and others added 10 commits June 18, 2025 09:31

another approach: force normal niofs for fdt tmp file read access whe…

28c007f

…n flushing stored fields

[CI] Auto commit changes from spotless

36bc0b9

adjust test

c672cd1

iter

1f914ef

formatting

9bc6591

improve test

2f3d321

typo

0b817a0

Experiment with also avoiding mmap for fdm and fdx tmp files.

81e0a39

Merge remote-tracking branch 'es/main' into tmp_fdt_files

8a7957c

Revert "Experiment with also avoiding mmap for fdm and fdx tmp files."

60d48a8

This reverts commit 81e0a39.

martijnvg added :StorageEngine/Logs You know, for Logs >enhancement labels Jun 20, 2025

Update docs/changelog/129538.yaml

55bfd37

martijnvg marked this pull request as ready for review June 20, 2025 12:38

martijnvg requested a review from ChrisHegarty June 20, 2025 12:38

elasticsearchmachine added the Team:StorageEngine label Jun 20, 2025

martijnvg added >non-issue and removed >enhancement labels Jun 20, 2025

Delete docs/changelog/129538.yaml

6c62d4e

ChrisHegarty approved these changes Jun 20, 2025

martijnvg mentioned this pull request Jun 20, 2025

Migrate to using the new IOContext Hints when available #129774

Open

martijnvg merged commit 41f6981 into elastic:main Jun 23, 2025
27 checks passed

martijnvg mentioned this pull request Jun 30, 2025

Remove tmp_fdt_no_mmap feature flag. #130308

Merged

martijnvg mentioned this pull request Jun 30, 2025

[8.19] Force niofs for fdt tmp file read access when flushing stored fields #130312

Merged
