Use sequential stored field reader in LuceneSyntheticSourceChangesSnapshot #121636
Conversation
this.storedFieldLoader = StoredFieldLoader.create(false, storedFields);
String codec = EngineConfig.INDEX_CODEC_SETTING.get(mapperService.getIndexSettings().getSettings());
boolean shouldForceSequentialReader = CodecService.BEST_COMPRESSION_CODEC.equals(codec);
this.storedFieldLoader = StoredFieldLoader.create(false, storedFields, shouldForceSequentialReader);
Maybe we should always force a sequential reader. I think no matter the stored field format, it has benefits given how LuceneSyntheticSourceChangesSnapshot accesses stored fields.
I am not entirely sure. Doesn't this depend on how jumpy the data is (which with index sorting could be very jumpy), and on whether there is more than one stored field per doc (not sure that is always true)?
Right, this is why I then changed the logic to not look at the index.codec setting anymore.
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
@martijnvg looks like both links point to the same benchmark?
@tlrx Oops, I've updated the link.
(I updated the description with more information)
This feels a bit counterintuitive. Is it because the CCR buffer is big enough that reads end up being sequential? Also, is index sorting applied to the rally track? I'm trying to understand why this helps, since we said index sorting would make sequential reads worse, yet this change applies it all the time.
Yes, this is based on the elastic/logs track, which sets index.mode to logsdb (which means the index sort fields are host.name and @timestamp).
Reader is a misleading word in the title. This is just using a stored field reader implementation that decompresses an entire block at a time (this delegates to …).
My comment was more that it's beneficial for a specific layout, where the block of doc ids to retrieve is dense, but here we apply it all the time. Might performance in this mode degrade when adding more hosts or when more diverse ingestion happens?
For the elastic/logs track, some data streams have 0 or 1 hosts, some have a few, and some up to 50 hosts IIRC.
Right, if the buffer is small then maybe the sequential stored field reader is the right choice.
You mean the docid gap between requested seqnos? I will look into this.
Yes, the non-synthetic implementation selectively uses a sequential stored field reader based on the range of requested docids. Like you said, we can do that here too and be a little bit less strict. We can in the …
//
// A sequential reader decompresses a block eagerly, so that increasing adjacent doc ids can access stored fields without
// decompressing on each StoredFields#document(docId) invocation. The only downside is that the last few operations in the requested
// seq_no range may be at the beginning of a block, which means stored fields for many docs get decompressed without being used.
Is this true also for zstd? Or rather would that always decompress the entire thing?
I think it is true for both zstd and deflate, since both decompress an entire block.
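To make the block-decompression behavior discussed above concrete, here is a minimal, self-contained sketch. The class and constants are hypothetical, not Lucene's actual stored fields API: it only illustrates how a sequential reader amortizes block decompression, so that monotonically increasing doc ids in the same block reuse the already decompressed block instead of paying the cost on every `document(docId)` call.

```java
import java.util.Map;

// Hypothetical model of a block-compressed stored fields reader that caches the last
// decompressed block (the "sequential" behavior described in the comment above).
class BlockCachingStoredFields {
    static final int DOCS_PER_BLOCK = 128;                // assumed block granularity
    private final Map<Integer, byte[]> compressedBlocks;  // blockId -> compressed bytes
    private int cachedBlockId = -1;
    private byte[] cachedBlock;                            // last decompressed block
    long decompressions = 0;                               // counter showing the amortization effect

    BlockCachingStoredFields(Map<Integer, byte[]> compressedBlocks) {
        this.compressedBlocks = compressedBlocks;
    }

    byte[] document(int docId) {
        int blockId = docId / DOCS_PER_BLOCK;
        if (blockId != cachedBlockId) {                    // only decompress when crossing a block boundary
            cachedBlock = decompress(compressedBlocks.get(blockId));
            cachedBlockId = blockId;
            decompressions++;
        }
        return cachedBlock;                                // caller slices out the doc's stored fields
    }

    private static byte[] decompress(byte[] compressed) {
        return compressed; // stand-in for zstd/deflate block decompression
    }
}
```

Without the caching, each `document(docId)` call for docs in the same block would decompress that block again, which is the repeated work this PR avoids.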
I'm printing the average docid gap in …
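As an illustration of the metric mentioned here, a minimal sketch of computing the average docid gap over a batch of requested doc ids (the exact place this was logged is not shown in the thread; the method name is illustrative):

```java
// Illustrative helper (not the actual logging code): average gap between consecutive doc ids.
static double averageDocIdGap(int[] sortedDocIds) {
    if (sortedDocIds.length < 2) {
        return 0.0d;
    }
    long totalGap = 0;
    for (int i = 1; i < sortedDocIds.length; i++) {
        totalGap += sortedDocIds[i] - sortedDocIds[i - 1];
    }
    // An average close to 1.0 means the batch is dense and monotonically increasing,
    // which is exactly the case where a sequential stored field reader pays off.
    return (double) totalGap / (sortedDocIds.length - 1);
}
```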
One aspect of investigating the performance difference between stored and synthetic recovery source that isn't included in this PR history is that CCR replication is much slower with zstd compressed stored fields compared to the previous default for best compression (deflate) (see results). The deflate stored field decompressor can skip decompressing unneeded blocks, whereas with the zstd decompressor that isn't the case. Given that all docids are being visited with CCR replication, a lot of blocks are being repeatedly decompressed. This is much heavier with the current default stored field codec for logsdb (zstd). This PR tries to address this at the …
I noticed that many shard changes requests end up querying for monotonically increasing docids. So I did another iteration of this change, and now the sequential reader only gets used if the docids are dense. This is similar to the implementation that gets used for stored recovery source. To check the effect of this change I ran two benchmarks with the new elastic/logs logsdb with ccr track, the first using stored recovery source (baseline in the linked dashboard) and the second using synthetic recovery source (contender in the linked dashboard): https://esbench-metrics.kb.us-east-2.aws.elastic-cloud.com:9243/app/r/s/jbtVP The results between stored and synthetic recovery source now look almost the same, which I think makes sense since both implementations now use the sequential reader about equally often.
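A sketch of the kind of density check described above; the threshold and method name are illustrative assumptions, not the exact Elasticsearch implementation:

```java
// Illustrative density check: only force the sequential reader for batches that are
// large enough and form a gap-free, monotonically increasing doc id range.
static boolean useSequentialReader(int[] docIds) {
    if (docIds.length < 10) { // assumed minimum batch size before eager block decompression pays off
        return false;
    }
    for (int i = 1; i < docIds.length; i++) {
        if (docIds[i] != docIds[i - 1] + 1) {
            return false; // a gap means whole blocks could be decompressed for docs we never read
        }
    }
    return true;
}
```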
That's very interesting! In this benchmark, source recovery is fast because it primarily relies on the sequential reader. However, in #114618, we mentioned that this scenario should be quite rare, so I’m curious, what changed? That said, it's reassuring to see that synthetic source recovery can be just as fast, even when using the sequential reader.
int[] nextDocIdArray = nextDocIds.toArray();
leafFieldLoader = storedFieldLoader.getLoader(leafReaderContext, nextDocIdArray);
leafSourceLoader = sourceLoader.leaf(leafReaderContext.reader(), nextDocIdArray);
Another side effect of providing the array of document IDs is that some field loaders may choose to load their values eagerly. I don't see this as a problem, but I wanted to point out that we would lose this behavior if we implement the TODO above.
Agreed. I will update the TODO to include that perspective.
Note that some doc values loaders already apply this strategy when doc ids are provided and there is a single value per field (line 55 in b6facb2): "The singleton optimization is mostly about looking up ordinals"
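For context, a minimal sketch (an assumed helper, not the loader referenced above) of what eager, batch-at-a-time ordinal resolution for a single-valued field could look like when the doc ids are known up front:

```java
import org.apache.lucene.index.SortedDocValues;
import java.io.IOException;

// Illustrative helper (not the actual Elasticsearch loader): resolve ordinals for the whole
// batch up front when the field is single-valued and the doc ids are known in advance.
// docIds must be in increasing order, since SortedDocValues iterates forward only.
static int[] resolveOrdinalsEagerly(SortedDocValues singleton, int[] docIds) throws IOException {
    int[] ords = new int[docIds.length];
    for (int i = 0; i < docIds.length; i++) {
        ords[i] = singleton.advanceExact(docIds[i]) ? singleton.ordValue() : -1;
    }
    return ords;
}
```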
I think the recently indexed logs often align well with the index sorting, and then fetching recent operations by seqno ends up with dense, monotonically increasing docids. Some of the data streams have one or no unique host name, while others have up to 50 or something like that. I suspect that if host.name had a higher cardinality, then we couldn't use a sequential stored reader as often. But this then would also apply for …
Agreed, that is a nice observation.
LGTM. Thanks Martijn!
@@ -9,6 +9,8 @@

package org.elasticsearch.index.engine;

import com.carrotsearch.hppc.IntArrayList;
Did you intend to use this namespace (seems odd since it comes from a testing tool)?
Without a change like this, synthetic recovery source is ~2 times slower compared to stored recovery source. See this dashboard that shows total read time in the follower cluster broken down by shard: https://esbench-metrics.kb.us-east-2.aws.elastic-cloud.com:9243/app/r/s/pFChT
Baseline is stored recovery source and contender is synthetic recovery source. The benchmark is performed using a new elastic/logs challenge (elastic/rally-tracks#734), which configures auto-following of logs-* into the local cluster.
After investigation, the difference in operation read time between baseline and contender turned out to be caused by decompressing stored fields over and over again:
(This happened because for each operation, a stored field block gets decompressed, but only the stored fields for one document are read. The next op is very likely to decompress the same stored field block again.)
The same benchmark with this change:
https://esbench-metrics.kb.us-east-2.aws.elastic-cloud.com:9243/app/r/s/kVgM2
With the change, synthetic recovery source is more than 2 times faster compared to stored recovery source. This matches what we observed in earlier ad-hoc / micro benchmarks.
Labelling this as a non-issue. This is a performance bug, but synthetic recovery source hasn't been released yet.