-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Use Lucene101 postings format by default #126080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Lucene101 postings format by default #126080
Conversation
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one comment, LGTM otherwise.
assertThat(perFieldMapperCodec.getPostingsFormatForField("another_field"), instanceOf(ES812PostingsFormat.class)); | ||
assertThat( | ||
perFieldMapperCodec.getPostingsFormatForField("another_field"), | ||
instanceOf(timeSeries ? ES812PostingsFormat.class : Lucene101PostingsFormat.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these tests fail if build is in release mode. Can you add test-release
label to this PR?
I think we should have an if check here. If feature flag is enabled we expect the lucene 101 posting format otherwise the es812 format.
Do we need to also test full bwc on this PR just to make sure? Backwards compatibility is handled via the new index version and applying the new codec on write only for newly created indices, correct? Do we need any additional testing compared to what we already have? |
I think we can also add the
This isn't gated behind an index version (for now just feature flag). Any non tsdb/logsdb index (including existing indices) will start to use the new codec for new segments. The old segments remain to be used by the previous posting format. The codec name is stored in disk and that is used to load codec via service provider.
I think current mixed cluster / rolling upgrade integration testing is sufficient? We don't modify an existing posting format, just use a different one. |
Actually, I did add a new index version and I check the index version when deciding which postings format to use. I wasn't sure about backwards compatibility--would lucene still read the old indices correctly if it's expecting the new postings format? If so, I can remove the check |
Whoops, you're right. I think we can remove the index version check. I don't gating behind index version is needed, given that changing codec on per field level should work (as long as the codec remains available in codebase). |
Right, |
I think that's the goal.
Could you clarify what you mean by this? The feature flag is just there because we're not sure we want to push this to prod yet. Once we've benchmarked it and we're ready to make it generally available, I expect we'll remove the feature flag so that it's always enabled. It's not intended to be toggled on/off depending on compatibility. My understanding is that compatibility is guaranteed because the formatted postings include the name of the format class, so when they're loaded, the service provider knows which class to use to decode them. Sure, if we need to write new postings to disk, they'll always be in the new format so indices might end up with a mix of different posting formats, but that's fine because each written postings file includes the codec name.
Right, at this point we just want to see the effect on the nightly benchmarks |
I chatted with Luca and we both agree that when the feature flag gets removed we should introduce an index version, so that only new indices will use 101 postings codec. This makes reasoning about which postings format an index uses easier. |
Ok, I'll reintroduce the index version check then. |
Release tests are failing due to #124263. Waiting for that fix to be merged before I merge this PR. |
This reverts commit 4c174a8.
@@ -31,19 +34,33 @@ | |||
* vectors. | |||
*/ | |||
public class PerFieldFormatSupplier { | |||
public static final FeatureFlag USE_LUCENE101_POSTINGS_FORMAT = new FeatureFlag("use_lucene101_postings_format"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jordan-powers @martijnvg Lucene 10.3 (still in development) will bring a new postings format. Is the intention that this feature flag to select the "latest" codec? Or is it really specific to 10.1? I ask only because we need to determine if a use_lucene101_postings_format
is warranted or maybe make this flag more general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is the use the latest default lucene postings format. Unfortunately the name doesn't reflect that, but we want it to use the latest default postings format.
Change PerFieldFormatSupplier to get default postings from current codec being used. This way the default codec for standard index mode isn't tied to Lucene101PostingsFormat specifically. And rename the feature flag that controls when stock Lucene posting format is used. Addressed comment from elastic#126080
…rmat is used. Addressed comment from elastic#126080
With this change, new standard indices will use the
Lucene101PostingsFormat
instead of the current defaultES812PostingsFormat
.The current implementation enables the new format for all standard indices. We might consider instead only enabling the new format only for non-datastream standard indices, and/or only when
index.codec
is not set tobest_compression
.Currently, use of the new codec is gated behind a feature flag.
We may or may not decide to backport this to 8.x depending on how the nightly benchmarks look.
Closes #119051