-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Add source fallback for keyword fields using operation #88735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@romseygeek Would you please take another pass when you have the chance? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I left a few comments but I think we're nearly there.
@@ -206,7 +206,10 @@ public IndexService( | |||
this.indexSortSupplier = () -> indexSettings.getIndexSortConfig() | |||
.buildIndexSort( | |||
mapperService::fieldType, | |||
(fieldType, searchLookup) -> indexFieldData.getForField(fieldType, FieldDataContext.noRuntimeFields("index sort")) | |||
(fieldType, searchLookup, fielddataOperation) -> indexFieldData.getForField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to pass this extra fielddataOperation parameter here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Also, for my understanding, is this because we never want sort to fallback so there's no reason to build a different operation type into an index sort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly
@@ -30,7 +40,9 @@ public record FieldDataContext(String fullyQualifiedIndexName, Supplier<SearchLo | |||
public static FieldDataContext noRuntimeFields(String reason) { | |||
return new FieldDataContext( | |||
"", | |||
() -> { throw new UnsupportedOperationException("Runtime fields not supported for [" + reason + "]"); } | |||
() -> { throw new UnsupportedOperationException("Runtime fields not supported for [" + reason + "]"); }, | |||
null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be safer to have a default sourcePathsLookup of s -> Set.of(s)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Fixed.
protected static FieldDataContext mockFielddataContext() { | ||
return new FieldDataContext("test", mockContext()::lookup); | ||
protected static FieldDataContext mockFielddataContext(MappedFieldType.FielddataOperation fielddataOperation) { | ||
return new FieldDataContext("test", mockContext()::lookup, mockContext()::sourcePath, fielddataOperation); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these are script tests, I think we can just always use FielddataOperation.SCRIPT here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting case for runtime fields. I don't think it's actually possible to hit source fallback for a runtime field at all, but they will likely be executed at a top level as a search operation, so I think for these tests it would make the most sense to switch it to SEARCH rather than SCRIPT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's actually possible to hit source fallback for a runtime field at all
I'm confused, runtime fields explicitly use scripts so I think they're quite likely to use this path? Or am I misunderstanding what's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is when a runtime field is first executed as a script its part of a search query where the runtime field looks like a standard field. At that first execution the runtime field should always be found as the doc values portion of a field that it's emulating, right? So I don't think the operation actually matters for that "top-level" runtime field. However, for fields referenced from that runtime field then the operation will be SCRIPT when we do the lookup from LeafDocLookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the tests you're right, this should be SCRIPT. I'm thinking about this incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@romseygeek Thank you for the continued review iterations. This is ready for another pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@romseygeek Thank you! Will commit as soon as CI passes. |
Hi @jdconrad, I've created a changelog YAML for you. |
* upstream/main: Add 8.5 migration docs (elastic#88923) Script: Reindex & UpdateByQuery Metadata (elastic#88665) Remove unused plugins dir var from server CLI (elastic#88917) Use tracing API in TaskManager (elastic#88885) Add source fallback for keyword fields using operation (elastic#88735) Prune changelogs after 8.3.3 release Bump versions after 8.3.3 release Add a test for checking for misspelled "dry_run" parameters for Desired Nodes API (elastic#88898) Speedup BalanceUnbalancedClusterTests (elastic#88794) Preventing exceptions on node shutdown in integration tests (elastic#88827) Do not trigger check part3 for test mute and docs PRs (elastic#88895) Add troubleshooting docs about data corruption (elastic#88760) Mute RollupActionSingleNodeTests#testRollupDatastream (elastic#88891) [DOCS] Domain splitting impacts API keys (elastic#88677) Fix SqlSearchIT testAllTypesWithRequestToOldNodes (elastic#88866) (elastic#88883) Update synthetic-source.asciidoc (elastic#88880) Log more details in TaskAssertions (elastic#88864) Make Tuple a record (elastic#88280)
Add source fallback operation when looking up a the factor field added in elastic#88735 Resolves elastic#88985
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a quick look as well, this change seemed important enough for me to look at even if it's already merged. I left some questions. An additional question I have is around the scope: the title mentions keyword fields, but I was under the impression that the fallback has been added also for numbers and geo_points?
*/ | ||
public enum FielddataOperation { | ||
SEARCH, | ||
SCRIPT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add docs to clarify how these two options differ from each other?
failIfNoDocValues(); | ||
} | ||
|
||
if ((operation == FielddataOperation.SEARCH || operation == FielddataOperation.SCRIPT) && hasDocValues()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the first part of the if necessary? It's maybe a protection mechanism for potential new operations that may be added in the future? Isn't it more practical to just say if hasDocValues() load from docvalues?
failIfNoDocValues(); | ||
} | ||
|
||
if ((operation == FielddataOperation.SEARCH || operation == FielddataOperation.SCRIPT) && hasDocValues()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
failIfNoDocValues(); | ||
} | ||
|
||
if ((operation == FielddataOperation.SEARCH || operation == FielddataOperation.SCRIPT) && hasDocValues()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here too, same as above
@@ -435,7 +435,7 @@ public Analyzer buildCustomAnalyzer( | |||
|
|||
@Override | |||
protected IndexFieldData<?> buildFieldData(MappedFieldType ft) { | |||
return context.getForField(ft); | |||
return context.getForField(ft, MappedFieldType.FielddataOperation.SEARCH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given all these related changes, too late now but have we considered having a separate getForField only for scripting and leaving the existing one alone, so that less changes are required in existing code?
This is a follow up PR to (#88040) to contrast a parameter passed to fielddataBuilder versus separate methods of fielddataBuilder and scriptFielddataBuilder. The parameter is an enum named MappedFieldType.FielddataOperation where the operation specifies what data structures should be used to generate fielddata. Right now it only has SEARCH and SCRIPT, but in the future this could extended to AGGS, SORT, etc. to allow fine-grained tuning where doc values are pulled from.
Note that something like DOC_VALUES_ONLY or SOURCE_FALLBACK doesn't work for scripting here because with exceptions like text field scripting can't know if it wants doc values primarily or source primarily so it has to be determined via the MappedType per operation.