Skip to content

Add source fallback for keyword fields using operation #88735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 42 commits into from
Jul 28, 2022

Conversation

jdconrad
Copy link
Contributor

@jdconrad jdconrad commented Jul 22, 2022

This is a follow up PR to (#88040) to contrast a parameter passed to fielddataBuilder versus separate methods of fielddataBuilder and scriptFielddataBuilder. The parameter is an enum named MappedFieldType.FielddataOperation where the operation specifies what data structures should be used to generate fielddata. Right now it only has SEARCH and SCRIPT, but in the future this could extended to AGGS, SORT, etc. to allow fine-grained tuning where doc values are pulled from.

Note that something like DOC_VALUES_ONLY or SOURCE_FALLBACK doesn't work for scripting here because with exceptions like text field scripting can't know if it wants doc values primarily or source primarily so it has to be determined via the MappedType per operation.

@jdconrad jdconrad added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Core/Infra Meta label for core/infra team labels Jul 22, 2022
@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
@jdconrad
Copy link
Contributor Author

@romseygeek Would you please take another pass when you have the chance? Thank you!

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I left a few comments but I think we're nearly there.

@@ -206,7 +206,10 @@ public IndexService(
this.indexSortSupplier = () -> indexSettings.getIndexSortConfig()
.buildIndexSort(
mapperService::fieldType,
(fieldType, searchLookup) -> indexFieldData.getForField(fieldType, FieldDataContext.noRuntimeFields("index sort"))
(fieldType, searchLookup, fielddataOperation) -> indexFieldData.getForField(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to pass this extra fielddataOperation parameter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Also, for my understanding, is this because we never want sort to fallback so there's no reason to build a different operation type into an index sort?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

@@ -30,7 +40,9 @@ public record FieldDataContext(String fullyQualifiedIndexName, Supplier<SearchLo
public static FieldDataContext noRuntimeFields(String reason) {
return new FieldDataContext(
"",
() -> { throw new UnsupportedOperationException("Runtime fields not supported for [" + reason + "]"); }
() -> { throw new UnsupportedOperationException("Runtime fields not supported for [" + reason + "]"); },
null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be safer to have a default sourcePathsLookup of s -> Set.of(s) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Fixed.

protected static FieldDataContext mockFielddataContext() {
return new FieldDataContext("test", mockContext()::lookup);
protected static FieldDataContext mockFielddataContext(MappedFieldType.FielddataOperation fielddataOperation) {
return new FieldDataContext("test", mockContext()::lookup, mockContext()::sourcePath, fielddataOperation);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that these are script tests, I think we can just always use FielddataOperation.SCRIPT here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting case for runtime fields. I don't think it's actually possible to hit source fallback for a runtime field at all, but they will likely be executed at a top level as a search operation, so I think for these tests it would make the most sense to switch it to SEARCH rather than SCRIPT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's actually possible to hit source fallback for a runtime field at all

I'm confused, runtime fields explicitly use scripts so I think they're quite likely to use this path? Or am I misunderstanding what's going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is when a runtime field is first executed as a script its part of a search query where the runtime field looks like a standard field. At that first execution the runtime field should always be found as the doc values portion of a field that it's emulating, right? So I don't think the operation actually matters for that "top-level" runtime field. However, for fields referenced from that runtime field then the operation will be SCRIPT when we do the lookup from LeafDocLookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the tests you're right, this should be SCRIPT. I'm thinking about this incorrectly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jdconrad
Copy link
Contributor Author

@romseygeek Thank you for the continued review iterations. This is ready for another pass.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jdconrad
Copy link
Contributor Author

@romseygeek Thank you! Will commit as soon as CI passes.

@jdconrad jdconrad removed the WIP label Jul 28, 2022
@elasticsearchmachine elasticsearchmachine removed Team:Core/Infra Meta label for core/infra team Team:Clients Meta label for clients team labels Jul 28, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @jdconrad, I've created a changelog YAML for you.

@jdconrad jdconrad merged commit 5e0701f into elastic:main Jul 28, 2022
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 29, 2022
* upstream/main:
  Add 8.5 migration docs (elastic#88923)
  Script: Reindex & UpdateByQuery Metadata (elastic#88665)
  Remove unused plugins dir var from server CLI (elastic#88917)
  Use tracing API in TaskManager (elastic#88885)
  Add source fallback for keyword fields using operation (elastic#88735)
  Prune changelogs after 8.3.3 release
  Bump versions after 8.3.3 release
  Add a test for checking for misspelled "dry_run" parameters for Desired Nodes API (elastic#88898)
  Speedup BalanceUnbalancedClusterTests (elastic#88794)
  Preventing exceptions on node shutdown in integration tests (elastic#88827)
  Do not trigger check part3 for test mute and docs PRs (elastic#88895)
  Add troubleshooting docs about data corruption (elastic#88760)
  Mute RollupActionSingleNodeTests#testRollupDatastream (elastic#88891)
  [DOCS] Domain splitting impacts API keys (elastic#88677)
  Fix SqlSearchIT testAllTypesWithRequestToOldNodes (elastic#88866) (elastic#88883)
  Update synthetic-source.asciidoc (elastic#88880)
  Log more details in TaskAssertions (elastic#88864)
  Make Tuple a record (elastic#88280)
arteam added a commit to arteam/elasticsearch that referenced this pull request Aug 1, 2022
Add source fallback operation when looking up a the factor field added in elastic#88735

Resolves elastic#88985
arteam added a commit that referenced this pull request Aug 1, 2022
Add source fallback operation when looking up a the factor field added in #88735

Resolves #88985
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look as well, this change seemed important enough for me to look at even if it's already merged. I left some questions. An additional question I have is around the scope: the title mentions keyword fields, but I was under the impression that the fallback has been added also for numbers and geo_points?

*/
public enum FielddataOperation {
SEARCH,
SCRIPT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add docs to clarify how these two options differ from each other?

failIfNoDocValues();
}

if ((operation == FielddataOperation.SEARCH || operation == FielddataOperation.SCRIPT) && hasDocValues()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the first part of the if necessary? It's maybe a protection mechanism for potential new operations that may be added in the future? Isn't it more practical to just say if hasDocValues() load from docvalues?

failIfNoDocValues();
}

if ((operation == FielddataOperation.SEARCH || operation == FielddataOperation.SCRIPT) && hasDocValues()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

failIfNoDocValues();
}

if ((operation == FielddataOperation.SEARCH || operation == FielddataOperation.SCRIPT) && hasDocValues()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too, same as above

@@ -435,7 +435,7 @@ public Analyzer buildCustomAnalyzer(

@Override
protected IndexFieldData<?> buildFieldData(MappedFieldType ft) {
return context.getForField(ft);
return context.getForField(ft, MappedFieldType.FielddataOperation.SEARCH);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given all these related changes, too late now but have we considered having a separate getForField only for scripting and leaving the existing one alone, so that less changes are required in existing code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants