Skip to content

Exclude doc-value-only fields from all fields queries #82712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

romseygeek
Copy link
Contributor

Doc-value-only fields can be searched (slowly) by doing a table scan; this
commit ensures that these fields are not included in an all-field query, ie
a query_string or similar query against *, to prevent these queries
becoming unreasonably slow.

Fixes #82710

@romseygeek romseygeek added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.1.0 labels Jan 18, 2022
@romseygeek romseygeek requested review from jpountz and ywelsch January 18, 2022 10:45
@romseygeek romseygeek self-assigned this Jan 18, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jan 18, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@romseygeek
Copy link
Contributor Author

Aha, the failing tests are because this new restriction will also exclude runtime fields. Which is an interesting question - do we want to exclude them, given that they also effectively do a table scan and are therefore very slow? Or do we want to be cleverer here?

@ywelsch
Copy link
Contributor

ywelsch commented Jan 18, 2022

I think runtime fields should be excluded as well. If not, then I don't think doc-value-only fields should be excluded either, as runtime fields are generally even slower than doc-value-only fields.

@ywelsch ywelsch self-requested a review January 18, 2022 11:10
@romseygeek
Copy link
Contributor Author

I think this is a bug in the query parsing code, or at least an inconsistency. According to #25726, we expand fields in the following ways:

  • If the field expansion is a partial wildcard, eg foo* then we only expand to searchable fields
  • If the field expansion is total, ie *, then we only expand to searchable fields AND we exclude metadata fields.

But what actually happens seems to happen for query_string:

  • If the field expansion is partial then we exclude metadata fields but don't filter out anything else
  • If the field expansion is total then we filter out non-searchable fields and metadata fields

In other words, partial field expansions are excluding the wrong thing. On simple_query_string we do what the original issue intended.

I'll change the doc-values (and runtime fields) exclusion to work on *, which I think is the original intent. But we should probably fix query_string separately.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left one comment. While I understand what the intent behind the current logic is, I find it highly confusing how it is currently represented / named.

@@ -132,6 +132,10 @@ private QueryParserHelper() {}
// Ignore metadata fields
continue;
}
if (acceptMetadataField == false && fieldType.isIndexed() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The acceptMetadataField variable name is highly confusing if we're using it here to denote full vs partial wilcard. The javadocs for acceptMetadataField say "Whether metadata fields should be added when a pattern is expanded.". Non-indexed fields shouldn't be added though, independently of whether they're metadata fields or not? Which metadata fields exist that are not indexed? I find this code now highly confusing, as the relation between indexed and acceptMetadataField isn't clear, at least with how things are currently named.

@jpountz
Copy link
Contributor

jpountz commented Jan 18, 2022

Do I understand correctly that a doc-value-only field would be excluded from * but not from *_foo? I wonder if it would be confusing to users.

@romseygeek
Copy link
Contributor Author

romseygeek commented Jan 19, 2022

Do I understand correctly that a doc-value-only field would be excluded from * but not from *_foo? I wonder if it would be confusing to users.

Yes, and I agree that it's not ideal, but there's precedent in that this is what we already do for metadata fields. I'm mainly concerned that after #82409 kibana searches may get much slower because all of sudden we're doing a table scan for every query

@jpountz
Copy link
Contributor

jpountz commented Jan 19, 2022

Thinking out loud, I wonder if there are other ways how we could do this, such as using a placeholder like _default that consists of either index.query.default_field if set, or the list of fields that are reasonable to use in a query with the approach that your PR is suggesting, and keeping the current semantics of *.

@romseygeek
Copy link
Contributor Author

Thinking out loud, I wonder if there are other ways how we could do this, such as using a placeholder like _default that consists of either index.query.default_field if set, or the list of fields that are reasonable to use in a query with the approach that your PR is suggesting, and keeping the current semantics of *.

So if you explicitly set * as your field then you get everything, but if you don't set anything then we fall back to index.query.default_field, and if that isn't set then we fall back to the reduced list?

@jpountz
Copy link
Contributor

jpountz commented Jan 20, 2022

Oh yes, something like that sounds nice!

@mark-vieira mark-vieira added v8.2.0 and removed v8.1.0 labels Feb 2, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @romseygeek, I've created a changelog YAML for you.

@javanna
Copy link
Member

javanna commented Jul 4, 2022

Closing, we decided not to proceed with this change. See a summary of the discussion at #82710

@javanna javanna closed this Jul 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exclude doc-values-only fields from '*' field expansion queries
9 participants