-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Query Refactoring: Make parsing nested queries always return query builder #11915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query Refactoring: Make parsing nested queries always return query builder #11915
Conversation
+1 from my side to the idea and the amount of handling special cases that this PR removes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/parse/parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think saying that it should not be allowed in the query DSL is a bit misleading, cause it is allowed and we parse it properly. I know what you mean though and why you wrote that, I need yet to come up with a better explanation for this...
I like this a lot, left a few comments |
455443a
to
2dd8f7c
Compare
@javanna addressed your comments, have a second look if you like. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find a bit confusing how we handle the boosting query: the parser validates that the json contains the "negative" and "positive" elements. That said they can be empty, which would map to the new empty query builder. For java api, default is the empty query builder and one is not required to set negative and positive, no validation happens. I think that we should either require setting those two queries in both parser and builder, or remove any validation around them.
For instance, I would go and remove the default value here, and remove validation from the parser around whether negative and positive are part of the query. Then add a check in validate method that positive and negative need to be non null. This way the behaviour stays the same as before when it comes to json (no element means no set, which means validation error), but also java api users have to remember to set the queries. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how having a null default might lead to needing null checks in doXContent etc. though, let's discuss this. I think we have the opposite problem in other queries, will comment below.
I agree that inner queries that are currently mandatory in the query DSL and are checked for in the parser should also be enforced to be set in the builder. We can do so by making all those cases mandatory arguments in the constructors now that we introduce the EmptyQueryBuilder as a stand-in for empty inner queries. I think from the cases you mentioned, only BoostingQueryBuilder needs an additional 2-arg constructor with null-checks then. |
a0eeb58
to
d5386f1
Compare
@javanna Rebased and re-checked all the builders touched so far for breaking changes regarding inner queries. Other than the additional null-checks in constructors we're pretty consistend with the current behavior on master. Changed BoosingQueryBuilder, removing setters for inner queries and instead forcing them to be non-null in constructor, also adding this to the changes doc. Also extended handling of empty collection of inner clauses in Bool/And/OrQueryBuilder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (query.clauses().isEmpty()) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, didn't know this exists.
d5386f1
to
6eea9fd
Compare
6eea9fd
to
96ed280
Compare
Rebased onto the boost/queryName refactoring, had to adapt a few tests for that and also added junit tests that check constructors and setters throw exceptions where they check for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few lines above we also have:
if (!hasClauses()) {
return new MatchAllDocsQuery();
}
that if is against our builder clauses, for the java api I guess, while this one is for resulting lucene clauses, in case all of the parsed clauses were empty_query? Wonder if we can have a single if like that, maybe the last one alone would do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to return as soon as possible but seeing that otherwise we only loop over empty collection, that should not waste much cycles. The creation of the BoolQuery would be unnecessary, but I guess if it makes the code easier to read I can change that.
looks good, left a few comments, actually more questions than comments. |
f80cd35
to
544e789
Compare
Removed early return when creating BooleanQuery in BoolQueryBuilder and updated a few comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other question, sorry! but I still find this confusing... previously we had null here for two situations:
- empty filter/query
- queries that after getting parsed would become a null query (their parse method returns null)
the first case falls now under the EmptyQueryBuilder case, which we handle here. But there can still be queries whose toQuery returns null, which need to be ignored, but that we know only in the toQuery, but we still set the minimumShouldMatch only based on case 1). I think setting the minimumShouldMatch like should happen later in the toQuery, which kinda makes sense, it needs to be done only on the resulting lucene query, not against the query builder.
So we can potentially add empty query here to the clauses as long as we move this check for null should clauses only in the toQuery method, if that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still adding the EmptyQueryBuilder here in any case (before the if
). The issue here is to set the minimumShouldMatch to at least 1
, if there is no other value supplied in the JSON and we are parsing this inside another filter. This will have to happen only once for the first "real" (meaning non-null) inner query, so we can safely skip EmptyQuery here. The only thing that might go wrong here is that we set minimumShouldMatch
to eagerly for any query != EmptyQueryBuilder but that for some reason will later be resolved to null
in toQuery()
. I'm not too sure about how minimumShouldMatch
works in this case, we will have to check.
Another thing that struck me just now when thinking about it: Not sure if we also need to handle this case in the toQuery()
itself, because query doesn't need to be parsed always in the future. The problem here is that at least for now we are using the isFilter()
state of the QueryParseContex, but we dont have this yet in toQuery()
, so there currently we don't know if the current query is used as a filter or not.
Both those issues are not related to the introducing of EmptyQueryBuilder but with the fact that the decison to set minimumShouldMatch
(if not set explicitely) is based on the presence of an actual lucene query (in pre-refactoring code), which really we can only know in the query-generation phase (aka toQuery).
Since the whole conditional was moved over from the former BoolQueryFilter, it might be a good thing to ask if there is another way to solve this and why doing this only in filter context is necessary. If we could do it also for non-filter context it would be easy to move this to toQuery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok lets get this PR in and address this problem separately, I think we overlooked this when migrating the bool query. This check should be moved to the toQuery method. We should start looking into dividing the context into a query parse context and a to query context I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should I leave the check in place for now? Can't really move it to toQuery
as long as we have no filter context there. If I leave it, where would you suggest we track this? I'd prefer a separate issue because it's somewhat unrelated to this but complicated enough to leave some of the writup I did above there so we don't forget. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I meant leave as-is and open new issue :)
thanks @cbuescher left another comment :) |
LGTM we have to fix the bool query and probably move some logic around isFilter to the toQuery method, but let's do it in a separate issue. |
…ns an query builder Currently parsing inner queries can return `null` which leads to unnecessary complicated null checks further up the query hierarchy. By introducing a special EmptyQueryBuilder that can be used to signal that this query clause should be ignored upstream where possible, we can avoid additional null checks in parent query builders and still allow for this query to be ignored when creating the lucene query later. This new builder returns `null` when calling its `toQuery()` method, so at this point we still need to handle it, but having an explicit object for the intermediate query representation makes it easier to differentiate between cases where inner query clauses are empty (legal) or are not set at all (illegal). Also added check for empty inner builder list to BoolQueryBuilder and OrQueryBuilder. Removed setters for positive/negatice query in favour of constructor with mandatory non-null arguments in BoostingQueryBuilder. Closes elastic#11915
544e789
to
c51cfbb
Compare
Thanks, will merge after rebase & squash and open separate issue for the BoolQueryBuilder problem mentioned above. |
…ptyquerybuilder Query Refactoring: Make parsing nested queries always return query builder
…ns an query builder Currently parsing inner queries can return `null` which leads to unnecessary complicated null checks further up the query hierarchy. By introducing a special EmptyQueryBuilder that can be used to signal that this query clause should be ignored upstream where possible, we can avoid additional null checks in parent query builders and still allow for this query to be ignored when creating the lucene query later. This new builder returns `null` when calling its `toQuery()` method, so at this point we still need to handle it, but having an explicit object for the intermediate query representation makes it easier to differentiate between cases where inner query clauses are empty (legal) or are not set at all (illegal). Also added check for empty inner builder list to BoolQueryBuilder and OrQueryBuilder. Removed setters for positive/negatice query in favour of constructor with mandatory non-null arguments in BoostingQueryBuilder. Closes elastic#11915
…ring-emptyquerybuilder Query Refactoring: Make parsing nested queries always return query builder
Currently parsing inner queries can return
null
which leads to unnecessary complicatednull checks further up the query hierarchy. By introducing a special EmptyQueryBuilder
that can be used to signal that this query clause should be ignored upstream where possible,
we can avoid additional null checks in parent query builders and still allow for this query
to be ignored when creating the lucene query later.
This new builder returns
null
when calling itstoQuery()
method, so at this pointwe still need to handle it, but having an explicit object for the intermediate query
representation makes it easier to differentiate between cases where inner query clauses are
empty (legal) or are not set at all (illegal).
This PR goes agains the query-refactoring branch.