Skip to content

Conversation

cbuescher
Copy link
Member

Currently parsing inner queries can return null which leads to unnecessary complicated
null checks further up the query hierarchy. By introducing a special EmptyQueryBuilder
that can be used to signal that this query clause should be ignored upstream where possible,
we can avoid additional null checks in parent query builders and still allow for this query
to be ignored when creating the lucene query later.

This new builder returns null when calling its toQuery() method, so at this point
we still need to handle it, but having an explicit object for the intermediate query
representation makes it easier to differentiate between cases where inner query clauses are
empty (legal) or are not set at all (illegal).

This PR goes agains the query-refactoring branch.

@MaineC
Copy link

MaineC commented Jun 29, 2015

+1 from my side to the idea and the amount of handling special cases that this PR removes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/parse/parser

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think saying that it should not be allowed in the query DSL is a bit misleading, cause it is allowed and we parse it properly. I know what you mean though and why you wrote that, I need yet to come up with a better explanation for this...

@javanna
Copy link
Member

javanna commented Jun 30, 2015

I like this a lot, left a few comments

@cbuescher cbuescher force-pushed the feature/query-refactoring-emptyquerybuilder branch from 455443a to 2dd8f7c Compare June 30, 2015 10:43
@cbuescher
Copy link
Member Author

@javanna addressed your comments, have a second look if you like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find a bit confusing how we handle the boosting query: the parser validates that the json contains the "negative" and "positive" elements. That said they can be empty, which would map to the new empty query builder. For java api, default is the empty query builder and one is not required to set negative and positive, no validation happens. I think that we should either require setting those two queries in both parser and builder, or remove any validation around them.

For instance, I would go and remove the default value here, and remove validation from the parser around whether negative and positive are part of the query. Then add a check in validate method that positive and negative need to be non null. This way the behaviour stays the same as before when it comes to json (no element means no set, which means validation error), but also java api users have to remember to set the queries. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how having a null default might lead to needing null checks in doXContent etc. though, let's discuss this. I think we have the opposite problem in other queries, will comment below.

@cbuescher
Copy link
Member Author

I agree that inner queries that are currently mandatory in the query DSL and are checked for in the parser should also be enforced to be set in the builder. We can do so by making all those cases mandatory arguments in the constructors now that we introduce the EmptyQueryBuilder as a stand-in for empty inner queries. I think from the cases you mentioned, only BoostingQueryBuilder needs an additional 2-arg constructor with null-checks then.

@cbuescher cbuescher force-pushed the feature/query-refactoring-emptyquerybuilder branch 2 times, most recently from a0eeb58 to d5386f1 Compare July 1, 2015 10:30
@cbuescher
Copy link
Member Author

@javanna Rebased and re-checked all the builders touched so far for breaking changes regarding inner queries. Other than the additional null-checks in constructors we're pretty consistend with the current behavior on master. Changed BoosingQueryBuilder, removing setters for inner queries and instead forcing them to be non-null in constructor, also adding this to the changes doc. Also extended handling of empty collection of inner clauses in Bool/And/OrQueryBuilder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (query.clauses().isEmpty()) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, didn't know this exists.

@cbuescher cbuescher force-pushed the feature/query-refactoring-emptyquerybuilder branch from d5386f1 to 6eea9fd Compare July 1, 2015 20:58
@cbuescher cbuescher force-pushed the feature/query-refactoring-emptyquerybuilder branch from 6eea9fd to 96ed280 Compare July 1, 2015 21:43
@cbuescher
Copy link
Member Author

Rebased onto the boost/queryName refactoring, had to adapt a few tests for that and also added junit tests that check constructors and setters throw exceptions where they check for null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few lines above we also have:

if (!hasClauses()) {
    return new MatchAllDocsQuery();
}

that if is against our builder clauses, for the java api I guess, while this one is for resulting lucene clauses, in case all of the parsed clauses were empty_query? Wonder if we can have a single if like that, maybe the last one alone would do it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to return as soon as possible but seeing that otherwise we only loop over empty collection, that should not waste much cycles. The creation of the BoolQuery would be unnecessary, but I guess if it makes the code easier to read I can change that.

@javanna
Copy link
Member

javanna commented Jul 2, 2015

looks good, left a few comments, actually more questions than comments.

@cbuescher cbuescher force-pushed the feature/query-refactoring-emptyquerybuilder branch from f80cd35 to 544e789 Compare July 2, 2015 13:48
@cbuescher
Copy link
Member Author

Removed early return when creating BooleanQuery in BoolQueryBuilder and updated a few comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other question, sorry! but I still find this confusing... previously we had null here for two situations:

  1. empty filter/query
  2. queries that after getting parsed would become a null query (their parse method returns null)

the first case falls now under the EmptyQueryBuilder case, which we handle here. But there can still be queries whose toQuery returns null, which need to be ignored, but that we know only in the toQuery, but we still set the minimumShouldMatch only based on case 1). I think setting the minimumShouldMatch like should happen later in the toQuery, which kinda makes sense, it needs to be done only on the resulting lucene query, not against the query builder.

So we can potentially add empty query here to the clauses as long as we move this check for null should clauses only in the toQuery method, if that makes sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still adding the EmptyQueryBuilder here in any case (before the if). The issue here is to set the minimumShouldMatch to at least 1, if there is no other value supplied in the JSON and we are parsing this inside another filter. This will have to happen only once for the first "real" (meaning non-null) inner query, so we can safely skip EmptyQuery here. The only thing that might go wrong here is that we set minimumShouldMatch to eagerly for any query != EmptyQueryBuilder but that for some reason will later be resolved to null in toQuery(). I'm not too sure about how minimumShouldMatch works in this case, we will have to check.

Another thing that struck me just now when thinking about it: Not sure if we also need to handle this case in the toQuery() itself, because query doesn't need to be parsed always in the future. The problem here is that at least for now we are using the isFilter() state of the QueryParseContex, but we dont have this yet in toQuery(), so there currently we don't know if the current query is used as a filter or not.

Both those issues are not related to the introducing of EmptyQueryBuilder but with the fact that the decison to set minimumShouldMatch (if not set explicitely) is based on the presence of an actual lucene query (in pre-refactoring code), which really we can only know in the query-generation phase (aka toQuery).

Since the whole conditional was moved over from the former BoolQueryFilter, it might be a good thing to ask if there is another way to solve this and why doing this only in filter context is necessary. If we could do it also for non-filter context it would be easy to move this to toQuery

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok lets get this PR in and address this problem separately, I think we overlooked this when migrating the bool query. This check should be moved to the toQuery method. We should start looking into dividing the context into a query parse context and a to query context I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should I leave the check in place for now? Can't really move it to toQuery as long as we have no filter context there. If I leave it, where would you suggest we track this? I'd prefer a separate issue because it's somewhat unrelated to this but complicated enough to leave some of the writup I did above there so we don't forget. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I meant leave as-is and open new issue :)

@javanna
Copy link
Member

javanna commented Jul 2, 2015

thanks @cbuescher left another comment :)

@javanna
Copy link
Member

javanna commented Jul 3, 2015

LGTM we have to fix the bool query and probably move some logic around isFilter to the toQuery method, but let's do it in a separate issue.

…ns an query builder

Currently parsing inner queries can return `null` which leads to unnecessary complicated
null checks further up the query hierarchy. By introducing a special EmptyQueryBuilder
that can be used to signal that this query clause should be ignored upstream where possible,
we can avoid additional null checks in parent query builders and still allow for this query
to be ignored when creating the lucene query later.

This new builder returns `null` when calling its `toQuery()` method, so at this point
we still need to handle it, but having an explicit object for the intermediate query
representation makes it easier to differentiate between cases where inner query clauses are
empty (legal) or are not set at all (illegal).

Also added check for empty inner builder list to BoolQueryBuilder and OrQueryBuilder.
Removed setters for positive/negatice query in favour of constructor with
mandatory non-null arguments in BoostingQueryBuilder.

Closes elastic#11915
@cbuescher cbuescher force-pushed the feature/query-refactoring-emptyquerybuilder branch from 544e789 to c51cfbb Compare July 3, 2015 07:52
@cbuescher
Copy link
Member Author

Thanks, will merge after rebase & squash and open separate issue for the BoolQueryBuilder problem mentioned above.

cbuescher added a commit that referenced this pull request Jul 3, 2015
…ptyquerybuilder

Query Refactoring: Make parsing nested queries always return query builder
@cbuescher cbuescher merged commit faa4e98 into elastic:feature/query-refactoring Jul 3, 2015
@kevinkluge kevinkluge removed the review label Jul 3, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
…ns an query builder

Currently parsing inner queries can return `null` which leads to unnecessary complicated
null checks further up the query hierarchy. By introducing a special EmptyQueryBuilder
that can be used to signal that this query clause should be ignored upstream where possible,
we can avoid additional null checks in parent query builders and still allow for this query
to be ignored when creating the lucene query later.

This new builder returns `null` when calling its `toQuery()` method, so at this point
we still need to handle it, but having an explicit object for the intermediate query
representation makes it easier to differentiate between cases where inner query clauses are
empty (legal) or are not set at all (illegal).

Also added check for empty inner builder list to BoolQueryBuilder and OrQueryBuilder.
Removed setters for positive/negatice query in favour of constructor with
mandatory non-null arguments in BoostingQueryBuilder.

Closes elastic#11915
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
…ring-emptyquerybuilder

Query Refactoring: Make parsing nested queries always return query builder
@cbuescher cbuescher deleted the feature/query-refactoring-emptyquerybuilder branch March 11, 2016 11:51
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query Refactoring labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants