Skip to content

Implement IndexOrDocValuesQuery for geo_shape field #64688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 15, 2021

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Nov 6, 2020

IndexOrDocValuesQuery is an optimisation for queries when there are in a filter with another query that is more restrictive. In that case, it might be faster to just check the matches for that query directly with the doc values instead of traversing the BKD index.

With the introduction of doc values for geo_shape, it is possible now to implement such query for this type of data.

@iverase iverase added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes v8.0.0 v7.11.0 labels Nov 6, 2020
@iverase iverase requested review from jpountz and talevy November 6, 2020 08:52
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 6, 2020
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review the Component2DVisitor logic, but the way the query gets built makes sense to me.

Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be helpful to have both index and doc-value queries execute against the same tests instead of just having doc-value queries be considered correct if they behave like index-queries? Since theoretically the counts may be the same, but maybe they are including different documents? highly unlikely, but just a thought.

return doPush(relation);
}

abstract boolean doPush(PointValues.Relation relation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is an extension to the push that is documented in TriangleTreeReader.Visitor, I think it would be convenient to re-write in the documentation what the return value of this function signals


@Override
public void reset() {
answer = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to explain the rational behind each of these default answer values (here as well as the other visitors), especially since they are not the same for different Visitors. It seems like the default value is assumed in some of the logic

@iverase
Copy link
Contributor Author

iverase commented Nov 12, 2020

Thanks @talevy I updated the test so we actually check the hits are the same. I improved the test as well which surface and issue with Contains queries and geometry collections.

I added more documentation to Component2DVisitor, let me know if that's ok.

@iverase iverase added v7.12.0 and removed v7.11.0 labels Dec 16, 2020
@iverase
Copy link
Contributor Author

iverase commented Jan 12, 2021

We upgraded Elasticsearch to a Lucene 8.8 snapshot so this PR is not blocked any more. Could you have another look @talevy ?


@Override
public float matchCost() {
return 1000f; // TODO: what should it be?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too familiar with this. So long as matchCost is constant and the ConstantScoreScorer is used, this number is arbitrary and has no effect, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this value is used when you have more that one TwoPhaseIterator in a conjunction in order to sort them. You want to execute the cheaper iterators first.

In put case we do not have a good estimation of cost so it is good enough to return a big number. Note that this is done the same for LatLonPointDocValues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending resolution

@iverase iverase merged commit ce7cb99 into elastic:master Jan 15, 2021
@iverase iverase deleted the IndexOrDocValues branch January 15, 2021 08:39
iverase added a commit that referenced this pull request Jan 15, 2021
Introduces a new query for geo_shape field that works on docValues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.12.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants