-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Implement IndexOrDocValuesQuery for geo_shape field #64688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-analytics-geo (:Analytics/Geo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review the Component2DVisitor logic, but the way the query gets built makes sense to me.
...rc/test/java/org/elasticsearch/xpack/spatial/index/query/LatLonShapeDocValuesQueryTests.java
Outdated
Show resolved
Hide resolved
...rc/test/java/org/elasticsearch/xpack/spatial/index/query/LatLonShapeDocValuesQueryTests.java
Outdated
Show resolved
Hide resolved
...ial/src/main/java/org/elasticsearch/xpack/spatial/index/query/LatLonShapeDocValuesQuery.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would be helpful to have both index and doc-value queries execute against the same tests instead of just having doc-value queries be considered correct if they behave like index-queries? Since theoretically the counts may be the same, but maybe they are including different documents? highly unlikely, but just a thought.
return doPush(relation); | ||
} | ||
|
||
abstract boolean doPush(PointValues.Relation relation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is an extension to the push
that is documented in TriangleTreeReader.Visitor
, I think it would be convenient to re-write in the documentation what the return value of this function signals
|
||
@Override | ||
public void reset() { | ||
answer = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to explain the rational behind each of these default answer
values (here as well as the other visitors), especially since they are not the same for different Visitors. It seems like the default value is assumed in some of the logic
Thanks @talevy I updated the test so we actually check the hits are the same. I improved the test as well which surface and issue with Contains queries and geometry collections. I added more documentation to Component2DVisitor, let me know if that's ok. |
We upgraded Elasticsearch to a Lucene 8.8 snapshot so this PR is not blocked any more. Could you have another look @talevy ? |
|
||
@Override | ||
public float matchCost() { | ||
return 1000f; // TODO: what should it be? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too familiar with this. So long as matchCost
is constant and the ConstantScoreScorer is used, this number is arbitrary and has no effect, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this value is used when you have more that one TwoPhaseIterator
in a conjunction in order to sort them. You want to execute the cheaper iterators first.
In put case we do not have a good estimation of cost so it is good enough to return a big number. Note that this is done the same for LatLonPointDocValues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending resolution
IndexOrDocValuesQuery is an optimisation for queries when there are in a filter with another query that is more restrictive. In that case, it might be faster to just check the matches for that query directly with the doc values instead of traversing the BKD index.
With the introduction of doc values for geo_shape, it is possible now to implement such query for this type of data.