Support st_contains using H3 index#8498
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8498 +/- ##
============================================
+ Coverage 63.60% 68.79% +5.18%
- Complexity 4322 4326 +4
============================================
Files 1648 1694 +46
Lines 86917 88960 +2043
Branches 13265 13497 +232
============================================
+ Hits 55281 61197 +5916
+ Misses 27595 23431 -4164
- Partials 4041 4332 +291
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/H3Utils.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
shall we add some tests of the corner cases, such as on the line/vertex of the polygon?
There was a problem hiding this comment.
The ST_Contains seems doesn't include the point or line lying in polygon boundary. https://postgis.net/docs/ST_Contains.html
but let me try create a test with points very close to the boundary.
There was a problem hiding this comment.
I think this is just approximation, but it doesn't return same result of ST_Contains, right?
There was a problem hiding this comment.
i think
expressionFilterOperator.getNextBlock().getBlockDocIdSet().iterator();
MutableRoaringBitmap result = docIdIterator.applyAnd(partialMatchDocIds);
will do the final check on the potential match set.
There was a problem hiding this comment.
okay, it could be further optimized to skip the cells fully contained in the polygon. But we can leave it for futture (add a TODO) if you find it too difficult.
There was a problem hiding this comment.
sure. will create a TODO.
There was a problem hiding this comment.
actually I did the optimization. but still will use the issue to track if H3 support different polyfill natively.
|
High-level comment is that using H3 index can speed up query evaluation, but we need to ensure it does not change query results. H3's Polyfill returns a set of H3 for approximation, but we need one more round of evaluation to check the borderlines. |
2717ee1 to
d8f86c4
Compare
yupeng9
left a comment
There was a problem hiding this comment.
LGTM. Thanks for adding the tests and glad to see the improved index handling
There was a problem hiding this comment.
throw some meaningful error message for better debugging?
There was a problem hiding this comment.
sure. add error message for the assert.
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
dc7b903 to
15e03bb
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
LGTM with minor comments
There was a problem hiding this comment.
(minor) No need to check st_within and st_contains because the function name is already canonicalized (we keep 2 function names for st_distance for backward compatibility). We may directly use the value equals check into the canApplyH3IndexForInclusionCheck() instead of using a set for 2 values.
There was a problem hiding this comment.
(minor) columnName will never be null
| return columnName != null && _indexSegment.getDataSource(columnName).getH3Index() != null; | |
| return _indexSegment.getDataSource(columnName).getH3Index() != null; |
There was a problem hiding this comment.
update on line 183 as well.
There was a problem hiding this comment.
(minor) We don't usually put final for local variables
There was a problem hiding this comment.
Use iterator (LongIterator.nextLong()) instead to avoid the unnecessary boxing/unboxing, same for other places when iterating over LongSet
There was a problem hiding this comment.
(minor)
| .collect(Collectors.toList()), ImmutableList.of(), resolution)); | |
| .collect(Collectors.toList()), Collections.emptyList(), resolution)); |
f9bba02 to
3dcf59b
Compare
3dcf59b to
094d2f1
Compare
This is a follow up PR for : #7252 since that one seems doesn't get update for long time.
The idea is to convert input geometry into a list of h3 cells by using polyfill. But h3 polyfill only fills with the hexagons whose centers are contained by the geometry.
so creating a method
coverGeometryInH3to return the set of H3 cells at the specified resolution which completely cover the input shape.Instructions:
feature(**) Use
release-noteslabel for scenarios like: