PARQUET-2417: Add statistics support to geometry logical type#2971
PARQUET-2417: Add statistics support to geometry logical type#2971wgtmac merged 101 commits intoapache:masterfrom zhangfengcdt:feature-apache-parquet-2417-geospatial
Conversation
This PR is copied form this place: #1379
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
...uet-column/src/main/java/org/apache/parquet/column/statistics/geometry/EnvelopeCovering.java
Outdated
Show resolved
Hide resolved
...uet-column/src/main/java/org/apache/parquet/column/statistics/geometry/EnvelopeCovering.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/BinaryStatistics.java
Outdated
Show resolved
Hide resolved
…e spherical edge is specified.
…apache-parquet-2417-geospatial
wgtmac
left a comment
There was a problem hiding this comment.
Thanks for the update! I have left some comments. I think we are reaching the finish line!
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/GeometryUtils.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/Covering.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestGeometryTypeRoundTrip.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Test | ||
| public void testEPSG4326BasicReadWriteGeometryValue() throws Exception { |
There was a problem hiding this comment.
Thanks for adding these tests!
I think we are missing tests in following cases:
- verify geometry type metadata is well preserved.
- verify all kinds of geometry stats are preserved, including bbox, covering and geometry types.
- verify geo stats in the column index have been generated.
I can do these later.
|
Thanks! I will take a look after back from vacation. |
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
- add valid flag to BoundingBox impl - check geostatistics validity when writing to parquet - add tests for mix of valid and invalid geometry update
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
Tested the following cases after the changes: - Merging valid stats with null stats results in invalid stats with null components - Merging null stats with valid stats also results in invalid stats with null components - Merging valid stats with partially null stats (null bounding box) results in the expected behavior where only the bounding box becomes null
|
@wgtmac can you review again? Thank you! |
parquet-column/src/main/java/org/apache/parquet/column/page/PageWriter.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/page/PageWriter.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
…ry, omit it from stats"
|
@wgtmac Thank you for the very helpful review comments! I believe I’ve addressed them all and have also added and updated some tests. Could you please take another look and let me know if I missed anything or if further changes are needed? Thanks again! |
wgtmac
left a comment
There was a problem hiding this comment.
Thanks for the quick update! Now it generally looks great.
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
parquet-column/src/test/java/org/apache/parquet/column/statistics/geometry/TestBoundingBox.java
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/test/java/org/apache/parquet/column/statistics/geometry/TestBoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialTypes.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialTypes.java
Outdated
Show resolved
Hide resolved
Thanks for the review! The fixes should be all in now. |
...lumn/src/main/java/org/apache/parquet/column/statistics/geospatial/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
|
I just merged it. Thanks @zhangfengcdt for working on this and everyone for the review! |
This PR implements the geo types that are introduced in the new parquet specification.
apache/parquet-format@94b9d63
Jira
Tests
Commits
Documentation