WIP Parquet: Support reading/writing geometry and geography columns #12347
Conversation
|
I found that it is not easy to upgrade the parquet dependency to the next (not yet released) version, because parquet-hadoop now uses a FileSystem API introduced in Hadoop 3: apache/parquet-java#3079. Upgrading the parquet dependencies to the latest SNAPSHOT version results in a test failure. We have to remove Hadoop 2 support and migrate to Hadoop 3 for all submodules. There is a stale PR working on this: #10932. I found that #10940 was closed as completed, but many submodules still depend on Hadoop 2. I'd like to know how we should proceed with the parquet upgrade. Should we upgrade the dependencies from Hadoop 2 to Hadoop 3 to unblock it? @szehon-ho @rdblue |
|
I would raise this question on the dev list to get a wider audience for the issue, after collecting the list of affected modules. |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
Hi, sorry @Kontinuation for the long delay (Iceberg Summit and internal stuff). I wonder if we can rebase on #12346 and keep only the Parquet part (not the expression part). Also, is it cleaner now that Parquet Format 2.11.0 is released with the Parquet geo logical types? |
Sure. I'll rework this patch and mark it ready for review once geospatial bounds and spatial predicates are added to api/core. |
By the way, I suppose we should try to move the remaining submodules away from Hadoop 2 so we can move ahead. Let's see where we still have issues. |
|
freamdx@929dfae is a simple solution |
20c391a to ae97c63
|
Hi @Kontinuation, do we need this PR to use Geo types in Iceberg? |
This PR is still far from getting Geo types working with actual query engines such as Spark. We also need to make changes to the query engine integration (e.g. iceberg-spark and iceberg-spark-extensions) to make it actually usable. |
|
@Kontinuation If I want to use Geo types with Apache Sedona for my Iceberg table, what should I do? I tried to create a table with a geometry column, but it throws an exception:
spark-sql (default)> CREATE TABLE LOCAL.db.icetable (id string, geometry geometry)
> USING iceberg
> TBLPROPERTIES('format-version'='3');
[UNSUPPORTED_DATATYPE] Unsupported data type "GEOMETRY". SQLSTATE: 0A000
== SQL (line 1, position 53) ==
...b.icetable (id string, geometry geometry)
^^^^^^^^
Should I patch Apache Sedona with your PR apache/sedona#1830? |
5279eba to b7ff2ec
This PR depends on #12667. It implements part of the iceberg geo spec: #10981.
The Iceberg spec requires that geometry and geography types in Iceberg are mapped to BINARY physical types with GEOMETRY or GEOGRAPHY logical type annotations. These two spatial logical types were introduced to the Parquet format in apache/parquet-format#240, and the initial implementation of the spec has been merged into parquet-java: apache/parquet-java#2971 and apache/parquet-java#3200.
The parquet-java implementation has not been released yet, so this work-in-progress PR depends on a locally built SNAPSHOT version of parquet-java. We'll mark it ready once we bump the parquet-java version to the new release and pass all the tests.
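To make the BINARY mapping more concrete: the Parquet geospatial logical types annotate byte arrays whose values are encoded as WKB (well-known binary). The following is only a minimal sketch of that payload for a single 2D point, written in Python with the standard library. It is not code from this PR and does not use the Iceberg or parquet-java APIs. The helper names `point_to_wkb` and `wkb_to_point` are hypothetical, chosen for illustration.

```python
import struct
from typing import Tuple

WKB_POINT = 1  # WKB geometry type code for a 2D Point


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper).

    Layout: 1 byte for byte order (1 = little-endian), a 4-byte unsigned
    geometry type, then two 8-byte IEEE 754 doubles for x and y.
    """
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data: bytes) -> Tuple[float, float]:
    """Decode a WKB-encoded 2D point back to (x, y) (hypothetical helper)."""
    # The first byte selects the endianness of everything that follows.
    endian = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point, got WKB type {geom_type}")
    return x, y


if __name__ == "__main__":
    payload = point_to_wkb(30.0, 10.0)
    # These bytes are what a writer would store as one BINARY value.
    print(payload.hex())
    print(wkb_to_point(payload))
```

A value like `payload` is what would be written into the BINARY column. The GEOMETRY or GEOGRAPHY annotation in the Parquet schema is what tells readers to interpret those bytes as a spatial value rather than as opaque binary.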