feat: Add Geometry & Geography Types #2859
RFC: Iceberg v3 Geospatial Primitive Types
Motivation
Apache Iceberg v3 introduces native geospatial types (geometry and geography) to support spatial data workloads. These types enable:
This RFC describes the design and implementation of these types in PyIceberg.
Scope
In scope:
geometry(C) and geography(C, A) primitive type definitions
Out of scope (future work):
Non-Goals
Design
Type Parameters
GeometryType:
crs (string): Coordinate Reference System, defaults to "OGC:CRS84"
GeographyType:
crs (string): Coordinate Reference System, defaults to "OGC:CRS84"
algorithm (string): Geographic algorithm, defaults to "spherical"
Type String Format
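As a rough illustration of the parameters above and the geometry(C) / geography(C, A) notation, the sketch below models both types as small dataclasses. The class names GeometrySketch and GeographySketch are hypothetical stand-ins, not the PyIceberg classes, and the choice to omit parameters that equal their defaults when rendering the type string is an assumption, not something this RFC states.

```python
from dataclasses import dataclass

# Defaults taken from the Type Parameters section of this RFC.
DEFAULT_CRS = "OGC:CRS84"
DEFAULT_ALGORITHM = "spherical"


@dataclass(frozen=True)
class GeometrySketch:
    """Hypothetical stand-in for GeometryType (not the PyIceberg class)."""

    crs: str = DEFAULT_CRS

    def __str__(self) -> str:
        # Assumption: the CRS is left out of the string when it is the default.
        if self.crs == DEFAULT_CRS:
            return "geometry"
        return f"geometry({self.crs})"


@dataclass(frozen=True)
class GeographySketch:
    """Hypothetical stand-in for GeographyType (not the PyIceberg class)."""

    crs: str = DEFAULT_CRS
    algorithm: str = DEFAULT_ALGORITHM

    def __str__(self) -> str:
        # Assumption: trailing parameters equal to their defaults are omitted.
        if self.algorithm != DEFAULT_ALGORITHM:
            return f"geography({self.crs}, {self.algorithm})"
        if self.crs != DEFAULT_CRS:
            return f"geography({self.crs})"
        return "geography"


if __name__ == "__main__":
    print(GeometrySketch())
    print(GeographySketch(crs="EPSG:4326", algorithm="karney"))
```

Frozen dataclasses give value equality and hashing for free, which is convenient when types are compared or used as dictionary keys; the real implementation may differ.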
Runtime Representation
Values are stored as WKB (Well-Known Binary) bytes at runtime. This matches the Avro and Parquet physical representation per the Iceberg spec.
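To make the WKB representation concrete, here is a minimal sketch that encodes and decodes a 2D point using only the standard library. The helper names point_to_wkb and wkb_to_point are hypothetical and are not part of PyIceberg; the sketch covers only the plain Point geometry (type code 1) and ignores every other geometry kind.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1 byte order flag (1 = little-endian), uint32 geometry type
    # (1 = Point), then the x and y coordinates as 64-bit doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data: bytes) -> tuple[float, float]:
    """Decode a 2D WKB point; raises ValueError for other geometry types."""
    # The first byte says how the remaining fields are encoded.
    prefix = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack_from(prefix + "Idd", data, 1)
    if geom_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    return x, y


if __name__ == "__main__":
    wkb = point_to_wkb(1.0, 2.0)
    print(wkb.hex())
    print(wkb_to_point(wkb))
```

A value like this is what a geometry or geography column would carry at runtime as opaque bytes; real workloads would normally produce WKB with a geospatial library instead of hand-packing it.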
JSON Single-Value Serialization
Per the Iceberg spec, geometry/geography values should be serialized as WKT (Well-Known Text) strings in JSON. However, since we represent values as WKB bytes at runtime, conversion between WKB and WKT would require external dependencies.
Current behavior:
NotImplementedError is raised for JSON serialization/deserialization until a conversion strategy is established.
Avro Mapping
Both geometry and geography types map to the Avro bytes type, consistent with BinaryType handling.
PyArrow/Parquet Mapping
With geoarrow-pyarrow installed:
geoarrow.pyarrow.wkb().with_crs() and .with_edge_type() for full GeoArrow compatibility
Without geoarrow-pyarrow:
pa.large_binary()
Compatibility
Format Version
Geometry and geography types require Iceberg format version 3. Attempting to use them with format version 1 or 2 will raise a validation error via Schema.check_format_version_compatibility().
geoarrow-pyarrow
pip install pyiceberg[geoarrow]
Breaking Changes
None. These are new types that do not affect existing functionality.
Dependency/Versioning
Required:
Optional for full functionality:
Testing Strategy
Unit tests (test_types.py):
__str__ and __repr__ methods
minimum_format_version() enforcement
Integration tests (future):
Known Limitations
NotImplementedError
File Locations
pyiceberg/types.py
pyiceberg/conversions.py
pyiceberg/schema.py
pyiceberg/utils/schema_conversion.py
pyiceberg/io/pyarrow.py
tests/test_types.py
References