-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Geometry extension with GeoArrow (non-binary) geometry #703
Comments
I think I was vaguely responsible for pushing the initial bit towards BINARY in review, which I did because the number of types explodes if you give each geometry_type/dimension its own type. Because many (most, all?) implementations right now have a "geometry" type where each element could be any geometry type/dimension, we would still need those type/function definitions to exist. It would be very cool to define the geometry type and/or dimension-specific types and overloads of the functions that give a more precise output type (for example, |
The "structure" field in Substrait is (to the best of my knowledge) not used much yet. Types in Substrait are a little different than types in other libraries because Substrait doesn't really need to know how to create arrays. In fact, Substrait really only cares about types with regards to functions "what functions are allowed for a given type?" Some examples:
In practice, there is one other spot where we need to know some details about the type, and that is when we need to create a type literal (e.g. if we want to represent the plan So, if you have a better suggestion, go ahead and change it. I'd encourage anything other than
I'd recommend reviewing the available functions and making sure you agree with them. Is it common to include geospatial literals in queries? If so, we might want to canonicalize a message that can be used as a geospatial literal. I'm not enitrely sure where it would go but would encourage you to share some protobuf you come up with and we can find a spot.
User defined types should be parameterizable if that is what you are after. I'm not entirely sure I understand though. |
Thanks! This is really insightful background.
So if I understand correctly, this
I figure that the substrait functions should more or less mirror PostGIS functions (built on GEOS). At a glance the existing substrait functions seem reasonable.
I think it's pretty common. Say you want to find the distance of all geometries from a specific point, then you want to include that point in your input query. I haven't worked with protobuf before. I figure there's some way to represent something GeoJSON-like in protobuf (nested lists of variable-size-list coordinates).
That would be really cool. Sounds like maybe a good deal of work to stabilize, but if these functions only have to be defined once and can be pretty stable, perhaps it's worth it. But for now we can work with generically-typed geometry objects.
I think we'd need some parameterization to declare whether a geometry is a point, LineString, Polygon, etc. And maybe some type-level parameterization about the Coordinate Reference System, but maybe that's a black hole 🕳️ . |
👋 I'm a core contributor to GeoArrow, but I'm new to substrait and just reading through the docs.
I was surprised to see that a geospatial extension was already started/added in #543, #552, #554, #556.
But the geometry type's
structure
is defined as binary, which is (seems) incompatible with GeoArrow's layout of nested lists/structs.substrait/extensions/functions_geometry.yaml
Lines 3 to 5 in 67f93b6
What's the right way to handle this? Some potential implementations like DuckDB/Postgres would indeed have an opaque binary blob, but newer GeoArrow-based implementations would not have a binary object. Should GeoArrow be creating its own substrait extension that build on GeoArrow data types? That would mean multiple substrait geospatial extensions that use binary blobs or native geometries as a base.
For reference, I'm quite interested in geospatial support in substrait, with a long term goal of building a geospatial substrait executor backed by DataFusion and geoarrow-rs once DataFusion gets user-defined type support apache/datafusion#11513.
cc @jorisvandenbossche, @paleolimbot as fellow GeoArrow contributors and since they commented in #543
The text was updated successfully, but these errors were encountered: