GDF support #311
Comments
Hi! Thanks for opening this issue! This is something I've been thinking about for a while, and it would be great to support some day. However, scanning a GDF "zero copy", the way DuckDB does with regular dataframes, might be difficult because (as far as I know) geopandas geometries store pointers to GEOS geometries in memory. The problem is that DuckDB bundles its own version of GEOS, so while we technically could just pass the pointers around, there's no guarantee that GEOS represents geometries the same way across versions or has a stable ABI. So there would have to be a conversion step, most likely to/from WKB (or maybe geoarrow, not sure if that's natively supported in geopandas now), similar to the workaround I imagine you've got going. We could definitely look at "hiding" this by adding support for doing the conversion automatically to DuckDB's BLOB type (which spatial can then ingest) inside the DuckDB Python client at some point in the future.
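To illustrate why WKB works as a neutral interchange format here: it is a plain byte layout defined independently of GEOS, so two libraries built against different GEOS versions can both read it without sharing pointers. Below is a minimal sketch using only the Python standard library. It handles 2D points only, and the helper names `point_to_wkb` and `point_from_wkb` are made up for illustration; they are not part of DuckDB, GeoPandas, or Shapely.

```python
import struct

WKB_POINT = 1  # geometry type code for a 2D Point in WKB


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1 byte byte-order flag (1 = little-endian),
    #         4 bytes geometry type (uint32),
    #         2 x 8 bytes coordinates (IEEE 754 doubles).
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def point_from_wkb(blob: bytes) -> tuple[float, float]:
    """Decode a 2D point from WKB in either byte order (hypothetical helper)."""
    endian = "<" if blob[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", blob[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    return x, y


blob = point_to_wkb(1.0, 2.0)
print(len(blob), blob.hex())
print(point_from_wkb(blob))
```

The bytes carry everything needed to rebuild the geometry, which is what makes a copy through WKB safe where passing raw GEOS pointers is not. The cost is that every geometry gets serialized and parsed again, so this is not zero copy.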
I think it would be great if we could make this conversion modular: DuckDB to Arrow to GeoPandas and vice versa instead of custom-built DuckDB to GeoPandas. Especially if we're able to reuse an import version of #153. It's fine in the near term to still serialize through WKB and just attach the There's discussion towards implementing native interop between GeoPandas and GeoArrow in geopandas/geopandas#3156. That will probably get implemented in the next few months but maybe after GeoPandas 1.0. Using GeoArrow natively inside GeoPandas will take longer.
Indeed, I don't believe GEOS objects are ABI stable, so you can't reliably share memory between Shapely and DuckDB spatial anyway. FWIW I'm nearly done with an integration in the opposite direction: from DuckDB to GeoArrow/GeoPandas in Python. But having the default
@kylebarron That sounds great! I'd like to revisit #153 soon, even if support for "additional custom type metadata" is quite a bit away.
In case anyone is visiting this trying to get a GDF into duckdb, I've found the following works:
Then if you want to cast back into a GDF:
I know DuckDB works well with pandas DataFrames, but I'd like to be able to write SQL the same way on GDFs (GeoDataFrames). As of now, when I write SQL on a GDF it returns this error:
NotImplementedException: Not implemented Error: Data type 'geometry' not recognized
This is with spatial installed and loaded. There is a workaround, but it's kind of wonky. Thank you for your time.