GDF support #311

Open
kentstephen opened this issue Apr 25, 2024 · 4 comments

Comments

@kentstephen

I know DuckDB works well with pandas DataFrames, but I'd like to be able to write SQL against GeoDataFrames (GDFs) the same way. As of now, running SQL on a GDF returns this error:
NotImplementedException: Not implemented Error: Data type 'geometry' not recognized
This is with spatial installed and loaded. There is a workaround, but it's kind of wonky. Thank you for your time.

@Maxxen
Member

Maxxen commented Apr 25, 2024

Hi! Thanks for opening this issue!

This is something I've been thinking about for a while, and it would be great to support some day. However, scanning a GDF "zero copy", the way DuckDB does with regular dataframes, might be difficult because (as far as I know) geopandas geometries store pointers into GEOS geometries in memory. The problem is that DuckDB bundles its own version of GEOS, so while we technically could just pass the pointers around, there's no guarantee that GEOS represents geometries the same way across versions or has a stable ABI.

So there would have to be a conversion step, most likely to/from WKB (or maybe GeoArrow, not sure if that's natively supported in geopandas now), similar to the workaround I imagine you have going. We could definitely look at "hiding" this at some point in the future by having the DuckDB Python client convert automatically to DuckDB's BLOB type (which spatial can then ingest).

@kylebarron

kylebarron commented Apr 26, 2024

I think it would be great if we could make this conversion modular: DuckDB to Arrow to GeoPandas and vice versa instead of custom-built DuckDB to GeoPandas. Especially if we're able to reuse an import version of #153. It's fine in the near term to still serialize through WKB and just attach the geoarrow.wkb metadata onto the column.

There's discussion towards implementing native interop between GeoPandas and GeoArrow in geopandas/geopandas#3156. That will probably get implemented in the next few months but maybe after GeoPandas 1.0. Using GeoArrow natively inside GeoPandas will take longer.

> geopandas geometries store pointers into GEOS geometries in memory

Indeed, I don't believe GEOS objects are ABI stable, so you can't reliably share memory between Shapely and DuckDB spatial anyways.

FWIW I'm nearly done with an integration in the opposite direction: from DuckDB to GeoArrow/GeoPandas in Python. But having the default .arrow() expose the unstable GEOMETRY type makes things a bit harder.

@Maxxen
Member

Maxxen commented Apr 26, 2024

@kylebarron That sounds great! I'd like to revisit #153 soon, even if support for "additional custom type metadata" is quite a bit away.

@publicmatt

In case anyone is visiting this trying to get a GDF into duckdb, I've found the following works:

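import duckdb
import geopandas as gpd

# Replace the geometries with WKT strings so DuckDB can scan the frame.
gdf['geometry'] = gdf['geometry'].to_wkt()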

out = duckdb.sql("""
  SELECT
   *
  FROM gdf
""").df()

Then if you want to cast back into a GDF:

# The same two lines also work on the query result `out`, which comes back as a plain DataFrame.
gdf['geometry'] = gpd.GeoSeries.from_wkt(gdf['geometry'])
gdf = gpd.GeoDataFrame(gdf)
