Description
Currently our non-Parquet IO depends on GeoPandas or DuckDB. These are great workarounds, but they don't leverage the generic pushdown/pruning capability that DataFusion gives us.
While we could hook directly into GDAL, building, linking, and packaging GDAL isn't something I'd like to do specifically for vector support if there are any alternatives. It's possible that specifically for OGR support we might be able to do something similar to our PROJ support (dynamically pull symbols), but that won't scale beyond a very limited set of operations.
Since GDAL 3.6, OGR layers can be read through an ArrowArrayStream interface. Our DataFrames already accept arbitrary ArrowArrayStream input, although we would need to make that path more resilient to multiple collects (e.g., today if you create a data frame from a stream and call .show() on it twice, the second call fails because the array stream has already been consumed). We also need to wire in pushdown support, which GDAL does very well (e.g., using the spatial index embedded in shapefile/fgb/gpkg files).
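To make the multiple-collect problem concrete, here is a minimal stdlib-only sketch. It is not SedonaDB code: `ReplayableStream`, `one_shot`, and the list-of-dicts `Batch` stand-in are hypothetical names used only for illustration, and caching every batch is just one possible strategy (re-opening the source on each collect would be another).

```python
from typing import Iterable, Iterator, List, Optional

# Stand-in for an Arrow record batch (hypothetical; a real stream yields Arrow data).
Batch = List[dict]


class ReplayableStream:
    """Wrap a one-shot batch iterator so it can be iterated more than once.

    Batches are cached as they are first pulled; later iterations replay
    the cache and only touch the source if it has not been exhausted yet.
    """

    def __init__(self, source: Iterable[Batch]) -> None:
        self._source: Optional[Iterator[Batch]] = iter(source)
        self._cache: List[Batch] = []

    def __iter__(self) -> Iterator[Batch]:
        i = 0
        while True:
            if i < len(self._cache):
                # Already seen this batch: replay it from the cache.
                yield self._cache[i]
            else:
                if self._source is None:
                    return  # source exhausted and cache fully replayed
                try:
                    batch = next(self._source)
                except StopIteration:
                    self._source = None
                    return
                self._cache.append(batch)
                yield batch
            i += 1


def one_shot() -> Iterator[Batch]:
    """A generator behaves like an array stream: it can be consumed only once."""
    yield [{"name": "Vaduz"}]
    yield [{"name": "Lobamba"}]


# Consuming the raw stream twice: the second pass sees nothing.
raw = one_shot()
first_pass = list(raw)
second_pass = list(raw)

# Consuming the wrapped stream twice: both passes see the same batches.
stream = ReplayableStream(one_shot())
collect_a = list(stream)
collect_b = list(stream)
```

The trade-off is that the cache holds every batch in memory once the stream has been read, so this only suits sources that fit in memory.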
Proof of concept:
import pyogrio.raw
import sedona.db
sd = sedona.db.connect()
url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb"
with pyogrio.raw.ogr_open_arrow(f"/vsicurl/{url}", {}) as info:
    meta, reader = info
    print(meta)
    df = sd.create_data_frame(reader).to_memtable()
    df.show(5)
#> {'crs': 'EPSG:4326', 'encoding': 'UTF-8', 'fields': array(['name'], dtype=object), 'geometry_type': 'Point', 'geometry_name': '', 'fid_column': 'OGC_FID'}
#> ┌──────────────┬───────────────────────────────┐
#> │ name         ┆ wkb_geometry                  │
#> │ utf8         ┆ geometry                      │
#> ╞══════════════╪═══════════════════════════════╡
#> │ Vatican City ┆ POINT(12.4533865 41.9032822)  │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ San Marino   ┆ POINT(12.4417702 43.9360958)  │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ Vaduz        ┆ POINT(9.5166695 47.1337238)   │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ Lobamba      ┆ POINT(31.1999971 -26.4666675) │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ Luxembourg   ┆ POINT(6.1300028 49.6116604)   │
#> └──────────────┴───────────────────────────────┘
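The proof of concept reads the whole layer. As a rough sketch of what wiring in pushdown could look like, the helper below turns a pruned column list and a bounding box into keyword arguments for the OGR reader. The function name `ogr_read_options` and its inputs are hypothetical. The `columns` and `bbox` keywords are those of pyogrio's public `pyogrio.raw.open_arrow`, and whether SedonaDB would go through pyogrio at all is an assumption.

```python
from typing import Dict, Optional, Sequence, Tuple

# (xmin, ymin, xmax, ymax), in the layer's CRS.
BBox = Tuple[float, float, float, float]


def ogr_read_options(
    projection: Optional[Sequence[str]] = None,
    bbox: Optional[BBox] = None,
) -> Dict[str, object]:
    """Translate pruned columns and a spatial filter into reader keyword arguments.

    Hypothetical helper: `projection` stands for the columns a query planner
    says are needed, and `bbox` for the envelope of a spatial predicate.
    """
    options: Dict[str, object] = {}
    if projection is not None:
        # Only request the attribute columns the query actually uses.
        options["columns"] = list(projection)
    if bbox is not None:
        xmin, ymin, xmax, ymax = bbox
        if xmin > xmax or ymin > ymax:
            raise ValueError(f"invalid bounding box: {bbox!r}")
        # Lets GDAL use an embedded spatial index where the format has one.
        options["bbox"] = (xmin, ymin, xmax, ymax)
    return options


# Example: a query that needs only "name" for features within Europe's rough extent.
opts = ogr_read_options(projection=["name"], bbox=(-10.0, 35.0, 30.0, 60.0))
no_pushdown = ogr_read_options()
```

With pyogrio installed, these options could be passed as `pyogrio.raw.open_arrow(path, **opts)`, so that GDAL rather than the query engine discards unneeded columns and features.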