python/sedonadb: Add pyogrio integration for read/write support for GDAL/OGR #137

@paleolimbot

Currently our non-Parquet IO depends on GeoPandas or DuckDB. These are great workarounds but they don't leverage the generic pushdown/pruning capability that DataFusion gives us.

While we could hook directly into GDAL, building, linking, and packaging GDAL isn't something I'd like to do specifically for vector support if there are any alternatives. It's possible that specifically for OGR support we might be able to do something similar to our PROJ support (dynamically pull symbols), but that won't scale beyond a very limited set of operations.

Since GDAL 3.6, an ArrowArrayStream interface has been provided for reading OGR layers. Our DataFrames support arbitrary ArrowArrayStream input, although we would need to make that support more resilient to multiple collects (e.g., today if you create a data frame from a stream and .show() it twice, the second call fails because the array stream has already been consumed). We also need to wire in pushdown support, which GDAL does very well (e.g., using the spatial index embedded in shapefile/fgb/gpkg files).
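
To illustrate the multiple-collect problem, here is a minimal pure-Python sketch. A generator stands in for the ArrowArrayStream, and `ReopenableSource` is a hypothetical name, not an existing sedonadb class. The idea is to hold a factory that can reopen the layer rather than a live stream. Unlike the real case, a drained generator silently yields nothing instead of raising.

```python
from typing import Callable, Iterator, List


def one_shot_stream() -> Iterator[List[str]]:
    """Stand-in for an ArrowArrayStream: yields batches once, then is exhausted."""
    yield ["Vatican City", "San Marino"]
    yield ["Vaduz"]


class ReopenableSource:
    """Hypothetical wrapper that stores a stream factory instead of a live stream."""

    def __init__(self, open_stream: Callable[[], Iterator[List[str]]]):
        self._open_stream = open_stream

    def collect(self) -> List[str]:
        # Every collect opens a fresh stream, so repeated calls see all rows.
        return [row for batch in self._open_stream() for row in batch]


# Consuming the same stream object twice: the second pass finds nothing left.
stream = one_shot_stream()
first = [row for batch in stream for row in batch]
second = [row for batch in stream for row in batch]

# Going through the factory instead makes repeated collects safe.
source = ReopenableSource(one_shot_stream)
again_1 = source.collect()
again_2 = source.collect()
```

For an OGR layer, the factory would presumably reopen the dataset on each scan, which is also the natural place to apply per-query filters.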

Proof of concept:

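import pyogrio.raw
import sedona.db

sd = sedona.db.connect()

# /vsicurl/ lets GDAL read the remote FlatGeobuf file over HTTP
url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb"
with pyogrio.raw.ogr_open_arrow(f"/vsicurl/{url}", {}) as info:
    # The context manager yields layer metadata and an Arrow stream reader
    meta, reader = info
    print(meta)
    # Materialize while the dataset is still open; the stream can only be read once
    df = sd.create_data_frame(reader).to_memtable()

df.show(5)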
#> {'crs': 'EPSG:4326', 'encoding': 'UTF-8', 'fields': array(['name'], dtype=object), 'geometry_type': 'Point', 'geometry_name': '', 'fid_column': 'OGC_FID'}
#> ┌──────────────┬───────────────────────────────┐
#> │     name     ┆          wkb_geometry         │
#> │     utf8     ┆            geometry           │
#> ╞══════════════╪═══════════════════════════════╡
#> │ Vatican City ┆ POINT(12.4533865 41.9032822)  │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ San Marino   ┆ POINT(12.4417702 43.9360958)  │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ Vaduz        ┆ POINT(9.5166695 47.1337238)   │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ Lobamba      ┆ POINT(31.1999971 -26.4666675) │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ Luxembourg   ┆ POINT(6.1300028 49.6116604)   │
#> └──────────────┴───────────────────────────────┘
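
The proof of concept above reads the whole layer. As a rough sketch of where pushdown could plug in, the helper below translates a projection and a bounding-box filter into keyword arguments for the reader. `pushdown_kwargs` and `open_layer` are hypothetical names. The sketch assumes `pyogrio.raw.open_arrow` accepts `columns`, `bbox`, and `where` keywords, which is my understanding of the pyogrio API; how DataFusion filters would actually be mapped onto these is not covered here.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def pushdown_kwargs(
    columns: Optional[Sequence[str]] = None,
    bbox: Optional[BBox] = None,
    where: Optional[str] = None,
) -> Dict[str, Any]:
    """Build reader keyword arguments from a (hypothetical) scan request."""
    kwargs: Dict[str, Any] = {}
    if columns is not None:
        # Projection pushdown: only read the requested attribute columns.
        kwargs["columns"] = list(columns)
    if bbox is not None:
        xmin, ymin, xmax, ymax = bbox
        if xmin > xmax or ymin > ymax:
            raise ValueError(f"invalid bbox: {bbox}")
        # Spatial filter pushdown: GDAL can use an embedded spatial index here.
        kwargs["bbox"] = (xmin, ymin, xmax, ymax)
    if where:
        # Attribute filter pushdown as an OGR SQL WHERE clause.
        kwargs["where"] = where
    return kwargs


def open_layer(path: str, **scan: Any):
    """Open a layer as an Arrow stream with the pushed-down options applied."""
    import pyogrio.raw  # imported lazily so pushdown_kwargs has no dependencies

    return pyogrio.raw.open_arrow(path, **pushdown_kwargs(**scan))


kwargs = pushdown_kwargs(columns=["name"], bbox=(5.0, 45.0, 15.0, 50.0))
```

Calling `open_layer(f"/vsicurl/{url}", columns=["name"], bbox=(5.0, 45.0, 15.0, 50.0))` would then, under the same assumptions, stream only the cities inside that box.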

Labels: enhancement
