Representation of spatial types on export to ArrowArrayStream
#153
Open
Description
Currently when we export a spatial column to Arrow, we get:
import duckdb
duckdb.sql("LOAD spatial;")
duckdb.sql("SELECT ST_GeomFromText('POINT (0 1)') as geom").to_arrow_table()
#> pyarrow.Table
#> geom: binary
#> ----
#> geom: [[000020000000000000000000010000000000000000000000000000000000F03F]]
With geoarrow-python we can register extension types to propagate CRS and type name through pyarrow machinery:
import geoarrow.pyarrow as ga
array = ga.with_crs(ga.as_wkb(["POINT (0 1)"]), '{<some projjson>}')
array
#> GeometryExtensionArray:WkbType(geoarrow.wkb <{<some projjson>}>)[1]
#> <POINT (0 1)>
array.type.extension_name
#> 'geoarrow.wkb'
array.type.__arrow_ext_serialize__()
#> b'{"crs":"{<some projjson>}"}'
Would it be appropriate to export geometry columns as GeoArrow extension arrays, i.e., with the ARROW:extension:name and ARROW:extension:metadata field metadata set according to https://github.com/geoarrow/geoarrow/blob/main/extension-types.md? It looks like what is exported by default is your internal binary representation (perhaps not for the "native" unserialized types), but I think there are other types that are re-encoded on export to Arrow (e.g., boolean columns are bit-packed).
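To make the request concrete, here is a minimal sketch of the field-level metadata a producer would attach to mark a binary column as geoarrow.wkb. The helper name `geoarrow_wkb_field_metadata` and the abbreviated CRS object are hypothetical placeholders for illustration (the CRS is not a complete PROJJSON document), and this is not a description of how DuckDB builds its Arrow schema today.

```python
import json

# Standard Arrow field metadata keys used to carry extension type information.
EXT_NAME_KEY = b"ARROW:extension:name"
EXT_METADATA_KEY = b"ARROW:extension:metadata"


def geoarrow_wkb_field_metadata(crs=None):
    """Build field metadata marking a binary column as geoarrow.wkb.

    Hypothetical helper: `crs` is an optional JSON-serializable CRS value.
    When it is omitted, the extension metadata is an empty JSON object.
    """
    ext_metadata = {}
    if crs is not None:
        ext_metadata["crs"] = crs
    return {
        EXT_NAME_KEY: b"geoarrow.wkb",
        EXT_METADATA_KEY: json.dumps(ext_metadata).encode("utf-8"),
    }


# Placeholder CRS object, standing in for a full PROJJSON document.
metadata = geoarrow_wkb_field_metadata(
    crs={"type": "GeographicCRS", "name": "WGS 84"}
)
print(metadata)
```

These key/value pairs are what a consumer such as pyarrow inspects on import; in pyarrow they could be passed as the `metadata` argument of `pyarrow.field("geom", pyarrow.binary(), metadata=metadata)`. The column's storage would also need to hold actual WKB rather than the internal representation shown above, so some re-encoding on export would presumably be required.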