
Representation of spatial types on export to ArrowArrayStream #153

Open
@paleolimbot

Description

Currently, when we export a spatial column to Arrow, we get:

import duckdb
duckdb.sql("INSTALL spatial;")  # one-time download of the spatial extension
duckdb.sql("LOAD spatial;")
duckdb.sql("SELECT ST_GeomFromText('POINT (0 1)') as geom").to_arrow_table()
#> pyarrow.Table
#> geom: binary
#> ----
#> geom: [[000020000000000000000000010000000000000000000000000000000000F03F]]
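For comparison, the geoarrow.wkb extension type stores standard well-known binary (WKB), not DuckDB's internal serialization shown above. The sketch below is a hypothetical helper (`point_to_wkb` is not part of any library) that uses only the standard library to encode a 2D point as little-endian WKB, which makes the difference in byte layout easy to see.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little-endian (NDR)
WKB_POINT = 1          # WKB geometry type code for a 2D Point


def point_to_wkb(x: float, y: float) -> bytes:
    """Hypothetical helper: encode a 2D point as little-endian WKB."""
    # '<' = little-endian with no padding, B = uint8 byte-order flag,
    # I = uint32 geometry type, dd = the x and y coordinates as float64
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


wkb = point_to_wkb(0.0, 1.0)
# Layout: 01 | 01000000 | x (8 bytes) | y (8 bytes)
print(wkb.hex().upper())
```

The resulting 21 bytes start with the byte-order flag and geometry type, whereas the DuckDB output above starts with its own header, so a consumer expecting WKB cannot read the exported column as-is.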

With geoarrow-python, we can register extension types that propagate the CRS and type name through pyarrow machinery:

import geoarrow.pyarrow as ga
array = ga.with_crs(ga.as_wkb(["POINT (0 1)"]), '{<some projjson>}')
array
#> GeometryExtensionArray:WkbType(geoarrow.wkb <{<some projjson>}>)[1]
#> <POINT (0 1)>
array.type.extension_name
#> 'geoarrow.wkb'
array.type.__arrow_ext_serialize__()
#> b'{"crs":"{<some projjson>}"}'
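On the wire, an Arrow extension type is just two entries in the field-level metadata: `ARROW:extension:name` and `ARROW:extension:metadata`. The sketch below assumes a hypothetical helper, `geoarrow_wkb_field_metadata`, that builds those entries for a geoarrow.wkb column; the CRS is passed through as an opaque string, mirroring the `<some projjson>` placeholder in the output above.

```python
import json
from typing import Optional


def geoarrow_wkb_field_metadata(crs: Optional[str] = None) -> dict:
    """Hypothetical helper: Arrow field metadata marking a binary column as geoarrow.wkb."""
    # Extension parameters are serialized as a JSON object; with no CRS
    # this is simply "{}".
    params = {} if crs is None else {"crs": crs}
    return {
        "ARROW:extension:name": "geoarrow.wkb",
        # Compact separators give the same bytes as __arrow_ext_serialize__() above
        "ARROW:extension:metadata": json.dumps(params, separators=(",", ":")),
    }


meta = geoarrow_wkb_field_metadata("{<some projjson>}")
print(meta["ARROW:extension:metadata"])
```

A dict like this could be passed as the `metadata=` argument of `pyarrow.field(...)`, or attached to the corresponding `ArrowSchema` child when exporting through the C data interface, so that any GeoArrow-aware consumer would recognize the column.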

Would it be appropriate to export geometry columns as GeoArrow extension arrays (i.e., with ARROW:extension:name and ARROW:extension:metadata set according to https://github.com/geoarrow/geoarrow/blob/main/extension-types.md)? It looks like what is exported by default is your internal binary representation (though perhaps not for the "native" unserialized types), but I think there is precedent for re-encoding on export: other types are already converted when exported to Arrow (e.g., boolean columns are bit-packed).
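As an illustration of the bit-packing precedent mentioned above, Arrow's boolean layout stores one bit per value, packed in least-significant-bit order (value `i` lives in bit `i % 8` of byte `i // 8`). The sketch below uses a hypothetical `pack_bools` helper and omits the buffer padding/alignment that real Arrow buffers add.

```python
from typing import Sequence


def pack_bools(values: Sequence[bool]) -> bytes:
    """Hypothetical helper: pack booleans into an Arrow-style LSB-first bitmap."""
    out = bytearray((len(values) + 7) // 8)  # one byte per 8 values, rounded up
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)  # set bit (i % 8) of byte (i // 8)
    return bytes(out)


packed = pack_bools([True, False, True, True, False, False, False, False, True])
print(packed.hex())
```

Converting geometries to WKB on export would be the same kind of transformation: the in-memory representation is rewritten into the layout the Arrow-side specification expects.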

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)
