
[Python] Conventions around PyCapsule Interface and choosing Array/Stream export #40648

Open
@kylebarron

Description

👋 I've been excited about the PyCapsule Interface, and have been implementing it in my geoarrow-rust project. Every function accepts any object that implements the Arrow PyCapsule Interface, no matter its producer. It's really amazing!

Fundamentally, my question is whether the export methods an object defines should let a consumer infer how its data is stored. That is, should a consumer be able to tell whether a producer's data is chunked based on whether it exports __arrow_c_array__ or __arrow_c_stream__? I had expected the answer to be yes, because (to my knowledge) pyarrow implements only the former on Array and RecordBatch and only the latter on ChunkedArray and Table. But this question came up here, where nanoarrow implements both __arrow_c_array__ and __arrow_c_stream__.
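To make the proposed invariant concrete, here is a minimal sketch of the inference a consumer could do if the convention held. It only inspects which dunder methods exist; the stub classes are hypothetical stand-ins that mimic the method layout described above, not real pyarrow or nanoarrow types.

```python
from typing import Any


def infer_layout(obj: Any) -> str:
    """Guess how a producer stores its data from the export methods it defines.

    Assumes the convention under discussion: __arrow_c_array__ alone means a
    single contiguous array/batch, __arrow_c_stream__ alone means chunked data.
    """
    has_array = callable(getattr(obj, "__arrow_c_array__", None))
    has_stream = callable(getattr(obj, "__arrow_c_stream__", None))
    if has_array and has_stream:
        # A producer exporting both (the nanoarrow case above) gives no signal.
        return "ambiguous"
    if has_array:
        return "contiguous"
    if has_stream:
        return "chunked"
    return "not arrow"


# Hypothetical stand-ins: they only mimic which export methods exist.
class ArrayLike:
    def __arrow_c_array__(self, requested_schema=None): ...


class ChunkedLike:
    def __arrow_c_stream__(self, requested_schema=None): ...


class BothLike(ArrayLike, ChunkedLike):
    pass


print(infer_layout(ArrayLike()))    # contiguous
print(infer_layout(ChunkedLike()))  # chunked
print(infer_layout(BothLike()))     # ambiguous
print(infer_layout(object()))       # not arrow
```

The "ambiguous" branch is exactly the case the question is about: once a class exports both methods, a consumer can no longer use method presence as a hint about storage.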

I'd argue that it's simpler to define only a single type of export method on a class and let the consumer convert to a different representation if they need to. This also communicates more information about how the existing data is already stored in memory. More generally, I think it would be really useful for the community to agree on a convention here, since that determines whether consumers can rely on this invariant.
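A rough sketch of "let the consumer convert" under the single-method convention, assuming a toy model: real producers return PyCapsules (named "arrow_schema"/"arrow_array" or "arrow_array_stream"), but here plain Python lists stand in for the exported data so the control flow runs as-is. All class and function names are hypothetical.

```python
from itertools import chain
from typing import Any, List


class ContiguousProducer:
    """Hypothetical producer holding one contiguous array (like an Array)."""

    def __init__(self, values: List[int]):
        self._values = values

    def __arrow_c_array__(self, requested_schema=None):
        # Stand-in for the (schema_capsule, array_capsule) pair.
        return ("schema", self._values)


class ChunkedProducer:
    """Hypothetical producer holding several chunks (like a ChunkedArray)."""

    def __init__(self, chunks: List[List[int]]):
        self._chunks = chunks

    def __arrow_c_stream__(self, requested_schema=None):
        # Stand-in for a stream capsule: an iterator over the chunks.
        return iter(self._chunks)


def as_chunks(obj: Any) -> List[List[int]]:
    """Consumer-side stream view: a contiguous producer becomes one chunk."""
    if hasattr(obj, "__arrow_c_stream__"):
        return list(obj.__arrow_c_stream__())
    if hasattr(obj, "__arrow_c_array__"):
        _schema, values = obj.__arrow_c_array__()
        return [values]
    raise TypeError(f"{type(obj).__name__} exports no Arrow data")


def as_contiguous(obj: Any) -> List[int]:
    """Consumer-side array view: chunks are concatenated into a new list."""
    return list(chain.from_iterable(as_chunks(obj)))


print(as_chunks(ContiguousProducer([1, 2, 3])))       # [[1, 2, 3]]
print(as_chunks(ChunkedProducer([[1], [2, 3]])))      # [[1], [2, 3]]
print(as_contiguous(ChunkedProducer([[1], [2, 3]])))  # [1, 2, 3]
```

In this model the producer only describes what it has, and any conversion (wrapping one array as a single-chunk stream, or concatenating chunks, which copies) happens visibly on the consumer side.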

Component(s)

Python
