Description
👋 I've been excited about the Arrow PyCapsule interface and have been implementing it in my geoarrow-rust project. Every function accepts any object that implements the Arrow PyCapsule interface, regardless of which library produced it. It's really amazing!
Fundamentally, my question is whether the presence of methods on an object should let a consumer infer its storage type. That is, should it be possible to tell whether a producer object is chunked or not based on whether it exports `__arrow_c_array__` or `__arrow_c_stream__`?
I had been expecting yes, as pyarrow implements only the former on `Array` and `RecordBatch` and only the latter on `ChunkedArray` and `Table` (to my knowledge). But this question came up here, where nanoarrow implements both `__arrow_c_array__` and `__arrow_c_stream__`.
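To make the question concrete, here is a minimal sketch of the inference a consumer might want to perform. The function `infer_storage` and the producer classes are hypothetical stand-ins that return placeholder strings instead of real PyCapsules; only the dunder method names and the `requested_schema` keyword come from the Arrow PyCapsule interface. As soon as a producer implements both methods, the check can no longer tell anything about the layout.

```python
# Hypothetical sketch: infer storage layout from which PyCapsule export
# methods an object has. The producers below are stubs, not real libraries.


def infer_storage(obj) -> str:
    """Guess whether a producer is contiguous or chunked from its methods."""
    has_array = hasattr(obj, "__arrow_c_array__")
    has_stream = hasattr(obj, "__arrow_c_stream__")
    if has_array and has_stream:
        # Both methods exist (the nanoarrow case): nothing can be inferred.
        return "ambiguous"
    if has_array:
        # The pattern described for pyarrow Array / RecordBatch.
        return "contiguous"
    if has_stream:
        # The pattern described for pyarrow ChunkedArray / Table.
        return "chunked"
    raise TypeError(
        f"{type(obj).__name__} does not implement the Arrow PyCapsule interface"
    )


class ArrayLike:
    # Stub: a real producer returns (schema_capsule, array_capsule).
    def __arrow_c_array__(self, requested_schema=None):
        return ("schema-capsule", "array-capsule")


class StreamLike:
    # Stub: a real producer returns an ArrowArrayStream capsule.
    def __arrow_c_stream__(self, requested_schema=None):
        return "stream-capsule"


class BothLike(ArrayLike, StreamLike):
    # Stub producer exporting both methods.
    pass


for producer in (ArrayLike(), StreamLike(), BothLike()):
    print(type(producer).__name__, infer_storage(producer))
```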
I'd argue that it's simpler to define only a single kind of export method on a class and let the consumer convert to a different representation if needed. This also communicates more information about how the data is already stored in memory. More generally, I think it would be really useful for the community to agree on a convention here, since that determines whether consumers can rely on this invariant.
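Under the single-export-method convention, conversion becomes the consumer's job. The sketch below assumes hypothetical producers whose "capsules" are plain Python lists, so it runs without any Arrow library; `to_chunks` and `to_contiguous` are illustrative names, not part of any real API. Viewing one array as a one-chunk stream is cheap, while flattening a stream into one array needs a copy, which is why knowing the existing layout helps the consumer.

```python
from typing import List, Tuple

# Hypothetical stand-ins: plain Python data replaces real PyCapsules.
ArrayExport = Tuple[str, List[int]]  # (schema, values)
StreamExport = Tuple[str, List[List[int]]]  # (schema, chunks)


class ContiguousProducer:
    """Stub that exports only __arrow_c_array__."""

    def __init__(self, values: List[int]):
        self.values = values

    def __arrow_c_array__(self, requested_schema=None) -> ArrayExport:
        return ("int64", self.values)


class ChunkedProducer:
    """Stub that exports only __arrow_c_stream__."""

    def __init__(self, chunks: List[List[int]]):
        self.chunks = chunks

    def __arrow_c_stream__(self, requested_schema=None) -> StreamExport:
        return ("int64", self.chunks)


def to_chunks(obj) -> StreamExport:
    """Consumer-side conversion to a chunked view."""
    if hasattr(obj, "__arrow_c_stream__"):
        return obj.__arrow_c_stream__()
    # A single array is just a stream with one chunk; no values are copied.
    schema, values = obj.__arrow_c_array__()
    return (schema, [values])


def to_contiguous(obj) -> ArrayExport:
    """Consumer-side conversion to one contiguous array."""
    if hasattr(obj, "__arrow_c_array__"):
        return obj.__arrow_c_array__()
    schema, chunks = obj.__arrow_c_stream__()
    # Flattening builds a new list: the consumer decides if the copy is acceptable.
    return (schema, [v for chunk in chunks for v in chunk])


print(to_chunks(ContiguousProducer([1, 2, 3])))  # ('int64', [[1, 2, 3]])
print(to_contiguous(ChunkedProducer([[1, 2], [3]])))  # ('int64', [1, 2, 3])
```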
Component(s)
Python