Add dunder method for Arrow C Data Interface to DataFrame and Column objects

The Python Arrow community is adding a public way to interchange data through the C Data Interface, using PyCapsule objects that hold the C structs, similar to DLPack's Python interface: http://crossbow.voltrondata.com/pr_docs/37797/format/CDataInterface/PyCapsuleInterface.html

We have DLPack support at the Buffer level, and similarly, I think it would be useful to add Arrow support at the DataFrame and Column level. 

Concretely, I would propose adding optional `__arrow_c_schema__`, `__arrow_c_array__` and `__arrow_c_stream__` methods to both the `DataFrame` and `Column` interchange objects. Their presence would indicate that this specific implementation of the interchange object supports the Arrow interface.  
Consumers of the interchange protocol could then check for the presence of those methods and try them first for an easier and faster conversion, falling back to the standard APIs through the Column and Buffer objects otherwise (example: pyarrow and polars interchanging data).
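
To illustrate the consumer-side check described above, here is a minimal sketch. The helper `choose_conversion_path` and the two stand-in classes are hypothetical names made up for this example, and the order in which the methods are tried (stream first, then array) is an assumption, not something the proposal fixes. The method signature with `requested_schema=None` follows the PyCapsule interface document linked above.

```python
def choose_conversion_path(interchange_obj):
    """Hypothetical helper: report which route a consumer would take."""
    # Prefer the Arrow C stream export when the producer offers it
    # (assumed ordering; it also covers objects with several chunks).
    if hasattr(interchange_obj, "__arrow_c_stream__"):
        return "arrow_c_stream"
    # Next, try the single-array export.
    if hasattr(interchange_obj, "__arrow_c_array__"):
        return "arrow_c_array"
    # Otherwise use the standard Column and Buffer APIs of the protocol.
    return "interchange_buffers"


class ArrowCapableFrame:
    """Stand-in for an interchange object that opts in to the Arrow methods."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real implementation would return a PyCapsule wrapping an
        # ArrowArrayStream struct; this placeholder only marks presence.
        raise NotImplementedError("placeholder for illustration")


class PlainFrame:
    """Stand-in for an interchange object without Arrow support."""


print(choose_conversion_path(ArrowCapableFrame()))
print(choose_conversion_path(PlainFrame()))
```

Because the methods are optional, a plain `hasattr` check is enough for the consumer to decide, and producers that cannot export Arrow memory need no changes.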

It might be a bit strange to add both the array and stream interface methods, but that is because the interchange protocol doesn't really distinguish between a single-chunk and a chunked object (https://github.com/data-apis/dataframe-api/issues/250). I think the array method could then raise an error if the DataFrame or Column consists of more than one chunk.
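
As a rough sketch of that last point, the array method could guard on the chunk count before exporting. Everything here is illustrative: `SketchColumn` and `_export_single_chunk` are hypothetical, the choice of `ValueError` is an assumption (the proposal only says "raise an error"), and the returned strings stand in for the schema and array PyCapsules a real implementation would produce. `num_chunks()` is the existing interchange protocol method.

```python
class SketchColumn:
    """Hypothetical Column-like object holding one or more chunks."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def num_chunks(self):
        # Mirrors the interchange protocol's num_chunks() method.
        return len(self._chunks)

    def __arrow_c_array__(self, requested_schema=None):
        # A single ArrowArray cannot represent several chunks, so refuse
        # and point the consumer at the stream method instead.
        if self.num_chunks() > 1:
            raise ValueError(
                "object has more than one chunk; use __arrow_c_stream__ instead"
            )
        return self._export_single_chunk()

    def _export_single_chunk(self):
        # Placeholder: a real implementation would return a tuple of two
        # PyCapsules (schema, array) instead of these strings.
        return ("schema-capsule-placeholder", "array-capsule-placeholder")


single = SketchColumn([[1, 2, 3]])
schema_capsule, array_capsule = single.__arrow_c_array__()

chunked = SketchColumn([[1, 2], [3]])
try:
    chunked.__arrow_c_array__()
except ValueError as exc:
    print(f"refused: {exc}")
```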

This would address https://github.com/data-apis/dataframe-api/issues/48 without being tied to a specific library implementation, only to a memory layout.