Is your feature request related to a problem or challenge? Please describe what you are trying to do.
It is not currently possible to use arrow-rs's FFI to exchange something like an ArrayStream or ChunkedArray when those arrays do not represent RecordBatches. ffi_stream::ArrowArrayStreamReader will error if the data type of the stream is not Struct.
This makes it impossible in the general case to interop with a pyarrow.ChunkedArray or polars.Series (via Python).
The Arrow C Stream Interface does support non-struct array types. The get_next() callback of ArrowArrayStream yields an ArrowArray, and an ArrowArray can be any generic Arrow array. That array is often a StructArray, with the understanding that the StructArray represents a RecordBatch, but it doesn't have to be.
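For reference, the stream struct from the Arrow C Stream Interface can be mirrored in Rust roughly as below (arrow-rs's FFI_ArrowArrayStream follows the same layout). The point to notice is that get_next writes into a plain ArrowArray; nothing in the struct constrains that array to be a struct. In this sketch ArrowSchema and ArrowArray are opaque placeholders, not their real field layouts.

```rust
use std::ffi::{c_char, c_int, c_void};
use std::mem::size_of;

// Opaque placeholders for the C Data Interface structs. The real ArrowSchema
// and ArrowArray have many fields; they are elided because only the shape of
// the stream struct matters here.
#[repr(C)]
pub struct ArrowSchema {
    _private: [u8; 0],
}

#[repr(C)]
pub struct ArrowArray {
    _private: [u8; 0],
}

// Mirrors `struct ArrowArrayStream` from the Arrow C Stream Interface.
// get_next fills a generic ArrowArray: any Arrow type, not only struct.
#[repr(C)]
pub struct ArrowArrayStream {
    pub get_schema:
        Option<unsafe extern "C" fn(stream: *mut ArrowArrayStream, out: *mut ArrowSchema) -> c_int>,
    pub get_next:
        Option<unsafe extern "C" fn(stream: *mut ArrowArrayStream, out: *mut ArrowArray) -> c_int>,
    pub get_last_error: Option<unsafe extern "C" fn(stream: *mut ArrowArrayStream) -> *const c_char>,
    pub release: Option<unsafe extern "C" fn(stream: *mut ArrowArrayStream)>,
    pub private_data: *mut c_void,
}

fn main() {
    // Four nullable function pointers plus one data pointer, all pointer-sized.
    println!("{}", size_of::<ArrowArrayStream>() == 5 * size_of::<usize>());
}
```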
In arrow-rs/arrow-array/src/ffi_stream.rs (lines 364 to 367 at commit 5508978), you assume that the data type of the stream is Struct (and also that the C Schema can be interpreted as a Schema), but the spec does not require either. To be more generic, you can use the data type of the C Schema directly.
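At the level of the C Schema, "use the data type directly" can be as simple as branching on the top-level format string, where "+s" marks a struct. The std-only sketch below is hypothetical (StreamItems and classify_stream are illustrative names, not arrow-rs API); inside arrow-rs the equivalent would presumably convert the schema with DataType::try_from instead of Schema::try_from and branch on the result.

```rust
/// Hypothetical summary of what a consumer should build from each stream item.
#[derive(Debug, PartialEq)]
enum StreamItems {
    /// Top-level struct: each ArrowArray can be imported as a RecordBatch.
    RecordBatches,
    /// Any other type: each ArrowArray is a plain array with this format string.
    Arrays { format: String },
}

/// Decide how to interpret stream items from the C Schema's format string,
/// instead of rejecting every stream whose type is not a struct.
fn classify_stream(format: &str) -> StreamItems {
    if format == "+s" {
        StreamItems::RecordBatches
    } else {
        StreamItems::Arrays { format: format.to_string() }
    }
}

fn main() {
    // "+s" = struct, "i" = int32, "u" = utf8, "+l" = list
    for fmt in ["+s", "i", "u", "+l"] {
        println!("{fmt:>3} -> {:?}", classify_stream(fmt));
    }
}
```

Even a "+s" stream is not guaranteed to be a sequence of RecordBatches (a StructArray can carry top-level nulls, which a RecordBatch cannot), so a consumer may still want to choose explicitly between the two interpretations.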
Describe the solution you'd like
Some way to transfer a stream of Array via FFI.
Describe alternatives you've considered
There's currently no way to exchange a stream of generic arrays with arrow-rs, as far as I can tell.
Additional context
For full disclosure, I've already implemented this in my own library, pyo3-arrow. I have an ArrayReader trait to parallel arrow::RecordBatchReader, and vendored a derived copy of ffi_stream.rs to make it possible to handle this interop (without necessarily materializing the entire stream as a ChunkedArray).
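As a rough illustration of the idea, here is a std-only sketch of what an ArrayReader trait parallel to arrow::RecordBatchReader (an iterator of Result<RecordBatch, ArrowError> plus a schema() method) might look like. It is not pyo3-arrow's actual API: the toy DataType, Array, and ArrowError types and the VecArrayReader struct are hypothetical stand-ins so the example compiles without the arrow crate. The declared data type plays the role of the stream's get_schema result, and chunks are consumed one at a time rather than collected.

```rust
// Hypothetical toy stand-ins for arrow-rs's DataType, ArrayRef, and ArrowError.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Array {
    Int32(Vec<i32>),
    Utf8(Vec<String>),
}

impl Array {
    fn data_type(&self) -> DataType {
        match self {
            Array::Int32(_) => DataType::Int32,
            Array::Utf8(_) => DataType::Utf8,
        }
    }

    fn len(&self) -> usize {
        match self {
            Array::Int32(v) => v.len(),
            Array::Utf8(v) => v.len(),
        }
    }
}

#[derive(Debug)]
struct ArrowError(String);

/// Parallel to RecordBatchReader: an iterator of arrays that all share one
/// data type, known up front (as from the stream's get_schema).
trait ArrayReader: Iterator<Item = Result<Array, ArrowError>> {
    fn data_type(&self) -> DataType;
}

/// Hypothetical reader over in-memory chunks; each chunk is checked
/// against the declared data type as it is yielded.
struct VecArrayReader {
    data_type: DataType,
    chunks: std::vec::IntoIter<Array>,
}

impl Iterator for VecArrayReader {
    type Item = Result<Array, ArrowError>;

    fn next(&mut self) -> Option<Self::Item> {
        let chunk = self.chunks.next()?;
        if chunk.data_type() == self.data_type {
            Some(Ok(chunk))
        } else {
            Some(Err(ArrowError(format!(
                "expected {:?}, got {:?}",
                self.data_type,
                chunk.data_type()
            ))))
        }
    }
}

impl ArrayReader for VecArrayReader {
    fn data_type(&self) -> DataType {
        self.data_type.clone()
    }
}

fn main() {
    let reader = VecArrayReader {
        data_type: DataType::Int32,
        chunks: vec![Array::Int32(vec![1, 2]), Array::Int32(vec![3])].into_iter(),
    };
    println!("stream type: {:?}", reader.data_type());
    // Process chunks one by one instead of materializing a ChunkedArray.
    let mut total = 0;
    for chunk in reader {
        total += chunk.expect("chunk type matches the stream type").len();
    }
    println!("total length: {total}");
}
```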
I'm currently fine with my vendored copy of FFI, but others may have the same issue.
kylebarron changed the title from "Support reading Arrow C Stream interface that does not yield RecordBatch" to "Support Arrow C Stream interface containing stream of Array" on Oct 18, 2024.
Previous discussion in #5295 (comment)