It's hard to deny that Arrow has become the de facto standard for the in-memory representation of tabular data.
Multiple competing products (Snowflake, BigQuery, etc.) as well as libraries (ADBC, TurboODBC) support extracting data as Arrow, unlocking better performance, lower memory usage and greater consistency and interoperability between platforms.
In the Python space, a number of libraries, especially for data processing, data science and ML (pandas, polars, duckdb, etc.), work on Arrow data natively, typically via zero-copy.
It would be extremely beneficial for python-oracledb to be able to:
- Extract data as Arrow, with something like `cursor.fetch_as_arrow()` that returns either an Arrow Table or a RecordBatch from the query. Such a method could bypass the Python object representation, speed up data extraction and ultimately keep Oracle closer to where much of the processing happens.
- Go in the opposite direction: ingest an Arrow Table/RecordBatch into the database, with something like `cursor.executemany(sql, arrow_object)`. This could skip the Python representation, use less memory and ultimately entice users to rely more on Oracle, both for processing that works better in the database and for storing data produced elsewhere.