Add support for fetching and ingesting Arrow Table/RecordBatch data #375

Closed
@mauropagano

It's hard to deny that Arrow has become the de facto standard for in-memory representation of tabular data.
Multiple competing products (Snowflake, BigQuery, etc.) as well as libraries (ADBC, turbodbc) enable extracting data as Arrow, unlocking better performance, lower memory usage and increased consistency / interoperability between platforms.

In the Python space, there are a number of libraries, especially around data processing, data science and ML (pandas, polars, duckdb, etc.), that work on Arrow data natively, normally via zero-copy.

It would be extremely beneficial to have python-oracledb be able to:

  1. Extract data as Arrow, something like cursor.fetch_as_arrow() that returns either an Arrow Table or RecordBatch from the query. This method could bypass the Python representation, speed up data extraction and ultimately keep Oracle closer to where some of the processing occurs.
  2. In the opposite direction, ingest an Arrow Table/RecordBatch into the database, something like cursor.executemany("...", arrow_object). This could skip the Python representation, use less memory and ultimately entice users to rely more on Oracle, both for processing that works better in-db and for storing data produced elsewhere.
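
To make the overhead concrete, here is a rough sketch of the workaround that is needed today in both directions. It is not python-oracledb code: the stdlib sqlite3 module stands in for the driver (both follow DB-API 2.0), and fetch_column_batches / columns_to_rows are hypothetical helper names made up for this illustration. The point is the row-to-column pivot, which a native Arrow path would presumably make unnecessary.

```python
import sqlite3
from typing import Any, Iterator


def fetch_column_batches(cursor, batch_size: int = 1000) -> Iterator[dict[str, list[Any]]]:
    """Yield one column-oriented dict per batch of rows from a DB-API cursor.

    Hypothetical helper: it shows the intermediate Python objects that a
    native Arrow fetch would avoid creating.
    """
    names = [d[0] for d in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)  # row-wise Python tuples
        if not rows:
            break
        # Pivot rows into columns; every value is touched as a Python object.
        yield {name: list(col) for name, col in zip(names, zip(*rows))}


def columns_to_rows(columns: dict[str, list[Any]]) -> list[tuple]:
    """Pivot a column-oriented batch back into row tuples for executemany()."""
    return list(zip(*columns.values()))


# sqlite3 is only a stand-in so the sketch runs without an Oracle database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table src (id integer, name text)")
cur.execute("create table dst (id integer, name text)")
cur.executemany("insert into src values (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# Direction 1: query results -> column batches (what Arrow consumers want).
cur.execute("select id, name from src order by id")
batches = list(fetch_column_batches(cur, batch_size=2))

# Direction 2: column batches -> row tuples -> executemany().
for batch in batches:
    cur.executemany("insert into dst values (?, ?)", columns_to_rows(batch))

copied = cur.execute("select id, name from dst order by id").fetchall()
```

Each dict yielded above could be handed to pyarrow.RecordBatch.from_pydict() to get an Arrow RecordBatch, but only after every value has already been materialized as a Python object, which is exactly the cost items 1 and 2 are asking to skip.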
