@@ -738,18 +738,17 @@ unnecessarily, and avoid objects with large numbers of attributes.

.. _dataframeformat:

- Fetching using the DataFrame Interchange Protocol
- -------------------------------------------------
-
- Python-oracledb can fetch directly to the `Python DataFrame Interchange
- Protocol <https://data-apis.org/dataframe-protocol/latest/index.html>`__
- format. This can reduce application memory requirements and allow zero-copy
- data interchanges between Python data frame libraries. It is an efficient way
- to work with data using Python libraries such as `Apache Arrow
- <https://arrow.apache.org/>`__, `Pandas <https://pandas.pydata.org>`__, `Polars
- <https://pola.rs/>`__, `NumPy <https://numpy.org/>`__, `PyTorch
- <https://pytorch.org/>`__, or to write files in `Apache Parquet
- <https://parquet.apache.org/>`__ format.
+ Fetching Data Frames
+ --------------------
+
+ Python-oracledb can fetch directly to data frames that expose an Apache Arrow
+ PyCapsule Interface. This can reduce application memory requirements and allow
+ zero-copy data interchanges between Python data frame libraries. It is an
+ efficient way to work with data using Python libraries such as `Apache PyArrow
+ <https://arrow.apache.org/docs/python/index.html>`__, `Pandas
+ <https://pandas.pydata.org>`__, `Polars <https://pola.rs/>`__, `NumPy
+ <https://numpy.org/>`__, `PyTorch <https://pytorch.org/>`__, or to write files
+ in `Apache Parquet <https://parquet.apache.org/>`__ format.

.. note::
@@ -759,9 +758,7 @@ to work with data using Python libraries such as `Apache Arrow
The method :meth:`Connection.fetch_df_all()` fetches all rows from a query.
The method :meth:`Connection.fetch_df_batches()` implements an iterator for
fetching batches of rows. The methods return :ref:`OracleDataFrame
- <oracledataframeobj>` objects, whose :ref:`methods <oracledataframemeth>`
- implement the Python DataFrame Interchange Protocol `DataFrame API Interface
- <https://data-apis.org/dataframe-protocol/latest/API.html>`__.
+ <oracledataframeobj>` objects.
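
The batch iterator mentioned in this paragraph can be consumed with an ordinary ``for`` loop. The following is a minimal sketch and not text from the commit: the helper name ``count_rows_in_batches`` and its default batch size are hypothetical, and it assumes that each yielded OracleDataFrame provides a ``num_rows()`` method and that ``fetch_df_batches()`` accepts ``statement`` and ``size`` keyword arguments.

```python
def count_rows_in_batches(connection, sql, batch_size=1000):
    """Stream a query in batches and return the total number of rows seen.

    `connection` is expected to be an open python-oracledb connection.
    The helper name and the default batch size are illustrative only.
    """
    total = 0
    # Each iteration yields one OracleDataFrame holding up to batch_size rows
    for odf in connection.fetch_df_batches(statement=sql, size=batch_size):
        total += odf.num_rows()  # assumed accessor, see the note above
    return total
```

Processing one batch at a time keeps only a single batch in memory, which may matter when the full result set is too large to hold with ``fetch_df_all()``.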

For example, to fetch all rows from a query and print some information about
the results:
@@ -782,13 +779,36 @@ With Oracle Database's standard DEPARTMENTS table, this would display::
    4 columns
    27 rows

- To do more extensive operations on an :ref:`OracleDataFrame
- <oracledataframeobj>`, it can be converted to an appropriate library class, and
- then methods of that library can be used. For example it could be converted to
- a `Pandas DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.
- DataFrame.html#pandas.DataFrame>`__, or to a `PyArrow table
- <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html>`__ as shown
- later.
+ **Summary of Converting OracleDataFrame to Other Data Frames**
+
+ To do more extensive operations, :ref:`OracleDataFrames <oracledataframeobj>`
+ can be converted to your chosen library data frame, and then methods of that
+ library can be used. This section has an overview of how best to do
+ conversions. Some examples are shown in subsequent sections.
+
+ To convert :ref:`OracleDataFrame <oracledataframeobj>` to a `PyArrow Table
+ <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html>`__, use
+ `pyarrow.Table.from_arrays()
+ <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_arrays>`__
+ which leverages the Arrow PyCapsule interface.
+
+ To convert :ref:`OracleDataFrame <oracledataframeobj>` to a `Pandas DataFrame
+ <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame>`__,
+ use `pyarrow.Table.to_pandas()
+ <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas>`__.
+
+ If you want to use a data frame library other than Pandas or PyArrow, use the
+ library's ``from_arrow()`` method to convert a PyArrow Table to the applicable
+ data frame, if your library supports this. For example, with `Polars
+ <https://pola.rs/>`__ use `polars.from_arrow()
+ <https://docs.pola.rs/api/python/dev/reference/api/polars.from_arrow.html>`__.
+
+ Lastly, if your data frame library does not support ``from_arrow()``, then use
+ ``from_dataframe()`` if the library supports it. This can be slower, depending
+ on the implementation.
+
+ The general recommendation is to use Apache Arrow as much as possible. If no
+ Arrow-based option is available, then use ``from_dataframe()``.
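
The preference order in this summary (``from_arrow()`` first, ``from_dataframe()`` only as a fallback) can be sketched as a small dispatcher. This is an illustration under stated assumptions and not part of python-oracledb: the helper name ``to_library_frame`` is hypothetical, ``library`` stands for an imported data frame module such as ``polars``, and passing the PyArrow Table itself to ``from_dataframe()`` is an assumption, since the text does not say which object the fallback receives.

```python
def to_library_frame(library, pyarrow_table):
    """Convert a PyArrow Table with the library's best available constructor.

    `library` is a data frame module (for example the imported polars module).
    The helper name is illustrative only.
    """
    # Preferred path: an Arrow-native constructor such as polars.from_arrow()
    from_arrow = getattr(library, "from_arrow", None)
    if callable(from_arrow):
        return from_arrow(pyarrow_table)

    # Fallback path: the interchange-style constructor, which can be slower
    from_dataframe = getattr(library, "from_dataframe", None)
    if callable(from_dataframe):
        return from_dataframe(pyarrow_table)

    raise TypeError(
        f"{library!r} offers neither from_arrow() nor from_dataframe()"
    )
```

With Polars installed, ``to_library_frame(polars, pyarrow_table)`` would take the ``from_arrow()`` branch, which matches the recommendation to stay on Apache Arrow where possible.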

**Data Frame Type Mapping**

@@ -797,8 +817,8 @@ support makes use of `Apache nanoarrow <https://arrow.apache.org/nanoarrow/>`__
libraries to build data frames.

The following data type mapping occurs from Oracle Database types to the Arrow
- types used in OracleDataFrame objects. Querying any other types from Oracle
- Database will result in an exception.
+ types used in OracleDataFrame objects. Querying any other data types from
+ Oracle Database will result in an exception.

.. list-table-with-summary::
    :header-rows: 1
@@ -830,7 +850,6 @@ Database will result in an exception.
    * - DB_TYPE_TIMESTAMP_TZ
      - TIMESTAMP

-
When converting Oracle Database NUMBERs, if :attr:`defaults.fetch_decimals` is
*True*, the Arrow data type is DECIMAL128. Note Arrow's DECIMAL128 format only
supports precision of up to 38 decimal digits. Else, if the Oracle number data
@@ -895,6 +914,11 @@ An example that creates and uses a `PyArrow Table
This makes use of :meth:`OracleDataFrame.column_arrays()` which returns a list
of :ref:`OracleArrowArray Objects <oraclearrowarrayobj>`.

+ Internally `pyarrow.Table.from_arrays() <https://arrow.apache.org/docs/python/
+ generated/pyarrow.Table.html#pyarrow.Table.from_arrays>`__ leverages the Apache
+ Arrow PyCapsule interface that :ref:`OracleDataFrame <oracledataframeobj>`
+ exposes.
+
See `samples/dataframe_pyarrow.py <https://github.com/oracle/python-oracledb/
blob/main/samples/dataframe_pyarrow.py>`__ for a runnable example.

@@ -924,17 +948,19 @@ org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame>`__ is:
    print(df.T)        # transpose
    print(df.tail(3))  # last three rows

- Using python-oracledb to fetch the interchange format will be more efficient
- than using the Pandas ``read_sql()`` method.
+ The `to_pandas() <https://arrow.apache.org/docs/python/generated/pyarrow.Table.
+ html#pyarrow.Table.to_pandas>`__ method supports arguments like
+ ``types_mapper=pandas.ArrowDtype`` and ``deduplicate_objects=False``, which may
+ be useful for some data sets.

See `samples/dataframe_pandas.py <https://github.com/oracle/python-oracledb/
blob/main/samples/dataframe_pandas.py>`__ for a runnable example.

- Creating Polars Series
- ++++++++++++++++++++++
+ Creating Polars DataFrames
+ ++++++++++++++++++++++++++

- An example that creates and uses a `Polars Series
- <https://docs.pola.rs/api/python/stable/reference/series/index.html>`__ is:
+ An example that creates and uses a `Polars DataFrame
+ <https://docs.pola.rs/api/python/stable/reference/dataframe/index.html>`__ is:

.. code-block:: python

@@ -946,13 +972,16 @@ An example that creates and uses a `Polars Series
    sql = "select id from SampleQueryTab order by id"
    odf = connection.fetch_df_all(statement=sql, arraysize=100)

-     # Convert to a Polars Series
-     pyarrow_array = pyarrow.array(odf.get_column_by_name("ID"))
-     p = polars.from_arrow(pyarrow_array)
+     # Convert to a Polars DataFrame
+     pyarrow_table = pyarrow.Table.from_arrays(
+         odf.column_arrays(), names=odf.column_names()
+     )
+     df = polars.from_arrow(pyarrow_table)

-     # Perform various Polars operations on the Series
+     # Perform various Polars operations on the DataFrame
+     r, c = df.shape
+     print(f"{r} rows, {c} columns")
    print(df.sum())
-     print(p.log10())

See `samples/dataframe_polars.py <https://github.com/oracle/python-oracledb/
blob/main/samples/dataframe_polars.py>`__ for a runnable example.