-
Notifications
You must be signed in to change notification settings - Fork 99
Description
Hi,
- What versions are you using?
con.version: 23.26.1.1.0
platform.platform: macOS-26.2-arm64-arm-64bit
sys.maxsize > 2**32: True
platform.python_version: 3.12.12
oracledb.__version__: 3.4.2- Is it an error or a hang or a crash?
Error
- What error(s) or behavior you are seeing?
import pandas as pd
import oracledb
with oracledb.connect(...) as con:
cur = con.cursor()
cur.execute('create private temporary table ora$ptt_mytest (a int) on commit drop definition')
df = pd.DataFrame({'A': [0, 1]}).convert_dtypes(dtype_backend='pyarrow')
df = df.iloc[1:2]
print('Inserted:\n', df)
cur.executemany('insert into ora$ptt_mytest (a) values (:a)', df)
print('Selected:\n', pd.DataFrame.from_arrow(con.fetch_df_all('select * from ora$ptt_mytest')))Both "inserted" and "selected" dataframes should be the same, but they are different
Inserted:
A
1 1
Selected:
A
0 0.0
We have isolated the problem to be only when the dataframe is some kind of view of another one.
Here we are creating a dataframe with 2 rows, and then calling df.iloc[1:2], which selects only the second row.
But instead, oracledb inserted the first row.
This does not happen if we (i believe)
- don't call
.convert_dtypes(), because then the Arrow PyCapsule interface creates new buffers - round trip the dataframe e.g. through a file, so that we are sure it owns its data
This leads me to believe that when I call df.iloc[1:2], the result will point to the same Arrow buffer with an offset, and this offset is ignored by oracledb (but length is not ignored, because only 1 row from the buffer is inserted).
We ran into this problem with manual batching, i.e.
big_df: pd.DataFrame
batches = [big_df.iloc[i:i+1000] for i in range(0, len(big_df), 1000)]
for b in batches:
cur.executemany('INSERT ...', b)which kept inserting the first batch again and again.
This may be a problem with Pandas not correctly exporting Arrow PyCapsule, but probably no, because polars.from_arrow(df) gives the correct result.
- Does your application call init_oracle_client()?
No, using thin mode.