Closed
Description
Feature Request / Improvement
Hi team,
I recently found that table.upsert results in unexpected low-level errors, such as a bus error or an illegal hardware instruction error. I have tried to isolate the problem in the attached files.
How to recreate
- Run first_run.py
- Run second_run.py with the commented-out upsert:
  #table.upsert(
  #    df=data,
  #    join_cols=['block_number', 'transaction_index', 'log_index'],
  #    when_matched_update_all=True,
  #    when_not_matched_insert_all=True,
  #    case_sensitive=True,
  #)
Note that the following works:
for rb in data.to_batches(max_chunksize=1_000):
    batch_tbl = pa.Table.from_batches([rb])
    table.upsert(
        df=batch_tbl,
        join_cols=['block_number', 'transaction_index', 'log_index'],
        when_matched_update_all=True,
        when_not_matched_insert_all=True,
        case_sensitive=True,
    )
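The batched workaround above could be wrapped in a small helper so it is easier to reuse while the underlying crash is investigated. This is only a sketch: the names `iter_slices` and `upsert_in_chunks` are hypothetical and not part of PyIceberg. It assumes `data` exposes `num_rows` and `slice(offset, length)` (as a `pyarrow.Table` does) and that `table.upsert` accepts the keyword arguments used in this report. It slices the table rather than rebuilding it from record batches, so it has not been verified to avoid the crash in the same way.

```python
from typing import Any, Iterator, List, Sequence


def iter_slices(data: Any, chunk_size: int) -> Iterator[Any]:
    """Yield consecutive slices of `data`, each with at most `chunk_size` rows.

    Assumes `data` has `num_rows` and `slice(offset, length)`,
    which is the case for a pyarrow.Table.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, data.num_rows, chunk_size):
        yield data.slice(offset, chunk_size)


def upsert_in_chunks(
    table: Any,
    data: Any,
    join_cols: Sequence[str],
    chunk_size: int = 1_000,
) -> List[Any]:
    """Hypothetical helper: call table.upsert once per slice of `data`.

    Returns whatever each upsert call returned, in order.
    """
    results = []
    for chunk in iter_slices(data, chunk_size):
        results.append(
            table.upsert(
                df=chunk,
                join_cols=list(join_cols),
                when_matched_update_all=True,
                when_not_matched_insert_all=True,
                case_sensitive=True,
            )
        )
    return results
```

With the objects from second_run.py, a call might look like `upsert_in_chunks(table, data, ['block_number', 'transaction_index', 'log_index'])`. Each chunk is upserted by a separate call, so the whole operation is not atomic. A failure partway through would leave the earlier chunks applied.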
Versions
Pyiceberg version: 0.9.1
Pyarrow: 20.0.0 (Also tried with 18.0.0, 17.0.0)
Hardware: Apple M2
Additional context
The same issue seems to have been mentioned here.
Thank you in advance! 😊