Closed
Description
Environment
Delta-rs version: 0.10.0
Binding: Python
Environment:
- Cloud provider: Localhost
- OS: macOS
- Other: MacBook M1 with 64 GB of RAM
Bug
What happened: The Z Order command worked on the 5 GB h2o groupby dataset (1e8 rows), but errors out on the 50 GB dataset (1e9 rows).
What you expected to happen: I expected Z Ordering to succeed on the larger dataset as well.
How to reproduce it: This notebook shows the computations working well on the 1e8 dataset, but erroring out on the 1e9 dataset.
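For readers without the notebook, the failing call can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the table path and the column name `id1` are hypothetical placeholders, and the only API assumed is `DeltaTable(...).optimize.z_order(columns)`, which matches the `TableOptimizer.z_order` signature in the traceback.

```python
def z_order_single_column(table_path, column):
    """Z Order a Delta table on one column and return the optimize metrics."""
    # Imported inside the function so this sketch can be loaded
    # even where deltalake is not installed.
    from deltalake import DeltaTable

    dt = DeltaTable(table_path)
    # z_order takes an iterable of column names; here it is a single column.
    return dt.optimize.z_order([column])


# Hypothetical usage (path and column name are placeholders):
# metrics = z_order_single_column("tmp/delta-table-1e9", "id1")
```

On the 1e8 table a call like this returns the metrics dictionary; on the 1e9 table it is the call that raises the `DeltaError` reported in this issue.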
More details: I'm Z Ordering on a single column. Here's the error message:
thread 'tokio-runtime-worker' panicked at 'overflow', /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-select-39.0.0/src/interleave.rs:172:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
DeltaError Traceback (most recent call last)
File <timed eval>:1
File ~/opt/miniconda3/envs/deltalake-0100/lib/python3.9/site-packages/deltalake/table.py:697, in TableOptimizer.z_order(self, columns, partition_filters, target_size, max_concurrent_tasks)
675 def z_order(
676 self,
677 columns: Iterable[str],
(...)
680 max_concurrent_tasks: Optional[int] = None,
681 ) -> Dict[str, Any]:
682 """
683 Reorders the data using a Z-order curve to improve data skipping.
684
(...)
695 :return: the metrics from optimize
696 """
--> 697 metrics = self.table._table.z_order_optimize(
698 list(columns), partition_filters, target_size, max_concurrent_tasks
699 )
700 self.table.update_incremental()
701 return json.loads(metrics)
DeltaError: Generic error: task 334 panicked
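The panic output suggests setting `RUST_BACKTRACE=1` to get a backtrace. From a notebook, one way to do that is to set the variable in the Python process before rerunning the Z Order call. This is a sketch; the helper name `enable_rust_backtrace` is made up for illustration.

```python
import os


def enable_rust_backtrace():
    """Ask the Rust runtime to print a backtrace when a panic occurs."""
    # Set this before rerunning the failing z_order call so that the
    # panic in the tokio worker thread includes a backtrace.
    os.environ["RUST_BACKTRACE"] = "1"


enable_rust_backtrace()
```

After this, rerunning the failing cell should print the Rust backtrace alongside the panic message, which would help pin down where in `interleave.rs` the overflow happens.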