Error when Z Ordering a larger dataset #1459

Closed
@MrPowers

Description

Environment

Delta-rs version: 0.10.0

Binding: Python

Environment:

  • Cloud provider: Localhost
  • OS: Macbook
  • Other: Macbook M1 with 64 GB of RAM

Bug

What happened: The Z Order command worked on the 5 GB h2o groupby dataset (1e8 rows), but errors out on the 50 GB dataset (1e9 rows).

What you expected to happen: I expected the Z Ordering to work

How to reproduce it: This notebook shows the computations working well on the 1e8 dataset, but erroring out on the 1e9 dataset.

More details: I'm Z Ordering on a single column. Here's the error message:

thread 'tokio-runtime-worker' panicked at 'overflow', /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-select-39.0.0/src/interleave.rs:172:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
DeltaError                                Traceback (most recent call last)
File <timed eval>:1

File ~/opt/miniconda3/envs/deltalake-0100/lib/python3.9/site-packages/deltalake/table.py:697, in TableOptimizer.z_order(self, columns, partition_filters, target_size, max_concurrent_tasks)
    675 def z_order(
    676     self,
    677     columns: Iterable[str],
   (...)
    680     max_concurrent_tasks: Optional[int] = None,
    681 ) -> Dict[str, Any]:
    682     """
    683     Reorders the data using a Z-order curve to improve data skipping.
    684 
   (...)
    695     :return: the metrics from optimize
    696     """
--> 697     metrics = self.table._table.z_order_optimize(
    698         list(columns), partition_filters, target_size, max_concurrent_tasks
    699     )
    700     self.table.update_incremental()
    701     return json.loads(metrics)

DeltaError: Generic error: task 334 panicked
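
For anyone trying to reproduce this, the failing call from the traceback can be sketched roughly as below. This is a minimal sketch rather than the exact notebook code: the helper name `z_order_single_column`, the table path, and the column name in the usage comment are placeholders I made up. It also sets `RUST_BACKTRACE=1`, as the panic message suggests, so that a fuller backtrace is printed if the panic happens again.

```python
import os

# Set this before importing deltalake so the Rust side sees it if a panic occurs.
os.environ["RUST_BACKTRACE"] = "1"


def z_order_single_column(table_path: str, column: str) -> dict:
    """Z Order a Delta table on one column and return the optimize metrics."""
    # Imported lazily so this sketch can be loaded without deltalake installed.
    from deltalake import DeltaTable

    dt = DeltaTable(table_path)
    # Same entry point as in the traceback: TableOptimizer.z_order(columns, ...).
    return dt.optimize.z_order([column])


# Hypothetical usage; the path and column name are placeholders:
# metrics = z_order_single_column("/path/to/h2o_groupby_1e9", "id1")
```

On the 1e8 table this should return the optimize metrics as a dict; on the 1e9 table it is the call that raises the `DeltaError` shown above.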

Labels: bug
