Closed
Description
Describe the bug
We found that the ORC scan performs poorly when running the TPC-DS benchmark:
the same scan operator is multiple times slower on ORC than on Parquet (from TPC-DS q3).
To Reproduce
Steps to reproduce the behavior:
- generate parquet and orc datasets using /tpcds/datagen.
- run benchmarks on both datasets using /tpcds/benchmark-runner.
- compare the performance of NativeParquetScan and NativeOrcScan.
Expected behavior
ORC should have performance similar to Parquet.
Edit
The main reason is that orc-rust reads all data without column pruning or predicate filtering. After applying column pruning with datafusion-contrib/datafusion-orc#133, the performance is much better.
Currently ORC is still 20%~30% slower than Parquet, which may be related to the unsupported predicate filtering.
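
To illustrate why the two missing optimizations matter, here is a minimal toy sketch in plain Python. It is not the orc-rust, datafusion-orc, or NativeOrcScan API; `Stripe`, `scan`, and `STRIPES` are hypothetical names made up for this example. It only counts how many values a scan has to decode with and without column pruning and min/max-based stripe skipping.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Stripe:
    """Toy stand-in for one ORC stripe: column name -> values."""
    columns: Dict[str, List[int]]

    def min_max(self, name: str) -> Tuple[int, int]:
        # A real file stores these statistics in its metadata;
        # this toy version recomputes them for simplicity.
        values = self.columns[name]
        return min(values), max(values)


def scan(
    stripes: List[Stripe],
    projection: List[str],
    predicate: Optional[Tuple[str, int, int]] = None,
):
    """Return (rows, values_decoded).

    projection: non-empty list of columns to output (column pruning).
    predicate:  optional (column, lo, hi) inclusive range filter.
    """
    rows = []
    values_decoded = 0
    for stripe in stripes:
        if predicate is not None:
            col, lo, hi = predicate
            smin, smax = stripe.min_max(col)
            if smax < lo or smin > hi:
                continue  # statistics prove no row can match: skip stripe

        # Decode only the projected columns plus the filter column.
        needed = list(projection)
        if predicate is not None and predicate[0] not in needed:
            needed.append(predicate[0])
        decoded = {name: stripe.columns[name] for name in needed}
        values_decoded += sum(len(v) for v in decoded.values())

        num_rows = len(decoded[needed[0]])
        for i in range(num_rows):
            if predicate is not None:
                col, lo, hi = predicate
                if not (lo <= decoded[col][i] <= hi):
                    continue  # row-level filtering inside a kept stripe
            rows.append(tuple(decoded[name][i] for name in projection))
    return rows, values_decoded


# Two stripes, four columns, three rows each (made-up data).
STRIPES = [
    Stripe({"a": [1, 2, 3], "b": [10, 20, 30], "c": [0, 0, 0], "d": [7, 8, 9]}),
    Stripe({"a": [100, 101, 102], "b": [40, 50, 60], "c": [1, 1, 1], "d": [4, 5, 6]}),
]

# No pruning, no filtering: every value of every column is decoded.
_, full_cost = scan(STRIPES, ["a", "b", "c", "d"])
# Column pruning only: the query needs just "a" and "b".
_, pruned_cost = scan(STRIPES, ["a", "b"])
# Column pruning plus predicate filtering on "a".
filtered_rows, filtered_cost = scan(STRIPES, ["a", "b"], ("a", 100, 101))

print(full_cost, pruned_cost, filtered_cost)
```

In this toy model, pruning halves the decoded values, and the range predicate additionally skips the first stripe entirely. The actual gain in a real scan depends on the file layout, the encoding, and how selective the query is.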
