-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
I ran TPC-H query 12 several times with DataFusion and got different results each time:
+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL | 623097 | 934685 |
| SHIP | 622962 | 934510 |
+------------+-----------------+----------------+
Query 12 iteration 0 took 42932.5 ms
+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL | 623099 | 934685 |
| SHIP | 622961 | 934513 |
+------------+-----------------+----------------+
Query 12 iteration 0 took 41807.4 ms
+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL | 623099 | 934691 |
| SHIP | 622961 | 934514 |
+------------+-----------------+----------------+
Query 12 iteration 1 took 41652.9 ms
+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL | 623097 | 934686 |
| SHIP | 622962 | 934510 |
+------------+-----------------+----------------+
Query 12 iteration 2 took 41660.9 ms
I see the same behavior with Ballista.
Query 12 iteration 0 took 32341.6 ms
+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL | 623094 | 934692 |
| SHIP | 622961 | 934512 |
+------------+-----------------+----------------+
Query 12 iteration 1 took 29347.7 ms
+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL | 623093 | 934697 |
| SHIP | 622963 | 934515 |
+------------+-----------------+----------------+
To Reproduce
../target/release/tpch benchmark datafusion --path /mnt/tpch/parquet-sf100-partitioned/ --format parquet --iterations 3 --query 12 --debugExpected behavior
Results should be the same on each run.
Additional context
Physical plan:
SortExec: [l_shipmode@0 ASC]
MergeExec
ProjectionExec: expr=[l_shipmode@0 as l_shipmode, SUM(CASE WHEN #orders.o_orderpriority Eq Utf8("1-URGENT") Or #orders.o_orderpriority Eq Utf8("2-HIGH") THEN Int64(1) ELSE Int64(0) END)@1 as high_line_count, SUM(CASE WHEN #orders.o_orderpriority NotEq Utf8("1-URGENT") And #orders.o_orderpriority NotEq Utf8("2-HIGH") THEN Int64(1) ELSE Int64(0) END)@2 as low_line_count]
HashAggregateExec: mode=FinalPartitioned, gby=[l_shipmode@0 as l_shipmode], aggr=[SUM(CASE WHEN #orders.o_orderpriority Eq Utf8("1-URGENT") Or #orders.o_orderpriority Eq Utf8("2-HIGH") THEN Int64(1) ELSE Int64(0) END), SUM(CASE WHEN #orders.o_orderpriority NotEq Utf8("1-URGENT") And #orders.o_orderpriority NotEq Utf8("2-HIGH") THEN Int64(1) ELSE Int64(0) END)]
CoalesceBatchesExec: target_batch_size=4096
RepartitionExec: partitioning=Hash([Column { name: "l_shipmode", index: 0 }], 2)
HashAggregateExec: mode=Partial, gby=[l_shipmode@4 as l_shipmode], aggr=[SUM(CASE WHEN #orders.o_orderpriority Eq Utf8("1-URGENT") Or #orders.o_orderpriority Eq Utf8("2-HIGH") THEN Int64(1) ELSE Int64(0) END), SUM(CASE WHEN #orders.o_orderpriority NotEq Utf8("1-URGENT") And #orders.o_orderpriority NotEq Utf8("2-HIGH") THEN Int64(1) ELSE Int64(0) END)]
CoalesceBatchesExec: target_batch_size=4096
HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: "l_orderkey", index: 0 }, Column { name: "o_orderkey", index: 0 })]
CoalesceBatchesExec: target_batch_size=4096
RepartitionExec: partitioning=Hash([Column { name: "l_orderkey", index: 0 }], 2)
CoalesceBatchesExec: target_batch_size=4096
FilterExec: l_shipmode@4 IN ([Literal { value: Utf8("MAIL") }, Literal { value: Utf8("SHIP") }]) AND l_commitdate@2 < l_receiptdate@3 AND l_shipdate@1 < l_commitdate@2 AND l_receiptdate@3 >= 8766 AND l_receiptdate@3 < 9131
ParquetExec: batch_size=8192, limit=None, partitions=[/mnt/tpch/parquet-sf100-partitioned//lineitem/part-00004-bfa6ef87-037f-477e-b18b-d9344b7d72a1-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//lineitem/part-00002-bfa6ef87-037f-477e-b18b-d9344b7d72a1-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//lineitem/part-00003-bfa6ef87-037f-477e-b18b-d9344b7d72a1-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//lineitem/part-00005-bfa6ef87-037f-477e-b18b-d9344b7d72a1-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//lineitem/part-00000-bfa6ef87-037f-477e-b18b-d9344b7d72a1-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//lineitem/part-00006-bfa6ef87-037f-477e-b18b-d9344b7d72a1-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//lineitem/part-00001-bfa6ef87-037f-477e-b18b-d9344b7d72a1-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//lineitem/part-00007-bfa6ef87-037f-477e-b18b-d9344b7d72a1-c000.snappy.parquet]
CoalesceBatchesExec: target_batch_size=4096
RepartitionExec: partitioning=Hash([Column { name: "o_orderkey", index: 0 }], 2)
ParquetExec: batch_size=8192, limit=None, partitions=[/mnt/tpch/parquet-sf100-partitioned//orders/part-00001-9530d415-a7b1-4dd0-8f7b-7be0e6f0f37a-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//orders/part-00006-9530d415-a7b1-4dd0-8f7b-7be0e6f0f37a-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//orders/part-00000-9530d415-a7b1-4dd0-8f7b-7be0e6f0f37a-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//orders/part-00005-9530d415-a7b1-4dd0-8f7b-7be0e6f0f37a-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//orders/part-00004-9530d415-a7b1-4dd0-8f7b-7be0e6f0f37a-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//orders/part-00003-9530d415-a7b1-4dd0-8f7b-7be0e6f0f37a-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//orders/part-00007-9530d415-a7b1-4dd0-8f7b-7be0e6f0f37a-c000.snappy.parquet, /mnt/tpch/parquet-sf100-partitioned//orders/part-00002-9530d415-a7b1-4dd0-8f7b-7be0e6f0f37a-c000.snappy.parquet]
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working