Description
Is your feature request related to a problem or challenge?
Currently, reading from CSV defaults to UTF8. When setting it to UTF8, performance improved a lot on my local Mac, but this still needs to be verified.
See the result:
./bench.sh compare main default_utf8_for_unkown_type
Comparing main and default_utf8_for_unkown_type
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/clickbench_1.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/clickbench_1.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/clickbench_partitioned.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/clickbench_partitioned.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/h2o_join.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/h2o_join.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch1.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch1.json does not exist
Note: Skipping /Users/zhuqi/arrow-datafusion/benchmarks/results/main/sort_tpch10.json as /Users/zhuqi/arrow-datafusion/benchmarks/results/default_utf8_for_unkown_type/sort_tpch10.json does not exist
--------------------
Benchmark tpch_mem_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ default_utf8_for_unkown_type ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1 │ 328.67ms │ 321.92ms │ no change │
│ QQuery 2 │ 63.01ms │ 61.09ms │ no change │
│ QQuery 3 │ 115.07ms │ 115.89ms │ no change │
│ QQuery 4 │ 65.51ms │ 65.96ms │ no change │
│ QQuery 5 │ 226.31ms │ 228.79ms │ no change │
│ QQuery 6 │ 49.78ms │ 55.67ms │ 1.12x slower │
│ QQuery 7 │ 500.94ms │ 491.28ms │ no change │
│ QQuery 8 │ 169.84ms │ 170.33ms │ no change │
│ QQuery 9 │ 376.36ms │ 377.73ms │ no change │
│ QQuery 10 │ 173.76ms │ 176.28ms │ no change │
│ QQuery 11 │ 44.19ms │ 44.36ms │ no change │
│ QQuery 12 │ 177.45ms │ 176.37ms │ no change │
│ QQuery 13 │ 120.58ms │ 119.20ms │ no change │
│ QQuery 14 │ 23.83ms │ 22.58ms │ +1.06x faster │
│ QQuery 15 │ 56.57ms │ 55.66ms │ no change │
│ QQuery 16 │ 51.25ms │ 53.85ms │ 1.05x slower │
│ QQuery 17 │ 419.65ms │ 398.08ms │ +1.05x faster │
│ QQuery 18 │ 2142.91ms │ 1926.17ms │ +1.11x faster │
│ QQuery 19 │ 80.08ms │ 80.44ms │ no change │
│ QQuery 20 │ 110.32ms │ 108.41ms │ no change │
│ QQuery 21 │ 835.77ms │ 776.81ms │ +1.08x faster │
│ QQuery 22 │ 51.87ms │ 50.27ms │ no change │
└──────────────┴───────────┴──────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main) │ 6183.72ms │
│ Total Time (default_utf8_for_unkown_type) │ 5877.13ms │
│ Average Time (main) │ 281.08ms │
│ Average Time (default_utf8_for_unkown_type) │ 267.14ms │
│ Queries Faster │ 4 │
│ Queries Slower │ 2 │
│ Queries with No Change │ 16 │
└─────────────────────────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main ┃ default_utf8_for_unkown_type ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1 │ 53.89ms │ 55.31ms │ no change │
│ QQuery 2 │ 18.83ms │ 18.69ms │ no change │
│ QQuery 3 │ 27.53ms │ 28.45ms │ no change │
│ QQuery 4 │ 19.24ms │ 20.74ms │ 1.08x slower │
│ QQuery 5 │ 38.84ms │ 38.58ms │ no change │
│ QQuery 6 │ 18.38ms │ 17.62ms │ no change │
│ QQuery 7 │ 49.14ms │ 50.69ms │ no change │
│ QQuery 8 │ 38.30ms │ 39.04ms │ no change │
│ QQuery 9 │ 70.32ms │ 46.85ms │ +1.50x faster │
│ QQuery 10 │ 58.20ms │ 39.86ms │ +1.46x faster │
│ QQuery 11 │ 20.48ms │ 13.67ms │ +1.50x faster │
│ QQuery 12 │ 36.34ms │ 29.02ms │ +1.25x faster │
│ QQuery 13 │ 30.98ms │ 27.47ms │ +1.13x faster │
│ QQuery 14 │ 22.34ms │ 22.23ms │ no change │
│ QQuery 15 │ 33.72ms │ 33.16ms │ no change │
│ QQuery 16 │ 12.58ms │ 12.55ms │ no change │
│ QQuery 17 │ 57.71ms │ 56.33ms │ no change │
│ QQuery 18 │ 67.58ms │ 68.15ms │ no change │
│ QQuery 19 │ 33.12ms │ 36.06ms │ 1.09x slower │
│ QQuery 20 │ 27.81ms │ 28.32ms │ no change │
│ QQuery 21 │ 57.20ms │ 58.21ms │ no change │
│ QQuery 22 │ 12.38ms │ 12.75ms │ no change │
└──────────────┴─────────┴──────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main) │ 804.91ms │
│ Total Time (default_utf8_for_unkown_type) │ 753.77ms │
│ Average Time (main) │ 36.59ms │
│ Average Time (default_utf8_for_unkown_type) │ 34.26ms │
│ Queries Faster │ 5 │
│ Queries Slower │ 2 │
│ Queries with No Change │ 15 │
└─────────────────────────────────────────────┴──────────┘
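To make the Change column and the summary rows easier to double-check, here is a minimal sketch of the arithmetic in Python. The 5% noise threshold and the helper names `classify` and `overall_speedup` are assumptions inferred from the numbers above, not taken from the benchmark scripts themselves.

```python
# Sketch only: reproduces the arithmetic visible in the tables above.
# The 5% threshold is an assumption inferred from which rows say "no change".

def classify(baseline_ms: float, candidate_ms: float, threshold: float = 0.05) -> str:
    """Label a pair of timings in the style of the Change column."""
    if candidate_ms < baseline_ms:
        ratio = baseline_ms / candidate_ms
        return f"+{ratio:.2f}x faster" if ratio >= 1 + threshold else "no change"
    ratio = candidate_ms / baseline_ms
    return f"{ratio:.2f}x slower" if ratio >= 1 + threshold else "no change"


def overall_speedup(total_baseline_ms: float, total_candidate_ms: float) -> float:
    """Ratio of total run times; values above 1.0 mean the candidate is faster."""
    return total_baseline_ms / total_candidate_ms


if __name__ == "__main__":
    # tpch_sf1, Query 9: 70.32ms on main vs 46.85ms on the branch
    print(classify(70.32, 46.85))
    # tpch_mem_sf10, Query 6: 49.78ms on main vs 55.67ms on the branch
    print(classify(49.78, 55.67))
    # Totals from the two Benchmark Summary tables
    print(f"tpch_mem_sf10: {overall_speedup(6183.72, 5877.13):.3f}x")
    print(f"tpch_sf1:      {overall_speedup(804.91, 753.77):.3f}x")
```

By this calculation the branch is roughly 5% faster overall on tpch_mem_sf10 and roughly 7% faster on tpch_sf1, with most of the tpch_sf1 gain coming from Queries 9 through 13.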
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response