Is your feature request related to a problem or challenge?
Currently, RepartitionExec is implemented with a custom MPSC channel built on parking_lot. However, this implementation performs poorly and may become a bottleneck in some queries when the number of input/output partitions is large.
Describe the solution you'd like
We could use a lock-free MPSC channel, such as flume, to improve performance.
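To make the channel's role concrete, here is a minimal sketch of the fan-out pattern a repartition step relies on: every input partition is a producer, and each output partition owns one MPSC channel that all producers send into. This is not DataFusion's code. It assumes `std::sync::mpsc` and OS threads as stand-ins for the real async channel, and `Batch`, `repartition`, and the modulo hash are hypothetical names chosen for illustration. It also assumes `num_outputs` is greater than zero.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a record batch: just a vector of row keys.
type Batch = Vec<u64>;

/// Hash-repartitions `inputs` (one Vec<Batch> per input partition) into
/// `num_outputs` partitions. ASSUMPTION: `num_outputs > 0`.
fn repartition(inputs: Vec<Vec<Batch>>, num_outputs: usize) -> Vec<Vec<u64>> {
    // One unbounded MPSC channel per output partition.
    let (senders, receivers): (Vec<_>, Vec<_>) =
        (0..num_outputs).map(|_| mpsc::channel::<Batch>()).unzip();

    // One producer thread per input partition.
    let producers: Vec<_> = inputs
        .into_iter()
        .map(|batches| {
            // Each producer gets its own clone of every Sender.
            let senders = senders.clone();
            thread::spawn(move || {
                for batch in batches {
                    // Split the batch by `key % num_outputs` (illustrative hash).
                    let mut parts = vec![Vec::new(); senders.len()];
                    for key in batch {
                        parts[(key as usize) % senders.len()].push(key);
                    }
                    // Every send goes through the shared channel, which is why
                    // its cost grows with the number of input/output partitions.
                    for (tx, part) in senders.iter().zip(parts) {
                        if !part.is_empty() {
                            tx.send(part).expect("receiver dropped");
                        }
                    }
                }
            })
        })
        .collect();

    // Drop the original handles so receivers observe end-of-stream
    // once all producers have finished.
    drop(senders);
    for p in producers {
        p.join().expect("producer panicked");
    }

    // Drain each output channel; sort so the result is deterministic.
    receivers
        .into_iter()
        .map(|rx| {
            let mut rows: Vec<u64> = rx.into_iter().flatten().collect();
            rows.sort_unstable();
            rows
        })
        .collect()
}
```

With N input and M output partitions, each batch can trigger up to M sends and each channel has N concurrent producers, so contention on the channel scales with both counts.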
Describe alternatives you've considered
No response
Additional context
I have implemented this idea, and the TPC-H benchmark shows that it speeds up 13 of the 22 queries, by up to 1.64x:
Comparing main and feature_flume
Benchmark tpch.json
Query | main | feature_flume | Change |
---|---|---|---|
QQuery 1 | 317.52ms | 317.86ms | no change |
QQuery 2 | 73.18ms | 70.41ms | no change |
QQuery 3 | 136.38ms | 113.01ms | +1.21x faster |
QQuery 4 | 84.27ms | 51.30ms | +1.64x faster |
QQuery 5 | 170.56ms | 123.28ms | +1.38x faster |
QQuery 6 | 83.52ms | 81.93ms | no change |
QQuery 7 | 249.60ms | 220.84ms | +1.13x faster |
QQuery 8 | 191.66ms | 175.73ms | +1.09x faster |
QQuery 9 | 282.38ms | 213.37ms | +1.32x faster |
QQuery 10 | 230.92ms | 153.20ms | +1.51x faster |
QQuery 11 | 52.68ms | 54.10ms | no change |
QQuery 12 | 153.50ms | 119.72ms | +1.28x faster |
QQuery 13 | 314.86ms | 313.01ms | no change |
QQuery 14 | 115.02ms | 115.82ms | no change |
QQuery 15 | 90.32ms | 89.26ms | no change |
QQuery 16 | 67.44ms | 61.57ms | +1.10x faster |
QQuery 17 | 785.40ms | 786.18ms | no change |
QQuery 18 | 636.27ms | 491.24ms | +1.30x faster |
QQuery 19 | 232.26ms | 231.82ms | no change |
QQuery 20 | 261.95ms | 240.57ms | +1.09x faster |
QQuery 21 | 351.81ms | 239.96ms | +1.47x faster |
QQuery 22 | 54.88ms | 49.39ms | +1.11x faster |
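For readers checking the Change column, the sketch below shows one way those values can be reproduced from the two timings. The speedup is the ratio of the main time to the feature_flume time. The 5% noise threshold and the "slower" wording are assumptions: they are consistent with every row above, but the benchmark tool's exact rule is not stated in this issue. The function name `change` is hypothetical.

```rust
/// Formats a "Change" cell from two timings in milliseconds.
/// ASSUMPTION: ratios within 5% of 1.0 are reported as "no change";
/// the real benchmark script may use a different rule.
fn change(main_ms: f64, feature_ms: f64) -> String {
    const NOISE_THRESHOLD: f64 = 0.05; // assumed, not taken from the tool

    // Ratio above 1.0 means the feature branch is faster than main.
    let ratio = main_ms / feature_ms;

    if (ratio - 1.0).abs() <= NOISE_THRESHOLD {
        "no change".to_string()
    } else if ratio > 1.0 {
        format!("+{:.2}x faster", ratio)
    } else {
        // Hypothetical wording; no slower query appears in the table.
        format!("{:.2}x slower", 1.0 / ratio)
    }
}
```

For example, Query 4 gives 84.27 / 51.30 ≈ 1.64, and Query 2 gives 73.18 / 70.41 ≈ 1.04, which falls inside the assumed noise band.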