Closed
Description
opened on Mar 16, 2023
MPP Broadcast Join Enhancement
Implementation
TiDB
- support data compression in
Broadcast
/Passthrough
exchange operator - add new session var:
tidb_prefer_broadcast_join_by_exchange_data_size
(release-7.1):- OFF: same behavior like <= v7.0
- ON: compare data exchange size of join and choose the smallest one
tidb_broadcast_join_threshold_size
andtidb_broadcast_join_threshold_count
will be ignored if planner success to determine whether need broadcast.
- refine the process about checking broadcast join
- Broadcast exchange size:
- Build: (mppStoreCnt - 1) * sizeof(BuildTable)
- Probe: 0
- exchange size: Build * (mppStoreCnt^?)
- choose the child plan with the maximum approximate value as Probe
- Shuffle exchange size:
- Build: sizeof(BuildTable) * (mppStoreCnt - 1) / mppStoreCnt
- Probe: sizeof(ProbeTable) * (mppStoreCnt - 1) / mppStoreCnt
- exchange size: (Build + Probe) * (mppStoreCnt^?)
- The size of
Build
hash table when using shuffle join is aboutsizeof(broadcast join hash table) / (mppStoreCnt)
. It will cost more time to constructBuild
hash table and searchProbe
while using broadcast join. Set a scale factor (mppStoreCnt^?
) when estimating broadcast join inisJoinFitMPPBCJ
andisJoinChildFitMPPBCJ
(based on TPCH benchmark). - If the approximate exchange size of broadcast is smaller than hash, then we need to choose broadcast way.
- Broadcast exchange size:
TiFlash
Support data compression in Broadcast
/ Passthrough
exchange operator
- support data compression in broadcast/passthrough exchange
- update parts of utils modules
- For
StorageDisaggregated
, use latest mpp version for task meta.- TODO: enable data compression if necessary
Progress
- Optimize data size of
Broadcast
/Passthrough
exchange operator #6880 - planner: support data compression in
Broadcast
/Passthrough
exchange operator; optimize process about choosing Broadcast Join; tidb#41968 - planner: disable
tidb_prefer_broadcast_join_by_exchange_data_size
by default; set scale factor to optimize estimating broadcast join; tidb#42915
Benchmark
ENV
- TPCH-100
- TiDB: 88174d2ef7f78a6dbd4e1f94b617b796f8f31c1d
- TiFlash x 3: 9ba4947f1b35ab9b08d794648be51d4a680ed9bf
- TIFLASH REPLICA x 3
- Original TiDB: 2cddfb3ec
Original Threshold x 10
: only update threshold about broadcast join
- set tidb_broadcast_join_threshold_count=102400
- set tidb_broadcast_join_threshold_size=1048576000
- set tidb_prefer_broadcast_join_by_exchange_data_size=off
Optimize BCJ
- set tidb_prefer_broadcast_join_by_exchange_data_size=on
Time(s) | Original | Optimize BCJ | Original Threshold x 10 | Performance(QPS) Improvement: (Original) / (Optimize BCJ) - 1.0 | Performance(QPS) Improvement: (Original) / ( Original Threshold x 10 ) - 1.0 |
---|---|---|---|---|---|
Q1 | 2.79 | 2.79 | 2.92 | 0.00% | -4.45% |
Q2 | 1.11 | 0.97 | 1.11 | 14.43% | 0.00% |
Q3 | 3.05 | 2.52 | 3.05 | 21.03% | 0.00% |
Q4 | 2.38 | 2.45 | 2.38 | -2.86% | 0.00% |
Q5 | 6.48 | 4.33 | 6.14 | 49.65% | 5.54% |
Q6 | 0.7 | 0.64 | 0.7 | 9.38% | 0.00% |
Q7 | 2.99 | 2.72 | 2.32 | 9.93% | 28.88% |
Q8 | 4.87 | 2.18 | 5.07 | 123.39% | -3.94% |
Q9 | 17.01 | 13.79 | 12.72 | 23.35% | 33.73% |
Q10 | 3.39 | 3.59 | 3.39 | -5.57% | 0.00% |
Q11 | 0.84 | 0.5 | 0.44 | 68.00% | 90.91% |
Q12 | 1.51 | 1.31 | 1.31 | 15.27% | 15.27% |
Q13 | 3.12 | 3.19 | 3.19 | -2.19% | -2.19% |
Q14 | 0.70 | 0.77 | 0.84 | -9.09% | -16.67% |
Q15 | 1.51 | 1.58 | 1.58 | -4.43% | -4.43% |
Q16 | 0.84 | 0.84 | 0.84 | 0.00% | 0.00% |
Q17 | 6.41 | 6.48 | 6.48 | -1.08% | -1.08% |
Q18 | 6.95 | 5.87 | 6.74 | 18.40% | 3.12% |
Q19 | 1.71 | 1.85 | 1.78 | -7.57% | -3.93% |
Q20 | 1.38 | 1.44 | 1.51 | -4.17% | -8.61% |
Q21 | 8.36 | 7.21 | 7.08 | 15.95% | 18.08% |
Q22 | 0.7 | 0.64 | 0.64 | 9.38% | 9.38% |
SUM | 78.8 | 67.66 | 72.23 | 16.46% | 9.10% |
Original | Optimize BCJ | Original Threshold x 10 | NET/IO Reduction:(Original-(Optimize BCJ))/Original | NET/IO Reduction:(Original-( Original Threshold x 10))/Original | |
---|---|---|---|---|---|
Total Exchange Size By NET (GB) | 75.894 | 22.823 | 52.376 | 69.93% | 30.99% |
After pingcap/tidb#42915
Time Cost(s) | Original | Optimize BCJ | Performance(QPS) Improvement |
---|---|---|---|
Q1 | 2.79 | 2.85 | -2.11% |
Q2 | 1.11 | 0.97 | 14.43% |
Q3 | 3.05 | 3.05 | 0.00% |
Q4 | 2.38 | 2.38 | 0.00% |
Q5 | 6.48 | 4.66 | 39.06% |
Q6 | 0.7 | 0.7 | 0.00% |
Q7 | 2.99 | 2.18 | 37.16% |
Q8 | 4.87 | 2.11 | 130.81% |
Q9 | 17.01 | 10.57 | 60.93% |
Q10 | 3.39 | 3.32 | 2.11% |
Q11 | 0.84 | 0.44 | 90.91% |
Q12 | 1.51 | 1.58 | -4.43% |
Q13 | 3.12 | 3.19 | -2.19% |
Q14 | 0.77 | 0.77 | 0.00% |
Q15 | 1.51 | 1.58 | -4.43% |
Q16 | 0.84 | 0.84 | 0.00% |
Q17 | 6.41 | 6.27 | 2.23% |
Q18 | 6.95 | 6.74 | 3.12% |
Q19 | 1.78 | 1.71 | 4.09% |
Q20 | 1.38 | 1.38 | 0.00% |
Q21 | 8.36 | 7.28 | 14.84% |
Q22 | 0.7 | 0.64 | 9.38% |
SUM | 78.94 | 65.21 | 21.06% |
Original | Optimize BCJ | NET/IO Reduction:(Original-(Optimize BCJ))/Original | |
---|---|---|---|
Total Exchange Size By NET (GB) | 75.894 | 30.216 | 60.19% |
Benchmark: One MPP Store
ENV
- TPCH-100
- TiFlash x 1: 9ba4947f1b35ab9b08d794648be51d4a680ed9bf
Time(s) | Original | Optimize BCJ | Performance(QPS) Improvement: (Original) / (Optimize BCJ) - 1.0 |
---|---|---|---|
Q1 | 8.15 | 8.15 | 0.00% |
Q2 | 2.52 | 2.25 | 12.00% |
Q3 | 5.94 | 4.4 | 35.00% |
Q4 | 4.53 | 4.6 | -1.52% |
Q5 | 14.13 | 10.7 | 32.06% |
Q6 | 1.98 | 1.91 | 3.66% |
Q7 | 6.61 | 5.34 | 23.78% |
Q8 | 9.56 | 5.47 | 74.77% |
Q9 | 38.82 | 29.83 | 30.14% |
Q10 | 8.36 | 7.15 | 16.92% |
Q11 | 1.71 | 1.04 | 64.42% |
Q12 | 4.19 | 3.46 | 21.10% |
Q13 | 8.62 | 8.42 | 2.38% |
Q14 | 2.05 | 2.11 | -2.84% |
Q15 | 4.13 | 4.13 | 0.00% |
Q16 | 2.18 | 2.05 | 6.34% |
Q17 | 13.72 | 13.79 | -0.51% |
Q18 | 17.28 | 15.07 | 14.66% |
Q19 | 5.00 | 5.00 | 0.00% |
Q20 | 3.05 | 3.05 | 0.00% |
Q21 | 16.81 | 13.79 | 21.90% |
Q22 | 1.31 | 1.24 | 5.65% |
SUM | 180.65 | 152.95 | 18.11% |
Activity