Skip to content

Optimize MPP Broadcast Join #7084

Closed
Closed
@solotzg

Description

@solotzg

MPP Broadcast Join Enhancement

Implementation

ref pingcap/tidb#40494

TiDB

  • support data compression in Broadcast / Passthrough exchange operator
  • add new session var: tidb_prefer_broadcast_join_by_exchange_data_size (release-7.1):
    • OFF: same behavior like <= v7.0
    • ON: compare data exchange size of join and choose the smallest one
      • tidb_broadcast_join_threshold_size and tidb_broadcast_join_threshold_count will be ignored if planner success to determine whether need broadcast.
  • refine the process about checking broadcast join
    • Broadcast exchange size:
      • Build: (mppStoreCnt - 1) * sizeof(BuildTable)
      • Probe: 0
      • exchange size: Build * (mppStoreCnt^?)
      • choose the child plan with the maximum approximate value as Probe
    • Shuffle exchange size:
      • Build: sizeof(BuildTable) * (mppStoreCnt - 1) / mppStoreCnt
      • Probe: sizeof(ProbeTable) * (mppStoreCnt - 1) / mppStoreCnt
      • exchange size: (Build + Probe) * (mppStoreCnt^?)
    • The size of Build hash table when using shuffle join is about sizeof(broadcast join hash table) / (mppStoreCnt). It will cost more time to construct Build hash table and search Probe while using broadcast join. Set a scale factor (mppStoreCnt^?) when estimating broadcast join in isJoinFitMPPBCJ and isJoinChildFitMPPBCJ (based on TPCH benchmark).
    • If the approximate exchange size of broadcast is smaller than hash, then we need to choose broadcast way.

TiFlash

Support data compression in Broadcast / Passthrough exchange operator

  • support data compression in broadcast/passthrough exchange
  • update parts of utils modules
  • For StorageDisaggregated, use latest mpp version for task meta.
    • TODO: enable data compression if necessary

image

Progress

Benchmark

ENV

Original Threshold x 10: only update threshold about broadcast join

  • set tidb_broadcast_join_threshold_count=102400
  • set tidb_broadcast_join_threshold_size=1048576000
  • set tidb_prefer_broadcast_join_by_exchange_data_size=off

Optimize BCJ

  • set tidb_prefer_broadcast_join_by_exchange_data_size=on

Time(s) Original Optimize BCJ Original Threshold x 10 Performance(QPS) Improvement: (Original) / (Optimize BCJ) - 1.0 Performance(QPS) Improvement: (Original) / ( Original Threshold x 10 ) - 1.0
Q1 2.79 2.79 2.92 0.00% -4.45%
Q2 1.11 0.97 1.11 14.43% 0.00%
Q3 3.05 2.52 3.05 21.03% 0.00%
Q4 2.38 2.45 2.38 -2.86% 0.00%
Q5 6.48 4.33 6.14 49.65% 5.54%
Q6 0.7 0.64 0.7 9.38% 0.00%
Q7 2.99 2.72 2.32 9.93% 28.88%
Q8 4.87 2.18 5.07 123.39% -3.94%
Q9 17.01 13.79 12.72 23.35% 33.73%
Q10 3.39 3.59 3.39 -5.57% 0.00%
Q11 0.84 0.5 0.44 68.00% 90.91%
Q12 1.51 1.31 1.31 15.27% 15.27%
Q13 3.12 3.19 3.19 -2.19% -2.19%
Q14 0.70 0.77 0.84 -9.09% -16.67%
Q15 1.51 1.58 1.58 -4.43% -4.43%
Q16 0.84 0.84 0.84 0.00% 0.00%
Q17 6.41 6.48 6.48 -1.08% -1.08%
Q18 6.95 5.87 6.74 18.40% 3.12%
Q19 1.71 1.85 1.78 -7.57% -3.93%
Q20 1.38 1.44 1.51 -4.17% -8.61%
Q21 8.36 7.21 7.08 15.95% 18.08%
Q22 0.7 0.64 0.64 9.38% 9.38%
SUM 78.8 67.66 72.23 16.46% 9.10%
  Original Optimize BCJ Original Threshold x 10 NET/IO Reduction:(Original-(Optimize BCJ))/Original NET/IO Reduction:(Original-( Original Threshold x 10))/Original
Total Exchange Size By NET (GB) 75.894 22.823 52.376 69.93% 30.99%

After pingcap/tidb#42915

Time Cost(s) Original Optimize BCJ Performance(QPS) Improvement
Q1 2.79 2.85 -2.11%
Q2 1.11 0.97 14.43%
Q3 3.05 3.05 0.00%
Q4 2.38 2.38 0.00%
Q5 6.48 4.66 39.06%
Q6 0.7 0.7 0.00%
Q7 2.99 2.18 37.16%
Q8 4.87 2.11 130.81%
Q9 17.01 10.57 60.93%
Q10 3.39 3.32 2.11%
Q11 0.84 0.44 90.91%
Q12 1.51 1.58 -4.43%
Q13 3.12 3.19 -2.19%
Q14 0.77 0.77 0.00%
Q15 1.51 1.58 -4.43%
Q16 0.84 0.84 0.00%
Q17 6.41 6.27 2.23%
Q18 6.95 6.74 3.12%
Q19 1.78 1.71 4.09%
Q20 1.38 1.38 0.00%
Q21 8.36 7.28 14.84%
Q22 0.7 0.64 9.38%
SUM 78.94 65.21 21.06%
  Original Optimize BCJ NET/IO Reduction:(Original-(Optimize BCJ))/Original
Total Exchange Size By NET (GB) 75.894 30.216 60.19%

Benchmark: One MPP Store

ENV

Time(s) Original Optimize BCJ Performance(QPS) Improvement: (Original) / (Optimize BCJ) - 1.0
Q1 8.15 8.15 0.00%
Q2 2.52 2.25 12.00%
Q3 5.94 4.4 35.00%
Q4 4.53 4.6 -1.52%
Q5 14.13 10.7 32.06%
Q6 1.98 1.91 3.66%
Q7 6.61 5.34 23.78%
Q8 9.56 5.47 74.77%
Q9 38.82 29.83 30.14%
Q10 8.36 7.15 16.92%
Q11 1.71 1.04 64.42%
Q12 4.19 3.46 21.10%
Q13 8.62 8.42 2.38%
Q14 2.05 2.11 -2.84%
Q15 4.13 4.13 0.00%
Q16 2.18 2.05 6.34%
Q17 13.72 13.79 -0.51%
Q18 17.28 15.07 14.66%
Q19 5.00 5.00 0.00%
Q20 3.05 3.05 0.00%
Q21 16.81 13.79 21.90%
Q22 1.31 1.24 5.65%
SUM 180.65 152.95 18.11%

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions