Commit 8d61480

[FLINK-38073][docs] Complement with TaskManager/JobManager/Host specs
1 parent aef1a00 commit 8d61480

1 file changed: +5 / -3 lines

docs/content/docs/dev/table/tuning.md

Lines changed: 5 additions & 3 deletions
@@ -315,8 +315,10 @@ In most joins, a significant portion of processing time is spent fetching record
 The main benefits of the MultiJoin operator are:
 
 1) Considerably smaller state size due to zero intermediate state.
-3) Improved performance for chained joins with record amplification.
-4) Improved stability: linear state growth with amount of records processed, instead of polynomial growth with binary joins.
+2) Improved performance for chained joins with record amplification.
+3) Improved stability: linear state growth with amount of records processed, instead of polynomial growth with binary joins.
+
+Also, pipelines with MultiJoin instead of binary joins usually have faster initialization and recovery times due to smaller state and fewer amount of nodes.
 
 ### When to enable the MultiJoin?
 
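The linear-versus-polynomial state growth claimed in item 3 can be illustrated with a simplified counting model. Everything in it is an assumption made for illustration, not something taken from the Flink implementation: $n$ inputs are joined on one shared key, each input holds $r$ records for that key (record amplification whenever $r > 1$), and the binary plan is a left-deep chain of $n - 1$ joins. The first binary join stores $r$ rows from each side; the $j$-th join ($2 \le j \le n-1$) stores the $r^{j}$-row output of the previous join plus $r$ rows from its right input. The MultiJoin operator stores each input once:

$$
S_{\text{binary}}(n, r) = 2r + \sum_{j=2}^{n-1} \left( r^{j} + r \right) = n r + \sum_{j=2}^{n-1} r^{j},
\qquad
S_{\text{MultiJoin}}(n, r) = n r .
$$

The $\sum_{j=2}^{n-1} r^{j}$ term is the intermediate state, so for a fixed $n$ the binary plan grows like $r^{n-1}$ while the MultiJoin grows linearly in $r$. For example, $n = 10$ and $r = 2$ give $S_{\text{binary}} = 20 + (2^{10} - 4) = 1040$ stored rows versus $S_{\text{MultiJoin}} = 20$. The model ignores key skew, state TTL and changelog semantics, so it only shows the growth trend.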
@@ -349,7 +351,7 @@ For this 10-way join above, involving record amplification, we've observed signi
 
 The total state is always smaller with the MultiJoin operator. In this case, the performance is initially the same, but as the intermediate state grows, the performance of binary joins degrade and the multi join remains stable and outperforms.
 
-This general benchmark for the 10-way join was run with the following configuration: 10 upsert kafka topics, 10 parallelism, 1 record per second per topic. We used rocksdb with unaligned checkpoints and with incremental checkpoints. The sink uses a blackhole connector so we only benchmark the joins. This is the SQL used to generate the benchmark data:
+This general benchmark for the 10-way join was run with the following configuration: 10 upsert kafka topics, 10 parallelism, 1 record per second per topic. We used rocksdb with unaligned checkpoints and with incremental checkpoints. Each job ran in one TaskManager containing 8GB process memory, 1GB off-heap memory and 20% network memory. The JobManager had 4GB process memory. The host machine contained a M1 processor chip, 32GB RAM and 1TB SSD. The sink uses a blackhole connector so we only benchmark the joins. The SQL used to generate the benchmark data had this structure:
 
 ```sql
 INSERT INTO JoinResultsMJ
