Conversation

@xiaochen-zhou (Contributor) commented Nov 23, 2025

Purpose of this pull request

Supports automatic parallelism inference for sources that implement the `SupportParallelismInference` interface (e.g., the Paimon connector). When enabled, the engine automatically determines an appropriate parallelism from data characteristics instead of falling back to the default parallelism of 1.
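For context, a source opts in by implementing the `SupportParallelismInference` interface named above. The sketch below is a hypothetical minimal implementation: the interface name and the `inferParallelism()` signature come from this PR, but `PartitionCountingSource` and its partition-counting body are illustrative assumptions, not the Paimon connector's actual code.

```java
import java.util.List;

// Interface name and method signature as described in the PR.
interface SupportParallelismInference {
    // Returns the inferred parallelism, or -1 if inference failed.
    int inferParallelism();
}

// Hypothetical source that infers parallelism from per-table partition counts.
class PartitionCountingSource implements SupportParallelismInference {
    private final List<Integer> partitionCounts; // partitions per table (assumed input)

    PartitionCountingSource(List<Integer> partitionCounts) {
        this.partitionCounts = partitionCounts;
    }

    @Override
    public int inferParallelism() {
        // Sum partitions across all tables; fall back to 1 when none are found.
        int total = partitionCounts.stream().mapToInt(Integer::intValue).sum();
        return total <= 0 ? 1 : total;
    }
}
```

With this shape, the engine only needs an `instanceof SupportParallelismInference` check to decide whether inference applies to a given source.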

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Added new tests.

Check list

Comment on lines +211 to +234
**Server-level configuration example**

Configure in `seatunnel.yaml`:

```yaml
seatunnel:
  engine:
    parallelism-inference:
      enabled: true
      max-parallelism: 100
```

**Job-level configuration example**

Configure in the `env` block of the job configuration file:

```hocon
env {
  # Enable parallelism inference
  parallelism.inference.enabled = true
  # Set the maximum parallelism
  parallelism.inference.max-parallelism = 50
}
```
Member commented:

Please add some notes on the priority between these configuration levels.

Comment on lines 187 to 211
```java
@Override
public int inferParallelism() {
    try {
        for (Map.Entry<String, ReadBuilder> entry : readBuilders.entrySet()) {
            String tableKey = entry.getKey();
            ReadBuilder readBuilder = entry.getValue();
            try {
                List<PartitionEntry> partitionEntries =
                        readBuilder.newScan().listPartitionEntries();
                return !partitionEntries.isEmpty() ? partitionEntries.size() : 1;
            } catch (Exception e) {
                log.warn(
                        "Failed to get partition info for table {}, skipping parallelism inference",
                        tableKey,
                        e);
                return -1;
            }
        }
    } catch (Exception e) {
        log.warn("Failed to infer parallelism for Paimon source", e);
        return -1;
    }
    return 1;
}
```
@zhangshenghang (Member) commented Nov 24, 2025:
Only take the first table? Could you explain the design?

Comment on lines 126 to 128
```java
"Enable automatic parallelism inference based on data volume. "
        + "When enabled, operators with parallelism=-1 will have their parallelism "
        + "automatically determined based on the size of consumed data.");
```
Member commented:

The described logic has not been implemented

Comment on lines 135 to 136
```java
"The maximum parallelism for operators when using automatic inference. "
        + "Must be a power of 2 to ensure even distribution of subpartitions.");
```
Member commented:

ditto

```java
        .orElse(parallelismInferenceConfig.isEnabled());

if (inferenceEnabled && source instanceof SupportParallelismInference) {
    int inferredParallelism = ((SupportParallelismInference) source).inferParallelism();
```
Member commented:

What if we roll back the parallelism to the default if it has issues?
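One way to read the rollback suggestion is sketched below. This is a hypothetical helper under assumed names, not the PR's actual engine code: any non-positive inferred value (e.g., the `-1` returned on inference failure) falls back to the default parallelism, and a successful inference is capped at the configured maximum.

```java
// Hypothetical helper illustrating the reviewer's rollback suggestion.
// All names here are illustrative; the PR's engine code may differ.
class ParallelismResolver {
    static int resolve(int inferredParallelism, int maxParallelism, int defaultParallelism) {
        if (inferredParallelism <= 0) {
            // Inference failed (inferParallelism() returned -1 or 0):
            // roll back to the configured default instead of failing the job.
            return defaultParallelism;
        }
        // Cap the inferred value at the configured maximum parallelism.
        return Math.min(inferredParallelism, maxParallelism);
    }
}
```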

```java
public static Option<Integer> PARALLELISM_INFERENCE_MAX_PARALLELISM =
        Options.key("parallelism.inference.max-parallelism")
                .intType()
                .defaultValue(128)
```
Member commented:

Why is it 128 here? The document describes it as 64.


```java
// ==================== Parallelism Inference Options ====================

public static Option<Boolean> PARALLELISM_INFERENCE_ENABLED =
```
Member commented:

New parameters need to be added in EnvOptionRule

@xiaochen-zhou (Contributor, Author) commented:

I have updated the code based on the review suggestions. Please take a look when you have time. @zhangshenghang

Comment on lines +188 to +212
```java
public int inferParallelism() {
    int inferParallelism = 0;
    try {
        for (Map.Entry<String, ReadBuilder> entry : readBuilders.entrySet()) {
            String tableKey = entry.getKey();
            ReadBuilder readBuilder = entry.getValue();
            try {
                List<PartitionEntry> partitionEntries =
                        readBuilder.newScan().listPartitionEntries();
                inferParallelism += partitionEntries.size();
            } catch (Exception e) {
                log.warn(
                        "Failed to get partition info for table {}, skipping parallelism inference",
                        tableKey,
                        e);
                return -1;
            }
        }
    } catch (Exception e) {
        log.warn("Failed to infer parallelism for Paimon source", e);
        return -1;
    }
    return inferParallelism <= 0 ? 1 : inferParallelism;
}
```
Member commented:

Is there a problem with this design? Is there any recommended calculation rule for the parallelism of our Sink under normal circumstances? @Hisoka-X

```java
public class ParallelismInferenceConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    private boolean enabled = false;
    private int maxParallelism = 128;
```
Member commented:

Should it be consistent with the above here? It should be 64

@xiaochen-zhou (Contributor, Author) commented:

> Should it be consistent with the above here? It should be 64

Done.

Comment on lines 134 to 135
```java
// Expected: 2 (table1) + 3 (table2) + 1 (table3) = 7
Assertions.assertEquals(6, parallelism);
```
Member commented:

Please be consistent
