[enchement](mc)Optimize reading of maxcompute partition tables. by hubgeter · Pull Request #45148 · apache/doris

hubgeter · 2024-12-07T03:59:16Z

What problem does this PR solve?

Problem Summary:
Optimize reading of MaxCompute partitioned tables:

1. Introduce a batch mode for generating splits for MaxCompute partitioned tables, to optimize scenarios with a large number of partitions. It is controlled through the variable num_partitions_in_batch_mode.
2. Introduce the catalog parameter mc.split_cross_partition. Setting it to true is friendlier to reading partitioned tables; setting it to false is friendlier to debugging.
3. Add -Darrow.enable_null_check_for_get=false to the BE JVM options to improve the efficiency of MaxCompute Arrow data conversion.
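
To make the batch idea in the first item concrete, here is a minimal sketch of grouping a partition list into fixed-size batches so that splits can be generated one batch at a time. It is an illustration only: the class `PartitionBatcher`, the method `batches`, and the `batchSize` parameter are hypothetical names, and this description does not state how the PR actually sizes its batches or how num_partitions_in_batch_mode feeds into that decision.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not taken from the Doris code base. */
final class PartitionBatcher {
    private PartitionBatcher() {}

    /**
     * Splits the partition list into consecutive batches of at most
     * batchSize entries. The last batch may be smaller.
     */
    static <T> List<List<T>> batches(List<T> partitions, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < partitions.size(); start += batchSize) {
            int end = Math.min(start + batchSize, partitions.size());
            // Copy the sub-list so each batch is independent of the input list.
            result.add(new ArrayList<>(partitions.subList(start, end)));
        }
        return result;
    }
}
```

With five partitions and a batch size of two, this yields three batches, the last holding a single partition.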

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

doris-robot · 2024-12-07T03:59:22Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

hubgeter · 2024-12-07T03:59:27Z

run buildall

morningman · 2024-12-07T07:37:54Z

fe/fe-core/src/main/java/org/apache/doris/datasource/SplitAssignment.java

    }

-    public void setException(UserException e) {
+    public synchronized void setException(UserException e) {


Why use synchronized? Please add a comment in the code.

hubgeter (reply, Dec 8, 2024): It seems that there is no need to add synchronized.
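
The thread does not say why synchronized turned out to be unnecessary. One common reason, shown here purely as an assumption, is that a setter which only stores a single reference can publish it safely through an atomic or volatile field without taking a lock. `FirstErrorHolder` below is a hypothetical class; the real fields of SplitAssignment are not shown in this conversation.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for illustration; not the actual SplitAssignment code. */
final class FirstErrorHolder {
    // AtomicReference gives atomic updates and cross-thread visibility
    // without a synchronized block.
    private final AtomicReference<Exception> error = new AtomicReference<>();

    /** Records the exception only if none has been recorded yet. */
    void setException(Exception e) {
        error.compareAndSet(null, e);
    }

    /** Returns the first recorded exception, or null if there is none. */
    Exception getException() {
        return error.get();
    }
}
```

Keeping only the first exception is a design choice of this sketch: later failures are usually consequences of the first one, so the first is the most useful to report.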

morningman · 2024-12-07T07:46:25Z

fe/fe-core/src/main/java/org/apache/doris/datasource/maxcompute/source/MaxComputeScanNode.java

+                    CompletableFuture.runAsync(() -> {
+                        try {
+                            TableBatchReadSession tableBatchReadSession =
+                                    createTableBatchReadSession(requiredBatchPartitionSpecs);


It looks like this new implementation creates a read session for each partition?
Is creating a read session a heavy operation?

hubgeter (reply, Dec 7, 2024): A read session is created for a batch of partitions. createTableBatchReadSession involves network I/O, and that I/O becomes very slow as the number of partitions increases. I tested 1500 partitions and it took about 13 seconds.
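
The pattern discussed in this thread, one potentially slow call per batch of partitions run through CompletableFuture.runAsync, can be sketched as follows. This is not the code from MaxComputeScanNode: `AsyncBatchSplitter`, `generate`, and the `createSplits` function are hypothetical stand-ins, with `createSplits` playing the role of creating a read session for a batch and turning it into splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

/** Hypothetical sketch of per-batch asynchronous split generation. */
final class AsyncBatchSplitter {
    private AsyncBatchSplitter() {}

    /**
     * Runs createSplits once per batch on the common pool and collects all
     * results. The order of the returned splits is not guaranteed.
     * If any batch fails, join() rethrows the failure as a CompletionException.
     */
    static <P, S> List<S> generate(List<List<P>> batches,
                                   Function<List<P>, List<S>> createSplits) {
        // Thread-safe sink, because several batches may finish concurrently.
        ConcurrentLinkedQueue<S> sink = new ConcurrentLinkedQueue<>();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (List<P> batch : batches) {
            futures.add(CompletableFuture.runAsync(
                    () -> sink.addAll(createSplits.apply(batch))));
        }
        // Wait for every batch before returning.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(sink);
    }
}
```

This sketch waits for all batches before returning, which keeps it short. A scan node that wants to start reading early would instead hand each batch's splits to the consumer as soon as that batch completes.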

hubgeter · 2024-12-08T03:53:06Z

run buildall

github-actions · 2024-12-08T08:18:06Z

PR approved by at least one committer and no changes requested.

github-actions · 2024-12-08T08:18:08Z

PR approved by anyone and no changes requested.

[enchement](mc)Optimize reading of maxcompute partition tables.

530b7ec

morningman reviewed Dec 7, 2024

rm unused synchronized

b2b37d7

morningman added dev/2.1.x dev/3.0.x labels Dec 8, 2024

morningman approved these changes Dec 8, 2024

github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 8, 2024

github-actions bot added the reviewed label Dec 8, 2024

zy-kkk approved these changes Dec 8, 2024

morningman merged commit 4cf908c into apache:master Dec 9, 2024

github-actions bot mentioned this pull request Dec 9, 2024

branch-3.0: [enchement](mc)Optimize reading of maxcompute partition tables. #45148 #45168

Merged

github-actions bot added the dev/2.1.x-conflict label Dec 9, 2024

morningman pushed a commit that referenced this pull request Dec 10, 2024

branch-3.0: [enchement](mc)Optimize reading of maxcompute partition t…

328c127

…ables. #45148 (#45168) Cherry-picked from #45148 Co-authored-by: daidai <changyuwei@selectdb.com>

morningman added dev/3.0.4-merged and removed dev/3.0.x labels Dec 10, 2024

hubgeter mentioned this pull request Dec 10, 2024

[enchement](mc)Optimize reading of maxcompute partition tables. (#45148) #45246

Merged

16 tasks

yiguolei added dev/2.1.8-merged and removed dev/2.1.x dev/2.1.x-conflict labels Dec 11, 2024

morningman mentioned this pull request Dec 31, 2024

[fix](split)Fixed the bug that batch mode split could not query data in multiple be scenarios. #46218

Merged

16 tasks

yiguolei mentioned this pull request Jan 19, 2025

Release Note 2.1.8 #47198

Closed

gavinchou mentioned this pull request Feb 18, 2025

Release Notes 3.0.4 #48013

Closed

morningman merged 2 commits into apache:master from hubgeter:mc_opt_partition

5 participants
