[enhancement](mc) Optimize reading of MaxCompute partition tables. #45148
morningman merged 2 commits into apache:master
run buildall

-    public void setException(UserException e) {
+    public synchronized void setException(UserException e) {
Why use `synchronized` here? Please add a comment in the code explaining it.
It seems that there is no need to add synchronized.
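
For context on the exchange above: when a setter only needs to record an error that may be reported from several threads, a lock is not the only option. The sketch below is a hypothetical, minimal alternative using `AtomicReference`. The class `FirstErrorHolder` and its methods are invented for illustration, use plain `Exception` instead of `UserException`, and are not the code in this PR.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder, not the class touched by this PR.
public class FirstErrorHolder {
    // Holds the first reported exception, or null if none was reported yet.
    private final AtomicReference<Exception> error = new AtomicReference<>();

    /** Records e only if no exception was stored before; returns true if it was stored. */
    public boolean setException(Exception e) {
        return error.compareAndSet(null, e);
    }

    /** Returns the first recorded exception, or null. */
    public Exception getException() {
        return error.get();
    }

    public static void main(String[] args) {
        FirstErrorHolder holder = new FirstErrorHolder();
        holder.setException(new IllegalStateException("first"));
        holder.setException(new IllegalStateException("second")); // ignored
        System.out.println(holder.getException().getMessage());
    }
}
```

Keeping only the first error is a design choice of this sketch; whether the real method needs that behaviour, or any synchronization at all, depends on how it is called.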
    CompletableFuture.runAsync(() -> {
        try {
            TableBatchReadSession tableBatchReadSession =
                    createTableBatchReadSession(requiredBatchPartitionSpecs);
It looks like this new implementation creates a read session for each partition?
Is creating a read session a heavy operation?
A read session is created for a batch of partitions, not for each partition. `createTableBatchReadSession` involves network I/O, and increasing the number of partitions makes that network I/O very slow. I tested with 1500 partitions and it took about 13 seconds.
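
The batching idea described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `BatchSessionSketch`, `toBatches`, `createSessionFor`, and `createSessionsAsync` are hypothetical names, partition specs are plain strings, and the session-creating call is replaced by a stub that does no network I/O. Only `CompletableFuture.runAsync` is taken from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

public class BatchSessionSketch {
    /** Splits the partition list into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> toBatches(List<T> partitions, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < partitions.size(); start += batchSize) {
            int end = Math.min(start + batchSize, partitions.size());
            batches.add(new ArrayList<>(partitions.subList(start, end)));
        }
        return batches;
    }

    /** Hypothetical stand-in for the expensive session-creating call; returns a label only. */
    public static String createSessionFor(List<String> batch) {
        return "session[" + batch.size() + " partitions]";
    }

    /** Creates one stub session per batch on background threads and waits for all of them. */
    public static List<String> createSessionsAsync(List<String> partitions, int batchSize) {
        ConcurrentLinkedQueue<String> sessions = new ConcurrentLinkedQueue<>();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (List<String> batch : toBatches(partitions, batchSize)) {
            futures.add(CompletableFuture.runAsync(() -> sessions.add(createSessionFor(batch))));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(sessions);
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            partitions.add("pt=" + i);
        }
        // Ten partitions in batches of four yield three sessions; their order may vary.
        System.out.println(createSessionsAsync(partitions, 4));
    }
}
```

The trade-off is between the number of session-creating calls and the cost of each call, so the batch size would need to be tuned against measurements like the one quoted above.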
run buildall

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
### What problem does this PR solve?

Problem Summary:

Optimize reading of MaxCompute partition tables:

1. Introduce a batch mode for generating splits for MaxCompute partition tables, to optimize scenarios with a large number of partitions. It is controlled by the variable `num_partitions_in_batch_mode`.
2. Introduce the catalog parameter `mc.split_cross_partition`. Setting it to true is friendlier to reading partition tables; setting it to false is friendlier to debugging.
3. Add `-Darrow.enable_null_check_for_get=false` to the BE JVM options to improve the efficiency of MaxCompute Arrow data conversion.
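
As a rough illustration of how a variable such as `num_partitions_in_batch_mode` could gate batch mode, the sketch below assumes a simple threshold rule: batch mode applies once the number of partitions to read reaches the configured value, and a non-positive value disables it. The description above does not spell out the exact comparison, so both points are assumptions, and `BatchModeSketch` and `useBatchMode` are hypothetical names.

```java
public class BatchModeSketch {
    /**
     * Assumed rule: use batch mode once the partition count reaches the threshold.
     * Assumption: a non-positive threshold turns batch mode off.
     */
    public static boolean useBatchMode(int selectedPartitions, int numPartitionsInBatchMode) {
        return numPartitionsInBatchMode > 0 && selectedPartitions >= numPartitionsInBatchMode;
    }

    public static void main(String[] args) {
        int threshold = 1024; // illustrative value only
        System.out.println(useBatchMode(1500, threshold)); // prints "true"
        System.out.println(useBatchMode(12, threshold));   // prints "false"
    }
}
```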
…he#45148)
…) (#45246) bp #45148
Release note
None