Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] support per bucket optimize for colocate aggregate #29252

Merged
merged 3 commits into from
Aug 29, 2023

Conversation

stdpain
Copy link
Contributor

@stdpain stdpain commented Aug 15, 2023

DOP baseline Peak mem bucket optimize Peak mem ratio:
16 2.42 sec 9.308 GB 1.93 sec 2.368 GB  
8 4.33 sec 9.278 GB 3.53 sec 1.181 GB  
4 8.25 sec 9.266 GB 6.28 sec 606.039 MB  
1 33.02 sec 9.257 GB 23.86 sec 154.247 MB  

compare to sorted streaming agg.

DOP Sort AGG Bucket compute ratio:
1 9.12 sec 24.18 sec  

It looks like there is some room for optimization. I will do it in the future.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.1
    • 3.0
    • 2.5
    • 2.4

@stdpain stdpain force-pushed the support_per_bucket branch from e53b7b7 to aa8b7f6 Compare August 16, 2023 06:58
@stdpain stdpain enabled auto-merge (squash) August 16, 2023 08:49
@stdpain stdpain force-pushed the support_per_bucket branch 2 times, most recently from eff10ec to 9b81928 Compare August 16, 2023 13:29
bool token = _ctx->token;
if (!token && _ctx->token.compare_exchange_strong(token, true)) {
RETURN_IF_ERROR(_ctx->sink->set_finishing(state));
_ctx->current_bucket_sink_finished = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should set even if error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have to set it. we keep _ctx->all_input_finishing = true; in set_finishing is enough

fzhedu
fzhedu previously approved these changes Aug 17, 2023
Copy link
Contributor

@fzhedu fzhedu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are too many small tables, should we merge several one to comput?

@fzhedu fzhedu self-requested a review August 17, 2023 13:37
Copy link
Contributor

@fzhedu fzhedu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are too many small tables, should we merge several one to comput?

@fzhedu fzhedu disabled auto-merge August 17, 2023 13:38
@fzhedu
Copy link
Contributor

fzhedu commented Aug 17, 2023

if there are too many small tables, should we merge several one to comput?

ignore it

@stdpain stdpain force-pushed the support_per_bucket branch 2 times, most recently from 0bd5aff to 4472b99 Compare August 17, 2023 13:49
@stdpain stdpain enabled auto-merge (squash) August 17, 2023 13:52
@stdpain stdpain force-pushed the support_per_bucket branch from 4472b99 to 0e4614d Compare August 18, 2023 12:34
@wanpengfei-git
Copy link
Collaborator

[FE PR Coverage Check]

😍 pass : 89 / 95 (93.68%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/planner/OlapScanNode.java 9 11 81.82% [404, 405]
🔵 com/starrocks/sql/optimizer/rule/tree/PhysicalDistributionAggOptRule.java 47 51 92.16% [45, 92, 97, 102]
🔵 com/starrocks/sql/optimizer/Optimizer.java 1 1 100.00% []
🔵 com/starrocks/sql/optimizer/operator/physical/PhysicalOlapScanOperator.java 4 4 100.00% []
🔵 com/starrocks/qe/SessionVariable.java 2 2 100.00% []
🔵 com/starrocks/planner/AggregationNode.java 7 7 100.00% []
🔵 com/starrocks/sql/plan/PlanFragmentBuilder.java 3 3 100.00% []
🔵 com/starrocks/sql/optimizer/operator/physical/PhysicalHashAggregateOperator.java 4 4 100.00% []
🔵 com/starrocks/sql/optimizer/rule/tree/AddDecodeNodeForDictStringRule.java 2 2 100.00% []
🔵 com/starrocks/planner/PlanFragment.java 7 7 100.00% []
🔵 com/starrocks/qe/scheduler/assignment/LocalFragmentAssignmentStrategy.java 2 2 100.00% []
🔵 com/starrocks/planner/PlanNode.java 1 1 100.00% []

nonKeyGroupBys.remove(column);
}

if (nonKeyGroupBys.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if sort key is [a,b,c], it seems that group by prefix of [a,b,c] can use sortAgg, other cases can not use sortAgg, for examples group by b, c.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we won't generate this plan (SCAN->GlobalAGG) for group by b, c.

be/src/exec/pipeline/scan/morsel.cpp Outdated Show resolved Hide resolved
be/src/exec/pipeline/scan/morsel.cpp Show resolved Hide resolved
satanson
satanson previously approved these changes Aug 22, 2023

// disable optimize depends on physical order
// eg: sortedStreamingAGG/ PerBucketCompute
public void disablePhysicalDistributionOptimize() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name isn't reasonable, PhysicalDistribution general means shuffle, broadcast, colocate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kangkaisen Any better name?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PhysicalProperty?

auto-merge was automatically disabled August 25, 2023 09:25

Merge queue setting changed

Signed-off-by: stdpain <drfeng08@gmail.com>
Signed-off-by: stdpain <drfeng08@gmail.com>
@stdpain
Copy link
Contributor Author

stdpain commented Aug 28, 2023

pr-regression: 19

Signed-off-by: stdpain <drfeng08@gmail.com>
@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell B 34 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

warning The version of Java (11.0.20) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint

@stdpain stdpain enabled auto-merge (squash) August 29, 2023 02:16
@wanpengfei-git
Copy link
Collaborator

[FE Incremental Coverage Report]

😍 pass : 85 / 95 (89.47%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/planner/AggregationNode.java 4 7 57.14% [172, 173, 174]
🔵 com/starrocks/planner/OlapScanNode.java 9 11 81.82% [405, 406]
🔵 com/starrocks/sql/optimizer/rule/tree/PhysicalDistributionAggOptRule.java 46 51 90.20% [45, 92, 97, 102, 106]
🔵 com/starrocks/sql/optimizer/Optimizer.java 1 1 100.00% []
🔵 com/starrocks/sql/optimizer/operator/physical/PhysicalOlapScanOperator.java 4 4 100.00% []
🔵 com/starrocks/qe/SessionVariable.java 2 2 100.00% []
🔵 com/starrocks/sql/plan/PlanFragmentBuilder.java 3 3 100.00% []
🔵 com/starrocks/sql/optimizer/operator/physical/PhysicalHashAggregateOperator.java 4 4 100.00% []
🔵 com/starrocks/sql/optimizer/rule/tree/AddDecodeNodeForDictStringRule.java 2 2 100.00% []
🔵 com/starrocks/planner/PlanFragment.java 7 7 100.00% []
🔵 com/starrocks/qe/scheduler/assignment/LocalFragmentAssignmentStrategy.java 2 2 100.00% []
🔵 com/starrocks/planner/PlanNode.java 1 1 100.00% []

@wanpengfei-git
Copy link
Collaborator

[BE Incremental Coverage Report]

😞 fail : 27 / 286 (09.44%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 src/exec/scan_node.cpp 0 5 00.00% [143, 144, 145, 147, 148]
🔵 src/exec/aggregate/aggregate_blocking_node.cpp 0 18 00.00% [213, 214, 215, 216, 225, 226, 227, 228, 229, 249, 250, 284, 290, 293, 294, 295, 298, 299]
🔵 src/exec/pipeline/scan/scan_operator.h 0 1 00.00% [82]
🔵 src/exec/pipeline/bucket_process_operator.h 0 20 00.00% [30, 32, 33, 39, 40, 44, 46, 47, 48, 52, 53, 61, 63, 64, 73, 74, 82, 83, 84, 86]
🔵 src/exec/spill/input_stream.h 0 1 00.00% [71]
🔵 src/exec/pipeline/scan/morsel.cpp 0 55 00.00% [79, 80, 81, 82, 83, 84, 87, 88, 89, 90, 91, 92, 94, 99, 100, 101, 102, 104, 147, 148, 150, 151, 154, 155, 158, 159, 160, 162, 163, 165, 166, 167, 168, 169, 170, 171, 174, 175, 178, 179, 180, 183, 184, 185, 188, 189, 192, 195, 196, 197, 198, 199, 200, 201, 203]
🔵 src/exec/pipeline/operator.h 0 3 00.00% [97, 271, 273]
🔵 src/exec/query_cache/ticket_checker.cpp 0 9 00.00% [48, 49, 50, 51, 52, 53, 54, 55, 56]
🔵 src/exec/query_cache/multilane_operator.h 0 1 00.00% [79]
🔵 src/exec/pipeline/bucket_process_operator.cpp 0 84 00.00% [10, 11, 12, 13, 16, 17, 18, 19, 22, 23, 26, 27, 28, 30, 33, 34, 37, 38, 39, 40, 41, 42, 45, 48, 49, 50, 51, 53, 54, 55, 57, 60, 61, 62, 65, 66, 68, 69, 71, 72, 73, 74, 76, 77, 80, 81, 82, 83, 86, 87, 88, 89, 90, 91, 92, 94, 96, 100, 103, 105, 108, 110, 111, 112, 113, 114, 115, 118, 119, 122, 123, 126, 128, 131, 133, 134, 135, 136, 137, 138, 141, 142, 145, 146]
🔵 src/exec/spill/spiller.cpp 0 5 00.00% [115, 116, 117, 118, 119]
🔵 src/exec/pipeline/aggregate/spillable_aggregate_blocking_sink_operator.cpp 0 5 00.00% [113, 115, 116, 117, 118]
🔵 src/exec/query_cache/cache_operator.h 0 1 00.00% [67]
🔵 src/exec/olap_scan_node.cpp 0 6 00.00% [67, 68, 400, 401, 402, 403]
🔵 src/exec/olap_scan_node.h 0 1 00.00% [98]
🔵 src/exec/aggregate/distinct_blocking_node.cpp 0 18 00.00% [163, 164, 165, 166, 174, 175, 176, 177, 178, 204, 205, 220, 227, 232, 233, 235, 236, 237]
🔵 src/exec/pipeline/pipeline.cpp 1 4 25.00% [118, 119, 120]
🔵 src/exec/pipeline/scan/morsel.h 7 26 26.92% [134, 224, 226, 227, 228, 231, 235, 237, 255, 302, 303, 306, 307, 309, 311, 312, 335, 336, 338]
🔵 src/exec/pipeline/pipeline_driver.cpp 3 5 60.00% [88, 93]
🔵 src/exec/pipeline/scan/chunk_source.cpp 3 4 75.00% [79]
🔵 src/exec/pipeline/scan/scan_operator.cpp 12 13 92.31% [248]
🔵 src/exec/scan_node.h 1 1 100.00% []

@kangkaisen kangkaisen disabled auto-merge August 29, 2023 04:45
@kangkaisen kangkaisen merged commit 37b2b0c into StarRocks:main Aug 29, 2023
Jay-ju pushed a commit to Jay-ju/starrocks that referenced this pull request Sep 7, 2023
…rRocks#29252)

* [Enhancement] support per bucket optimize for colocate aggregate

Signed-off-by: stdpain <drfeng08@gmail.com>

* Fix comments

Signed-off-by: stdpain <drfeng08@gmail.com>

* Rename

Signed-off-by: stdpain <drfeng08@gmail.com>

---------

Signed-off-by: stdpain <drfeng08@gmail.com>
@luohaha
Copy link
Contributor

luohaha commented Jan 3, 2024

https://github.com/Mergifyio backport branch-3.1-cs

Copy link
Contributor

mergify bot commented Jan 3, 2024

backport branch-3.1-cs

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jan 3, 2024
)

* [Enhancement] support per bucket optimize for colocate aggregate

Signed-off-by: stdpain <drfeng08@gmail.com>

* Fix comments

Signed-off-by: stdpain <drfeng08@gmail.com>

* Rename

Signed-off-by: stdpain <drfeng08@gmail.com>

---------

Signed-off-by: stdpain <drfeng08@gmail.com>
(cherry picked from commit 37b2b0c)

# Conflicts:
#	be/src/exec/pipeline/bucket_process_operator.h
#	be/src/exec/pipeline/operator.h
#	fe/fe-core/src/main/java/com/starrocks/planner/PlanFragment.java
#	fe/fe-core/src/main/java/com/starrocks/qe/scheduler/assignment/LocalFragmentAssignmentStrategy.java
#	fe/fe-core/src/main/java/com/starrocks/sql/plan/PlanFragmentBuilder.java
#	gensrc/thrift/PlanNodes.thrift
stdpain added a commit to stdpain/starrocks-2 that referenced this pull request Jan 5, 2024
…rRocks#29252)

* [Enhancement] support per bucket optimize for colocate aggregate

Signed-off-by: stdpain <drfeng08@gmail.com>

* Fix comments

Signed-off-by: stdpain <drfeng08@gmail.com>

* Rename

Signed-off-by: stdpain <drfeng08@gmail.com>

---------

Signed-off-by: stdpain <drfeng08@gmail.com>
stdpain added a commit to stdpain/starrocks-2 that referenced this pull request Jan 5, 2024
…rRocks#29252)

* [Enhancement] support per bucket optimize for colocate aggregate

Signed-off-by: stdpain <drfeng08@gmail.com>

* Fix comments

Signed-off-by: stdpain <drfeng08@gmail.com>

* Rename

Signed-off-by: stdpain <drfeng08@gmail.com>

---------

Signed-off-by: stdpain <drfeng08@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants