[Feature] Group concat support order by and distinct #28778

fzhedu · 2023-08-07T14:58:34Z

Support group_concat(distinct x1, x2 order by y1, y2 separator s).

The arguments are laid out as x1, x2, s, y1, y2; only x1, x2, and s are output in the end.
DISTINCT applies only to x1 and x2, and rows where x1 or x2 is NULL are rejected.

mysql> select group_concat(name), group_concat(distinct name order by 1 separator '/') from ss group by id order by 1;
+------------------------------------------------+-------------------------------------------------------------+
| group_concat(name SEPARATOR ',')               | group_concat(DISTINCT name ORDER BY name ASC SEPARATOR '/') |
+------------------------------------------------+-------------------------------------------------------------+
| NULL                                           | NULL                                                        |
| May,Ti,欧阳诸葛方程                            | May/Ti/欧阳诸葛方程                                         |
| Ti                                             | Ti                                                          |
| Tom,Tom                                        | Tom                                                         |
| Tom,Tom,王武程咬金                             | Tom/王武程咬金                                              |
| 张三此地无银三百两,张三掩耳盗铃                | 张三掩耳盗铃/张三此地无银三百两                             |
| 李四大闹天空                                   | 李四大闹天空                                                |
+------------------------------------------------+-------------------------------------------------------------+
7 rows in set (0.08 sec)
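
To make the semantics above concrete, here is a minimal sketch for a single group and a single output column. It is not the StarRocks implementation; `group_concat_distinct_sorted` is a hypothetical helper, and it assumes the order-by key is the output column itself, ascending.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not StarRocks code: models
// group_concat(DISTINCT x ORDER BY x ASC SEPARATOR sep) for one group.
std::optional<std::string> group_concat_distinct_sorted(
        const std::vector<std::optional<std::string>>& values, const std::string& sep) {
    // NULL inputs are rejected.
    std::vector<std::string> kept;
    for (const auto& v : values) {
        if (v.has_value()) kept.push_back(*v);
    }
    // A group with no non-NULL values yields NULL.
    if (kept.empty()) return std::nullopt;

    // ORDER BY x ASC using std::string comparison, then drop adjacent duplicates.
    std::sort(kept.begin(), kept.end());
    kept.erase(std::unique(kept.begin(), kept.end()), kept.end());

    // Join the remaining values with the separator.
    std::string out;
    for (std::size_t i = 0; i < kept.size(); ++i) {
        if (i > 0) out += sep;
        out += kept[i];
    }
    return out;
}
```

With the inputs `Tom`, `Tom` and separator `/`, this returns `Tom`, which matches the fourth row of the table above; a group containing only NULL returns NULL, as in the first row.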

What type of PR is this:

Checklist:

I have added test cases for my bug fix or my new feature
This pr will affect users' behaviors
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

fzhedu · 2023-08-15T03:04:52Z

test/sql/test_agg/R/test_distinct_agg

 -- result:
-5711937174881
+-5714598445053
 -- !result


The group_concat result changes from '4, 4' to '4,4', so its size changes from 4 to 3.

LiShuMing · 2023-08-15T03:16:49Z

docs/sql-reference/sql-functions/string-functions/group_concat.md

-Returns a VARCHAR value.
+Returns a string value for each group, but returns NULL if there are no non-NULL values.
+
+set `group_concat_max_len` to limit the length of output string from a group, its default value is 1024, minimal value is 4.


Could you give an example that shows how to use this?
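
One possible reading of `group_concat_max_len`, as a rough sketch only. The thread does not say whether the limit counts bytes or characters, or what happens to values below the minimum; the code assumes bytes and assumes small values are raised to 4. The constant and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical constants mirroring the documented default and minimum.
constexpr std::size_t kDefaultGroupConcatMaxLen = 1024;
constexpr std::size_t kMinGroupConcatMaxLen = 4;

// Assumption: a requested value below the minimum is raised to the minimum.
std::size_t effective_max_len(std::size_t requested) {
    return std::max(requested, kMinGroupConcatMaxLen);
}

// Assumption: the limit is counted in bytes and a longer result is cut off.
std::string limit_group_concat(const std::string& result, std::size_t max_len) {
    return result.substr(0, effective_max_len(max_len));
}
```

Under these assumptions, a group result of `a,b,c,d,e` with the limit set to 5 would come back as `a,b,c`, while anything shorter than the limit is returned unchanged.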

LiShuMing · 2023-08-15T03:24:14Z

be/src/exprs/agg/group_concat.h

+        DCHECK(state.output_col_num > 0);
+        for (auto i = 0; i < state.output_col_num; ++i) {
+            if (UNLIKELY(!is_string_type(ctx->get_arg_type(i)->type))) {
+                ctx->set_error(fmt::format("{}-th input of group_concat should be string type.", i + 1).c_str(), false);


What is the behavior of this? Should this check really be done here?

It is a safety check at create time; if it fails, the aggregation reports an error and stops.

LiShuMing · 2023-08-15T03:25:44Z

be/src/exprs/agg/group_concat.h

+// redundancy columns in intermediate results. For example, group_concat(a,b order by 1,2) is rewritten to
+// group_concat(cast(a to string), cast(b to string) order by a, b), resulting to keeping 4 columns, but it only needs
+// keep 2 columns in intermediate results.
+// 3. refactor order-by and distinct function to a combinator to clean the code.


Can we skip the order by on a if a is already sorted?
group_concat(a order by 1) c

That may be impossible in hash partition mode, because a is distributed across several nodes.

LiShuMing · 2023-08-15T03:26:17Z

be/src/exprs/agg/group_concat.h

+class GroupConcatAggregateFunctionV2
+        : public AggregateFunctionBatchHelper<GroupConcatAggregateStateV2, GroupConcatAggregateFunctionV2> {
+public:
+    // group_concat(a, b order by c, d), the arguments are a,b,',',c,d


Why do we need the extra ',' column?

',' is the separator, and we support it as a variable. If it were not stored as a column, we would need some other new way to pass it.
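
The argument layout from the code comment (a, b, ',', c, d) can be sketched as follows. `ArgLayout` and `make_layout` are hypothetical names; the sketch assumes the separator is always the last output column and that the order-by columns follow it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the argument layout a, b, ',', c, d.
struct ArgLayout {
    std::size_t output_col_num;   // data columns plus the separator: a, b, ','
    std::size_t separator_index;  // position of the separator column
    std::size_t order_by_begin;   // index of the first order-by column
    std::size_t order_by_num;     // number of order-by columns: c, d
};

// Assumes the separator is always present as the last output column.
ArgLayout make_layout(std::size_t num_args, std::size_t order_by_num) {
    // At least one data column and the separator must remain.
    assert(num_args >= order_by_num + 2);
    std::size_t output_col_num = num_args - order_by_num;
    return ArgLayout{output_col_num, output_col_num - 1, output_col_num, order_by_num};
}
```

For group_concat(a, b order by c, d) there are 5 arguments and 2 order-by columns, so the sketch gives 3 output columns with the separator at index 2.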

LiShuMing · 2023-08-15T03:31:20Z

be/src/exprs/agg/group_concat.h

+        if (ctx->get_is_distinct()) {
+            for (auto row_id = 0; row_id < elem_size; row_id++) {
+                bool is_duplicated = false;
+                for (auto next_id = row_id + 1; next_id < elem_size; next_id++) {


Maybe use a hash set to avoid repeated comparisons?

What if the distinct column has been sorted above?

The final result is usually not large after the global distinct, so I left it as a TODO.
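
For reference, the hash set suggestion could look roughly like this for a single string column. This is a generic sketch, not the code in the PR; `distinct_keep_first` is a hypothetical helper, and it keeps the first occurrence of each value, which may differ from what the nested loop in the diff keeps.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: removes duplicates in one pass, keeping the first
// occurrence of each value and the original relative order.
std::vector<std::string> distinct_keep_first(const std::vector<std::string>& rows) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    for (const auto& row : rows) {
        // insert() returns {iterator, inserted}; inserted is false for duplicates.
        if (seen.insert(row).second) out.push_back(row);
    }
    return out;
}
```

This does an expected constant amount of work per row instead of comparing every pair of rows, at the cost of the extra set; for small per-group results the difference may not matter, which is the point of the reply above.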

LiShuMing · 2023-08-15T03:33:10Z

be/src/exprs/agg/group_concat.h

+            state_impl.release_order_by_columns();
+            DCHECK(ctx->state()->cancelled_ref() || st.ok());
+            for (auto i = 0; i < output_col_num; ++i) {
+                materialize_column_by_permutation(outputs[i].get(), {(*state_impl.data_columns)[i]}, perm);


Is it possible to late-materialize the columns in the final output?

It is determined by the ratio of repeated tuples in a chunk. If there are many repeated tuples, doing distinct first is better; otherwise sorting first is better.
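
The sort-then-materialize step in the diff can be illustrated with a generic sketch: sort only the row ids by the order-by key, then copy each output column once in that order. `sort_permutation` and `gather` are hypothetical helpers and are not the StarRocks `materialize_column_by_permutation` function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: row ids ordered by the sort key; ties keep input order.
std::vector<std::size_t> sort_permutation(const std::vector<int>& keys) {
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return perm;
}

// Hypothetical helper: copies the output column once, in permuted order.
std::vector<std::string> gather(const std::vector<std::string>& column,
                                const std::vector<std::size_t>& perm) {
    std::vector<std::string> out;
    out.reserve(perm.size());
    for (std::size_t row : perm) out.push_back(column[row]);
    return out;
}
```

Because only the small row-id vector is moved during sorting, the string data is copied a single time per output column, however many order-by keys are involved.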

fe/fe-core/src/main/java/com/starrocks/qe/SessionVariable.java

fe/fe-core/src/main/java/com/starrocks/sql/analyzer/FunctionAnalyzer.java

fe/fe-core/src/main/java/com/starrocks/sql/parser/StarRocksLex.g4

Seaven · 2023-08-17T07:01:40Z

be/src/exprs/agg/factory/aggregate_factory.cpp

@@ -134,6 +134,12 @@ static const AggregateFunction* get_function(const std::string& name, LogicalTyp
        }
    }

+    if (func_version > 6) {
+        if (name == "group_concat") {
+            func_name = "group_concat2";


Will performance get worse when there is no order by?

The main difference lies in the intermediate results. The previous implementation, V1, just concatenates all strings per group, while the new implementation, V2, stores the intermediate strings in a struct{array[]}, which adds the cost of the array offsets, one offset per group. So the extra cost may be small if the number of groups is not large, and noticeable otherwise.
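
A rough model of the two intermediate formats described above, to show where the per-group offset comes from. `ArrayOfStrings` is a hypothetical stand-in for an array column and `concat_group` for the V1-style state; neither is the real StarRocks type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an array column: group g owns
// elements[offsets[g]] .. elements[offsets[g + 1] - 1].
struct ArrayOfStrings {
    std::vector<std::string> elements;
    std::vector<std::size_t> offsets{0};

    void append_group(const std::vector<std::string>& values) {
        elements.insert(elements.end(), values.begin(), values.end());
        offsets.push_back(elements.size());  // the one extra offset per group
    }

    std::size_t group_size(std::size_t g) const { return offsets[g + 1] - offsets[g]; }
};

// V1-style intermediate in this model: one concatenated string per group.
std::string concat_group(const std::vector<std::string>& values, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) out += sep;
        out += values[i];
    }
    return out;
}
```

Keeping the elements separate is what allows sorting and deduplicating them later; the price in this model is one additional offset entry for every group.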

wanpengfei-git · 2023-08-21T03:55:26Z

[FE PR Coverage Check]

😍 pass : 56 / 62 (90.32%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
🔵	com/starrocks/sql/optimizer/rule/transformation/SplitAggregateRule.java	3	5	60.00%	[320, 321]
🔵	com/starrocks/sql/analyzer/FunctionAnalyzer.java	4	6	66.67%	[126, 135]
🔵	com/starrocks/sql/optimizer/rule/transformation/RewriteMultiDistinctByCTERule.java	3	4	75.00%	[289]
🔵	com/starrocks/sql/analyzer/ExpressionAnalyzer.java	20	21	95.24%	[1246]
🔵	com/starrocks/catalog/AggregateFunction.java	7	7	100.00%	[]
🔵	com/starrocks/qe/SessionVariable.java	1	1	100.00%	[]
🔵	com/starrocks/catalog/FunctionSet.java	2	2	100.00%	[]
🔵	com/starrocks/analysis/FunctionCallExpr.java	3	3	100.00%	[]
🔵	com/starrocks/analysis/FunctionParams.java	1	1	100.00%	[]
🔵	com/starrocks/sql/analyzer/AstToStringBuilder.java	11	11	100.00%	[]
🔵	com/starrocks/sql/optimizer/operator/AggType.java	1	1	100.00%	[]

fzhedu · 2023-08-22T06:31:26Z

The admit test failed because the output format changed; it will be fixed by https://github.com/StarRocks/StarRocksTest/pull/3738

satanson · 2023-08-23T02:32:21Z

be/src/exprs/agg/group_concat.h

+
+    void update_batch_single_state(FunctionContext* ctx, size_t chunk_size, const Column** columns,
+                                   AggDataPtr __restrict state) const override {
+        GroupConcatAggregateStateV2& state_impl = this->data(state);


Use the derived template type parameter instead of the concrete type.

satanson · 2023-08-23T02:34:08Z

be/src/exprs/agg/group_concat.h

+    // group_concat(a, b order by c, d), the arguments are a,b,',',c,d
+    void create_impl(FunctionContext* ctx, GroupConcatAggregateStateV2& state) const {
+        auto num = ctx->get_num_args();
+        state.data_columns = new Columns;


Use unique_ptr or shared_ptr instead of a raw pointer.
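
A minimal sketch of the suggestion, assuming the state can hold a `std::unique_ptr` member. `Column`, `Columns`, `StateWithOwnedColumns`, and `create_state` here are simplified hypothetical stand-ins, not the real StarRocks definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real column types.
using Column = std::vector<std::string>;
using Columns = std::vector<Column>;

struct StateWithOwnedColumns {
    // Freed automatically when the state is destroyed; no manual delete needed.
    std::unique_ptr<Columns> data_columns;
};

// Allocates one empty column per argument, replacing `new Columns`.
void create_state(StateWithOwnedColumns& state, std::size_t num_args) {
    state.data_columns = std::make_unique<Columns>(num_args);
}
```

With ownership expressed in the type, an early return or error path cannot leak the columns, as long as the state's destructor actually runs.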

satanson · 2023-08-23T02:45:02Z

be/src/exprs/agg/group_concat.h

+                // just copy the first const value.
+                data_col = down_cast<const ConstColumn*>(columns[i])->data_column().get();
+                tmp_row_num = 0;
+            }


Missing branch for processing NullableColumn

A nullable column can be updated into the state, because the data columns in the state are nullable.

sonarqubecloud · 2023-08-23T12:45:44Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
24 Code Smells

0.0% Coverage
0.0% Duplication

The version of Java (11.0.20) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
wanpengfei-git · 2023-08-24T02:04:42Z

[FE Incremental Coverage Report]

😍 pass : 59 / 62 (95.16%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
🔵	com/starrocks/sql/optimizer/rule/transformation/RewriteMultiDistinctByCTERule.java	3	4	75.00%	[289]
🔵	com/starrocks/sql/analyzer/FunctionAnalyzer.java	5	6	83.33%	[126]
🔵	com/starrocks/sql/analyzer/ExpressionAnalyzer.java	20	21	95.24%	[1246]
🔵	com/starrocks/catalog/AggregateFunction.java	7	7	100.00%	[]
🔵	com/starrocks/sql/optimizer/rule/transformation/SplitAggregateRule.java	5	5	100.00%	[]
🔵	com/starrocks/qe/SessionVariable.java	1	1	100.00%	[]
🔵	com/starrocks/catalog/FunctionSet.java	2	2	100.00%	[]
🔵	com/starrocks/analysis/FunctionCallExpr.java	3	3	100.00%	[]
🔵	com/starrocks/analysis/FunctionParams.java	1	1	100.00%	[]
🔵	com/starrocks/sql/analyzer/AstToStringBuilder.java	11	11	100.00%	[]
🔵	com/starrocks/sql/optimizer/operator/AggType.java	1	1	100.00%	[]

wanpengfei-git · 2023-08-24T02:05:20Z

[BE Incremental Coverage Report]

😞 fail : 189 / 275 (68.73%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
🔵	src/exprs/agg/nullable_aggregate.h	1	7	14.29%	[751, 779, 780, 781, 875, 876]
🔵	src/exprs/agg/factory/aggregate_factory.cpp	1	3	33.33%	[138, 139]
🔵	src/exprs/agg/group_concat.h	175	251	69.72%	[316, 358, 359, 363, 364, 367, 376, 377, 378, 379, 380, 391, 432, 435, 436, 437, 439, 440, 441, 444, 445, 450, 451, 452, 453, 456, 457, 459, 460, 461, 462, 465, 466, 467, 470, 471, 472, 473, 474, 516, 517, 520, 534, 594, 595, 601, 602, 632, 633, 643, 644, 645, 646, 647, 648, 649, 650, 653, 654, 655, 658, 664, 681, 688, 689, 697, 698, 699, 700, 701, 702, 704, 706, 707, 708, 716]
🔵	src/exec/aggregator.cpp	5	7	71.43%	[139, 140]
🔵	src/exprs/agg/factory/aggregate_factory.hpp	2	2	100.00%	[]
🔵	src/exprs/function_context.h	3	3	100.00%	[]
🔵	src/exprs/function_context.cpp	1	1	100.00%	[]
🔵	src/exprs/agg/factory/aggregate_resolver_others.cpp	1	1	100.00%	[]

fzhedu · 2023-08-24T06:42:21Z

@mergify backport branch-3.1

mergify · 2023-08-24T06:42:24Z

backport branch-3.1

✅ Backports have been created

#29867 [Feature] Group concat support order by and distinct (backport #28778) has been created for branch branch-3.1 but encountered conflicts

fzhedu · 2023-08-24T06:56:44Z

@mergify backport branch-3.0

mergify · 2023-08-24T06:56:47Z

backport branch-3.0

✅ Backports have been created

#29870 [Feature] Group concat support order by and distinct (backport #28778) has been created for branch branch-3.0 but encountered conflicts

fzhedu · 2023-08-24T06:56:52Z

@mergify backport branch-2.5

mergify · 2023-08-24T06:56:55Z

backport branch-2.5

✅ Backports have been created

#29871 [Feature] Group concat support order by and distinct (backport #28778) has been created for branch branch-2.5 but encountered conflicts

#29927) * Merge pull request #28778 from fzhedu/groupConcat [Feature] Group concat support order by and distinct Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

fzhedu requested a review from a team as a code owner August 7, 2023 14:58

mergify bot assigned fzhedu Aug 7, 2023

wanpengfei-git added the documentation Improvements or additions to documentation label Aug 9, 2023

fzhedu force-pushed the groupConcat branch from a7664ad to 0a4edcf Compare August 10, 2023 03:37

fzhedu commented Aug 15, 2023

LiShuMing reviewed Aug 15, 2023

packy92 reviewed Aug 15, 2023

fzhedu force-pushed the groupConcat branch from 7245fa0 to 6208cde Compare August 15, 2023 07:00

Seaven reviewed Aug 17, 2023

fzhedu force-pushed the groupConcat branch from 0c011b9 to c3c65e2 Compare August 21, 2023 03:11

packy92 previously approved these changes Aug 21, 2023

fzhedu dismissed packy92’s stale review via 669b33a August 22, 2023 01:17

satanson previously approved these changes Aug 23, 2023

fzhedu added 15 commits August 23, 2023 11:11

[Feature] group_concat() support distinct and order by

e31fbe8

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update tests

6f25c53

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update tests

90cb92b

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

keep the same with mysql results

3865788

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update corner cases

948548e

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

refine

0464ade

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update tests

487119d

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update according to comments

6c3f9e3

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

updated

7b33588

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update tests

16eda12

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

fix a crash caused by agg output type is not the same with real outpu…

79d3361

…t col Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update tests

164b8df

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update test keep order

3682d6d

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update tests

b7db0f7

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

update tests and remove some DCHECK

63decac

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

fzhedu added 2 commits August 23, 2023 11:13

fix unstable tests

fa551bc

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

add query_cache test

f39c077

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

fzhedu dismissed satanson’s stale review via f39c077 August 23, 2023 03:13

fzhedu force-pushed the groupConcat branch from 15e10de to f39c077 Compare August 23, 2023 03:13

fzhedu added 2 commits August 23, 2023 11:27

fix comments

bd17e8c

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

revert change unstable tests

1bc6b03

Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

github-actions bot added the behavior_changed label Aug 23, 2023

satanson approved these changes Aug 24, 2023

packy92 approved these changes Aug 24, 2023

fzhedu merged commit 34b655c into StarRocks:main Aug 24, 2023

mergify bot mentioned this pull request Aug 24, 2023

[Feature] Group concat support order by and distinct (backport #28778) #29867

Merged

This was referenced Aug 24, 2023

[Feature] Group concat support order by and distinct (backport #28778) #29870

Closed

[Feature] Group concat support order by and distinct (backport #28778) #29871

Closed

fzhedu added a commit to fzhedu/starrocks that referenced this pull request Aug 25, 2023

Merge pull request StarRocks#28778 from fzhedu/groupConcat

f9660eb

[Feature] Group concat support order by and distinct

fzhedu added a commit to fzhedu/starrocks that referenced this pull request Aug 25, 2023

Merge pull request StarRocks#28778 from fzhedu/groupConcat

d065faf

[Feature] Group concat support order by and distinct Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

fzhedu added a commit that referenced this pull request Aug 25, 2023

Merge pull request #29867 from StarRocks/mergify/bp/branch-3.1/pr-28778

31683a9

[Feature] Group concat support order by and distinct (backport #28778)

fzhedu added a commit to fzhedu/starrocks that referenced this pull request Aug 26, 2023

Merge pull request StarRocks#28778 from fzhedu/groupConcat

ad06975

[Feature] Group concat support order by and distinct

fzhedu added a commit to fzhedu/starrocks that referenced this pull request Aug 26, 2023

Merge pull request StarRocks#28778 from fzhedu/groupConcat

1bbfd84

[Feature] Group concat support order by and distinct Signed-off-by: Zhuhe Fang <fzhedu@gmail.com>

jaogoy requested a review from wangsimo0 September 13, 2023 12:03

liuyehcf mentioned this pull request Nov 30, 2023

[Enhancement] Use sql_mode to be compatible with legacy group_concat #36150

Merged

22 tasks
