Skip to content

Conversation

@nuno-faria
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

The SortExec is missing from the physical plan of string_agg functions. In the current branch it works since the AggregateExec ends up sorting the results, but on 49.0.0 its broken. This PR helps avoid future regressions by implementing AggregateUDFImpl::reverse_expr for StringAgg, which brings back the SortExec.

What changes are included in this PR?

  • Implemented AggregateUDFImpl::reverse_expr for StringAgg.
  • Added sqllogictests to check the plans.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Aug 13, 2025
03)----DataSourceExec: partitions=1, partition_sizes=[1]

query TT
explain select string_agg(k, ',' order by v desc) from t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is single aggregation enough to exercise the reverse_expr API?
Or maybe we need two functions, one using reverse ordering of the other?

I checked that the test does not pass without changes in string_agg.rs and passes with them.
This is likely related to

if let Some(reverse_aggr_expr) = aggr_expr.reverse_expr() {
call site, but it looks it should never matter when there is only one function.

What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without reverse_expr the requirement in that function will stay None, which means get_finer_aggregate_exprs_requirement returns an empty list. Maybe @berkaysynnada can clarify the behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we hitting this loop exit path as expected?

// If the common requirement is finer than the current expression's,
// we can skip this expression. If the latter is finer than the former,
// adopt it if it is satisfied by the equivalence properties. Otherwise,
// defer the analysis to the reverse expression.
let forward_finer = determine_finer(&requirement, &aggr_req);
if let Some(finer) = forward_finer {
if !finer {
continue;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like it, determine_finer returns true. Should it exit there? If it did I think we would have the same problem, i.e., requirement would remain None.

@findepi findepi requested a review from berkaysynnada August 13, 2025 10:08
@findepi
Copy link
Member

findepi commented Aug 13, 2025

This LGTM but i clearly don't understand fully how this works.
In my understanding the reverse_expr should by no means be needed, nor used, if there is only one aggregate function in a query. Either I am wrong, or there is some other problem remaining.

@alamb
Copy link
Contributor

alamb commented Sep 5, 2025

I added a test using string_agg with both orderings in 84d58ac and it looks good to me

Thank you @nuno-faria and @findepi

@alamb
Copy link
Contributor

alamb commented Sep 8, 2025

I believe this should be included in DataFusion 50 as well so that we don't have a regression after DF 49. I will merge it now and make a backport

@alamb alamb merged commit b084aa4 into apache:main Sep 8, 2025
28 checks passed
alamb added a commit to alamb/datafusion that referenced this pull request Sep 8, 2025
…7165)

* fix: Implement AggregateUDFImpl::reverse_expr for StringAgg

* Add a test with two invocations of aggregateion

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@nuno-faria nuno-faria deleted the string_agg_reverse_expr branch September 8, 2025 12:45
alamb added a commit that referenced this pull request Sep 8, 2025
…17473)

* fix: Implement AggregateUDFImpl::reverse_expr for StringAgg

* Add a test with two invocations of aggregateion

---------

Co-authored-by: Nuno Faria <nunofpfaria@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants