Keep aggregate udaf schema names unique when missing an order-by #17731

wiedld · 2025-09-22T23:22:57Z

Which issue does this PR close?

Closes approx_percentile_cont query fails: "Schema contains duplicate unqualified field name #17730

Rationale for this change

The schema names are kept unique using the WITHIN GROUP clause. But since we permit this clause to be missing, we need to maintain uniqueness by not shaving off the column identifier. See test case for details.

Also, note the commit with initial failure: influxdata@5cbeb1c

What changes are included in this PR?

Adds to a conditional.

Are these changes tested?

Yes.

Are there any user-facing changes?

Fixes a regression.

wiedld · 2025-09-22T23:59:19Z

Failing test is due to github flake. If someone could please initiate a restart? 🙏🏼

alamb

Thank you @wiedld -- this clearly fixes a bug so looks good to me

I am somewhat confused about how order_by can be empty in the first place, however (see my other comments)

alamb · 2025-09-23T12:41:57Z

datafusion/sqllogictest/test_files/aggregate.slt

 e 115

+
+# using approx_percentile_cont on 2 columns with same signature


I verified this test fails as follows without the code change

1. query failed: DataFusion error: Schema error: Schema contains duplicate unqualified field name "approx_percentile_cont(Float64(0.95))" [SQL] SELECT c1, approx_percentile_cont(c2, 0.95) AS c2, approx_percentile_cont(c3, 0.95) AS c3 FROM aggregate_test_100 GROUP BY 1 ORDER BY 1 at /Users/andrewlamb/Software/datafusion/datafusion/sqllogictest/test_files/aggregate.slt:1840

alamb · 2025-09-23T12:52:42Z

datafusion/expr/src/udaf.rs

    // exclude the first function argument(= column) in ordered set aggregate function,
    // because it is duplicated with the WITHIN GROUP clause in schema name.
-    let args = if func.is_ordered_set_aggregate() {
+    let args = if func.is_ordered_set_aggregate() && !order_by.is_empty() {


This clearly fixes a bug so I think we can merge this. However, it seems somewhat strange to me given that my understanding of ordered set aggregate functions is that they require an ORDER BY clause 🤔

I wonder if this PR is just fixing a symptom, and the root cause is that approx_percentile_cont should actually be created with an order by expression 🤔

I tried to update the docs here

Improve documentation for ordered set aggregate functions #17744

I believe this is because we are currently supporting the corrected syntax (which includes an order-by), as well as the older syntax which does not include an order-by in the UDAF clause itself.

Docs: https://datafusion.apache.org/user-guide/sql/aggregate_functions.html#approx-percentile-cont

New, corrected syntax:
SELECT approx_percentile_cont(0.75) WITHIN GROUP (ORDER BY column_name) FROM table_name;

Older syntax, still supported:
SELECT approx_percentile_cont(column_name, 0.75) FROM table_name;

Note that this bug occurred with customers using the older syntax, since it was shortening the schema names in such a way that they were non-unique is only the 1st argument was different.

wiedld added 3 commits September 22, 2025 19:20

test: reproducer of bug

5cbeb1c

fix: make schema names unique for approx_percentile_cont

b4c633f

test: regression test is now resolved

a01f8f6

github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Sep 22, 2025

wiedld changed the title ~~Keep approx_percentile_cont names unique when missing an order-by~~ Keep approx_percentile_cont schema names unique when missing an order-by Sep 22, 2025

wiedld mentioned this pull request Sep 22, 2025

approx_percentile_cont query fails: "Schema contains duplicate unqualified field name #17730

Open

alamb changed the title ~~Keep approx_percentile_cont schema names unique when missing an order-by~~ Keep aggregate udaf schema names unique when missing an order-by Sep 23, 2025

alamb mentioned this pull request Sep 23, 2025

Improve documentation for ordered set aggregate functions #17744

Merged

alamb approved these changes Sep 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Keep aggregate udaf schema names unique when missing an order-by #17731

Keep aggregate udaf schema names unique when missing an order-by #17731

wiedld commented Sep 22, 2025 •

edited by alamb

Loading

Uh oh!

wiedld commented Sep 22, 2025

Uh oh!

alamb left a comment

Uh oh!

alamb Sep 23, 2025

Uh oh!

alamb Sep 23, 2025

Uh oh!

wiedld Sep 24, 2025 •

edited

Loading

Uh oh!

wiedld Sep 24, 2025

Uh oh!

Uh oh!

		e 115


		# using approx_percentile_cont on 2 columns with same signature

Keep aggregate udaf schema names unique when missing an order-by #17731

Are you sure you want to change the base?

Keep aggregate udaf schema names unique when missing an order-by #17731

Conversation

wiedld commented Sep 22, 2025 • edited by alamb Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

wiedld commented Sep 22, 2025

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

alamb Sep 23, 2025

Choose a reason for hiding this comment

Uh oh!

alamb Sep 23, 2025

Choose a reason for hiding this comment

Uh oh!

wiedld Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wiedld Sep 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wiedld commented Sep 22, 2025 •

edited by alamb

Loading

wiedld Sep 24, 2025 •

edited

Loading