Skip to content

Conversation

@timsaucer
Copy link
Member

Which issue does this PR close?

Closes #11981

Rationale for this change

When performing a window function using a UDAF, which all of our aggregate functions are, causes an error in the physical plan. Specifically it generates a SchemaError::FieldNotFound. This appears to be because the DFSchema is empty in the minimal example showing in the bug report. However the ordering operation appears to already be handled by the window function call.

What changes are included in this PR?

Removes the generation of the ordering and sorting, which are essentially duplicates of each other, and only leaves the ordering parameters on the window function definition.

Are these changes tested?

All unit tests working.
The minimum demonstration of the error in the bug report now works.
Tested against my project where I initially discovered the problem.

Are there any user-facing changes?

None.

Additional Context

I must admit I don't fully understand the rationale behind setting the ordering/sort parameters on both the window function definition and also within the aggregate function. So this code change does resolve my immediate issue, but I am not 100% confident it will not have unintended side effects.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Aug 15, 2024
@timsaucer
Copy link
Member Author

@jayzhan211 Would you mind taking a look at this one? You recently worked on the impacted code and I could use a second opinion on this.

@jayzhan211
Copy link
Contributor

I think it makes sense to have either window or aggregate to bring along ordering information. Although I'm not sure who should take cares of at the end.

.schema(Arc::new(input_schema.clone()))
.alias(name)
.order_by(order_by.to_vec())
.sort_exprs(sort_exprs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort_exprs is removed. I think we could take a look again with the latest main branch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I rebased on main and ended up getting a different error, but I still think we need a correction. I think I'll write up a unit test as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess after we introduce WindowExprBuilder, we could have a nice builder for createing window aggregate function with single order_by.

@timsaucer timsaucer force-pushed the bugfix/order-on-window-aggregate-fns branch from 5eb0bea to bcf6f5a Compare August 15, 2024 10:13
@github-actions github-actions bot added the core Core DataFusion crate label Aug 15, 2024
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jayzhan211
Copy link
Contributor

Thanks @timsaucer

@jayzhan211 jayzhan211 merged commit 9d1cf74 into apache:main Aug 15, 2024
@timsaucer timsaucer deleted the bugfix/order-on-window-aggregate-fns branch October 10, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

order_by not respected for window functions using udaf

2 participants