
Deprecate builtin first/last aggregate function and use UDAF #10091

Closed
wants to merge 105 commits into from


@jayzhan211 (Contributor) commented Apr 15, 2024

Which issue does this PR close?

Part of #8708
Closes #9957 #10062

Rationale for this change

What changes are included in this PR?

Additionally, this PR checks the signature for UDAFs and extends it to support List.
Surprisingly, window aggregate functions also work with the UDAF without any changes to the window function code.
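As background on what the first/last UDAFs compute: FIRST_VALUE returns the value from the first row of a group under the requested ordering, and LAST_VALUE returns the value from the last row. The following is a minimal standalone Rust sketch of that accumulator logic. It assumes integer values and a single integer sort key. `FirstLastAcc` and its methods are hypothetical and do not reproduce DataFusion's `Accumulator` or `AggregateUDFImpl` traits.

```rust
/// Hypothetical, simplified accumulator illustrating FIRST_VALUE / LAST_VALUE
/// with an ORDER BY key. Not DataFusion's Accumulator trait.
#[derive(Debug, Clone)]
pub struct FirstLastAcc {
    /// true => LAST_VALUE, false => FIRST_VALUE
    is_last: bool,
    /// Best (value, sort_key) seen so far; None until a row is consumed.
    best: Option<(i64, i64)>,
}

impl FirstLastAcc {
    pub fn new(is_last: bool) -> Self {
        Self { is_last, best: None }
    }

    /// Consume one row: `value` is the aggregated column, `key` the ORDER BY column.
    /// On ties, the row seen first is kept (strict comparisons).
    pub fn update(&mut self, value: i64, key: i64) {
        let replace = match self.best {
            None => true,
            // FIRST_VALUE keeps the smallest key; LAST_VALUE keeps the largest.
            Some((_, best_key)) => {
                if self.is_last {
                    key > best_key
                } else {
                    key < best_key
                }
            }
        };
        if replace {
            self.best = Some((value, key));
        }
    }

    /// Combine a partial state from another partition (two-phase aggregation).
    pub fn merge(&mut self, other: &FirstLastAcc) {
        if let Some((value, key)) = other.best {
            self.update(value, key);
        }
    }

    /// Final result; None if no rows were seen.
    pub fn evaluate(&self) -> Option<i64> {
        self.best.map(|(value, _)| value)
    }
}
```

Keeping the sort key in the state, not just the value, is what makes `merge` possible: partial results from different partitions can only be compared by their keys.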

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Apr 15, 2024
@jayzhan211 jayzhan211 changed the title Deprecate builtin first/last aggregate function Deprecate builtin first/last aggregate function and use UDAF Apr 15, 2024
```rust
},
});
} else if let Expr::Alias(alias) = expr.as_ref() {
```
jayzhan211 (Contributor Author):

Filed #10074 to track how to restructure the code here, e.g. by replacing it with create_physical_sort_expr or something else.

```rust
let mut sort_exprs = vec![];
for expr in acc_args.sort_exprs {
if let Expr::Sort(sort) = expr {
if let Expr::Column(col) = sort.expr.as_ref() {
```
jayzhan211 (Contributor Author):

LastValue has no test case covering Alias, so I kept it simple for now.
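The fragments above turn the aggregate's ORDER BY expressions into something the accumulator can use, handling a bare column directly and an alias separately. Below is a standalone sketch of that pattern using a hypothetical, heavily simplified `Expr` enum; DataFusion's real `Expr`, `Sort`, and `Alias` types carry more fields. Resolving an alias to the expression it wraps is an assumption of this sketch, not necessarily what the PR does.

```rust
/// Hypothetical, simplified logical expression type (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
    Sort { expr: Box<Expr>, asc: bool },
}

/// Strip any number of nested aliases, returning the wrapped expression.
fn unalias(expr: &Expr) -> &Expr {
    match expr {
        Expr::Alias(inner, _) => unalias(inner),
        other => other,
    }
}

/// Turn ORDER BY expressions into (column name, ascending) pairs.
/// Only `Sort` over a (possibly aliased) column is supported; anything else is an error.
pub fn sort_columns(sort_exprs: &[Expr]) -> Result<Vec<(String, bool)>, String> {
    let mut out = Vec::with_capacity(sort_exprs.len());
    for expr in sort_exprs {
        match expr {
            Expr::Sort { expr, asc } => match unalias(expr) {
                Expr::Column(name) => out.push((name.clone(), *asc)),
                other => return Err(format!("unsupported sort expression: {other:?}")),
            },
            other => return Err(format!("expected a Sort expression, got {other:?}")),
        }
    }
    Ok(out)
}
```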

```
@@ -92,14 +90,13 @@ impl OptimizerRule for ReplaceDistinctWithAggregate {
let aggr_expr = select_expr
.iter()
.map(|e| {
Expr::AggregateFunction(AggregateFunction::new(
AggregateFunctionFunc::FirstValue,
create_first_value_expr(
```
jayzhan211 (Contributor Author):

Argument support in the macro is another TODO.

```rust
            // "fluent expr_fn" style function
            #[doc = $DOC]
            pub fn $EXPR_FN($($arg: Expr),*) -> Expr {
                Expr::AggregateFunction(datafusion_expr::expr::AggregateFunction::new_udf(
                    $AGGREGATE_UDF_FN(),
                    vec![$($arg),*],
                    // TODO: Support arguments for `expr` API
                    false,
                    None,
                    None,
                    None,
                ))
            }
```
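The macro fragment above generates a "fluent" expr_fn-style helper that wraps a UDAF call in an `Expr`, passing fixed defaults (`false, None, None, None`) for the remaining arguments, which is why argument support is left as a TODO. A self-contained sketch of the same `macro_rules!` pattern follows. `Udaf`, `Expr`, `first_value_udaf`, and `make_udaf_expr_fn` are hypothetical stand-ins, not DataFusion items, and only one fixed default (`distinct`) is modeled.

```rust
/// Hypothetical stand-in for a UDAF handle.
#[derive(Debug, Clone, PartialEq)]
pub struct Udaf {
    pub name: &'static str,
}

/// Hypothetical, simplified expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    AggregateCall {
        udaf: Udaf,
        args: Vec<Expr>,
        // Fixed default, mirroring the hard-coded `false` above; callers cannot set it.
        distinct: bool,
    },
}

/// Hypothetical accessor returning the UDAF instance.
pub fn first_value_udaf() -> Udaf {
    Udaf { name: "first_value" }
}

// Generates a fluent helper with one `Expr` parameter per listed identifier.
macro_rules! make_udaf_expr_fn {
    ($EXPR_FN:ident, $($arg:ident)*, $DOC:expr, $AGGREGATE_UDF_FN:ident) => {
        #[doc = $DOC]
        pub fn $EXPR_FN($($arg: Expr),*) -> Expr {
            Expr::AggregateCall {
                udaf: $AGGREGATE_UDF_FN(),
                args: vec![$($arg),*],
                distinct: false,
            }
        }
    };
}

make_udaf_expr_fn!(
    first_value,
    expression,
    "Returns the first value in a group of values.",
    first_value_udaf
);
```

With this shape, supporting extra arguments would mean adding parameters to the generated function (or a builder on the returned `Expr`) instead of hard-coding the defaults.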

@jayzhan211 jayzhan211 marked this pull request as ready for review April 16, 2024 02:41
@alamb (Contributor) commented Apr 17, 2024

I plan to review this tomorrow (sorry, I am out on vacation this week, so I have more limited bandwidth).


```rust
if aggr_fun_expr.is_some_and(|e| e.fun().name() == "FIRST_VALUE") {
let mut first_value = aggr_fun_expr.unwrap().clone();
```
jayzhan211 (Contributor Author):

This is where derive(Clone) is needed.
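The `.clone()` in the fragment above needs `derive(Clone)` because `aggr_fun_expr` is only a borrowed `Option<&_>`, while the rule wants an owned expression it can mutate. A subtle point: without `Clone` on the struct, `.clone()` on a `&T` still compiles but only copies the reference, so the later mutation would fail. The standalone sketch below shows that borrow-then-clone shape. `AggExpr` and `with_new_ordering` are hypothetical and are not DataFusion's `AggregateFunctionExpr`.

```rust
/// Hypothetical, simplified aggregate expression. `Clone` is derived so that
/// `.clone()` on a `&AggExpr` yields an owned `AggExpr` instead of a copied reference.
#[derive(Debug, Clone, PartialEq)]
pub struct AggExpr {
    pub name: String,
    /// ORDER BY column names required by this aggregate.
    pub ordering: Vec<String>,
}

/// If `aggr` is FIRST_VALUE, return an owned copy with a new ordering; the
/// borrowed original is left untouched.
pub fn with_new_ordering(aggr: Option<&AggExpr>, ordering: Vec<String>) -> Option<AggExpr> {
    if aggr.is_some_and(|e| e.name == "FIRST_VALUE") {
        // `unwrap` is safe here because `is_some_and` returned true.
        let mut first_value = aggr.unwrap().clone();
        first_value.ordering = ordering;
        Some(first_value)
    } else {
        None
    }
}
```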

```rust
}

fn reverse_expr(&self) -> Option<Arc<dyn AggregateExpr>> {
Some(Arc::new(self.clone().convert_to_last()))
```
jayzhan211 (Contributor Author) commented Apr 20, 2024:

Note that reverse_expr is removed and the conversion is done manually in OptimizeAggregateOrder. However, in window functions, get_best_fitting_window may call reverse_expr on first/last, and no test covers that path.
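reverse_expr matters because FIRST_VALUE under an ordering returns the same result as LAST_VALUE under the reversed ordering. A planner step such as get_best_fitting_window can presumably use this to swap one for the other and reuse input that is already sorted in the opposite direction. The standalone sketch below checks that equivalence on plain `(value, sort_key)` pairs. `first_value` and `last_value` here are hypothetical helpers, not DataFusion functions. The equivalence as written only holds when sort keys are distinct; ties and NULLs are ignored.

```rust
/// Rows are (value, sort_key) pairs. Returns them stably sorted by key,
/// ascending or descending.
fn sorted_by_key(rows: &[(i64, i64)], asc: bool) -> Vec<(i64, i64)> {
    let mut sorted = rows.to_vec();
    if asc {
        sorted.sort_by_key(|&(_, k)| k);
    } else {
        sorted.sort_by_key(|&(_, k)| std::cmp::Reverse(k));
    }
    sorted
}

/// FIRST_VALUE(value ORDER BY key ASC|DESC)
pub fn first_value(rows: &[(i64, i64)], asc: bool) -> Option<i64> {
    sorted_by_key(rows, asc).first().map(|&(v, _)| v)
}

/// LAST_VALUE(value ORDER BY key ASC|DESC)
pub fn last_value(rows: &[(i64, i64)], asc: bool) -> Option<i64> {
    sorted_by_key(rows, asc).last().map(|&(v, _)| v)
}
```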

@jayzhan211 (Contributor Author):

I think reverse_expr support should be added before deprecating the builtin first/last, so I am converting this to a draft.

@jayzhan211 jayzhan211 marked this pull request as draft April 20, 2024 07:53
@jayzhan211 jayzhan211 closed this May 29, 2024

Successfully merging this pull request may close these issues.

Remove builtin aggregate function FirstValue
2 participants