Support internal cast for BuiltinScalarFunction::MakeArray #6607

Merged: 7 commits into apache:main on Jun 14, 2023

Conversation

jayzhan211
Contributor

Which issue does this PR close?

Closes #6558.

Rationale for this change

BuiltinScalarFunction::MakeArray does not yet support arguments with different DataTypes.

What changes are included in this PR?

BuiltinScalarFunction::MakeArray uses array() in datafusion/physical-expr/src/array_expressions.rs; this PR adds internal casting in array_array().

Type coercion is based on datafusion_expr::comparison_coercion
Cast is based on arrow-cast
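
A minimal sketch of the two points above, using toy stand-ins rather than the real arrow and DataFusion types: fold a pairwise coercion rule over the argument types to find one common type, then cast every argument to it. `ToyType`, `ToyValue`, `toy_coercion`, `toy_cast`, and this `make_array` are hypothetical names for illustration only; the real `comparison_coercion` and the arrow-cast kernels cover far more types and are not reproduced here.

```rust
// Toy stand-in for arrow's DataType (hypothetical). Variants are declared
// from narrowest to widest so the derived ordering can act as "wider than".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ToyType {
    Int32,
    Int64,
    Float64,
}

// Toy stand-in for a single scalar argument (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyValue {
    Int32(i32),
    Int64(i64),
    Float64(f64),
}

impl ToyValue {
    fn data_type(&self) -> ToyType {
        match self {
            ToyValue::Int32(_) => ToyType::Int32,
            ToyValue::Int64(_) => ToyType::Int64,
            ToyValue::Float64(_) => ToyType::Float64,
        }
    }
}

// Pairwise rule: pick the wider of the two types. A real coercion rule can
// fail, which is why this returns an Option even though the toy never does.
fn toy_coercion(lhs: ToyType, rhs: ToyType) -> Option<ToyType> {
    Some(lhs.max(rhs))
}

// Fold the pairwise rule over all argument types; None for an empty list.
fn common_type(types: &[ToyType]) -> Option<ToyType> {
    let (first, rest) = types.split_first()?;
    rest.iter().try_fold(*first, |acc, t| toy_coercion(acc, *t))
}

// Widening casts only; any other combination is returned unchanged.
fn toy_cast(value: ToyValue, to: ToyType) -> ToyValue {
    match (value, to) {
        (ToyValue::Int32(x), ToyType::Int64) => ToyValue::Int64(i64::from(x)),
        (ToyValue::Int32(x), ToyType::Float64) => ToyValue::Float64(f64::from(x)),
        (ToyValue::Int64(x), ToyType::Float64) => ToyValue::Float64(x as f64),
        (other, _) => other,
    }
}

// Coerce all arguments to the common type before building the "array".
fn make_array(args: &[ToyValue]) -> Option<Vec<ToyValue>> {
    let types: Vec<ToyType> = args.iter().map(ToyValue::data_type).collect();
    let target = common_type(&types)?;
    Some(args.iter().map(|v| toy_cast(*v, target)).collect())
}
```

Under these assumptions, `make_array(&[ToyValue::Int64(1), ToyValue::Int32(2)])` yields two `Int64` values, which is the mixed-width case the description says could not be expressed in sqllogictest.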

Are these changes tested?

Unit tests are added.
A sqllogictest for floats is added.

make_array(Int64(1), Int32(2)) is not covered in sqllogictest: Int64() is not supported as a SQL function yet, supporting it is not trivial within this PR, and there is no other way to distinguish an int64 value from an int32 value. Attempting it fails with:

External error: query failed: DataFusion error: Error during planning: Invalid function 'int64'.
Did you mean 'atan'?
[SQL] select make_array(Int64(1))
at tests/sqllogictests/test_files/array.slt:213

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Jun 9, 2023
@jayzhan211 jayzhan211 changed the title Array cast Support internal cast for BuiltinScalarFunction::MakeArray Jun 9, 2023
@github-actions github-actions bot added optimizer Optimizer rules sql SQL Planner substrait labels Jun 12, 2023
@github-actions github-actions bot removed optimizer Optimizer rules substrait sql SQL Planner labels Jun 12, 2023
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Contributor Author

jayzhan211 commented Jun 12, 2023

@izveigor @alamb Ready for review, thanks

Contributor

@alamb alamb left a comment

Thank you @jayzhan211 -- I left some comments

@@ -85,7 +87,18 @@ fn array_array(args: &[ArrayRef]) -> Result<ArrayRef> {
         ));
     }

-    let data_type = args[0].data_type();
+    let data_type = args

I think we are trying to avoid casting in the physical exprs and instead want to cast in the analyzer pass (so that the rest of the optimizer passes see the correct types).

I think this is the relevant code:

https://github.com/apache/arrow-datafusion/blob/071a2a6dfd43b3b225fce1af8838fa95c79f9ede/datafusion/optimizer/src/analyzer/type_coercion.rs#L389-L397

Perhaps it needs to be taught about array arguments 🤔
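
A rough sketch of what casting in the analyzer rather than in the physical expression could look like, on a toy expression tree: the arguments are rewritten into explicit cast nodes before execution, so every later pass sees a single element type. `ToyType`, `ToyExpr`, and `coerce_make_array_args` are hypothetical names, and taking the widest type is an assumption of this sketch; it is not the code in `type_coercion.rs`.

```rust
// Toy stand-in for arrow's DataType (hypothetical), declared narrowest first
// so the derived ordering can be used to pick the widest type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ToyType {
    Int32,
    Int64,
    Float64,
}

// Toy logical expression (hypothetical): a typed column or an explicit cast.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column { name: String, data_type: ToyType },
    Cast { expr: Box<ToyExpr>, to: ToyType },
}

impl ToyExpr {
    fn data_type(&self) -> ToyType {
        match self {
            ToyExpr::Column { data_type, .. } => *data_type,
            ToyExpr::Cast { to, .. } => *to,
        }
    }
}

// Analyzer-style rewrite: wrap every argument whose type differs from the
// widest argument type in a Cast node, leaving the others untouched.
fn coerce_make_array_args(args: Vec<ToyExpr>) -> Vec<ToyExpr> {
    let widest = args.iter().map(ToyExpr::data_type).max();
    let target = match widest {
        Some(t) => t,
        None => return args, // no arguments, nothing to coerce
    };
    args.into_iter()
        .map(|e| {
            if e.data_type() == target {
                e
            } else {
                ToyExpr::Cast { expr: Box::new(e), to: target }
            }
        })
        .collect()
}
```

With an `Int32` column `x` and a `Float64` column `y`, the rewrite returns `[Cast(x, Float64), y]`, and the physical function no longer needs to cast anything itself.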

@jayzhan211 jayzhan211 marked this pull request as draft June 13, 2023 09:15
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added optimizer Optimizer rules and removed physical-expr Physical Expressions labels Jun 13, 2023
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Jun 13, 2023
@jayzhan211 jayzhan211 marked this pull request as ready for review June 13, 2023 12:17
@jayzhan211 jayzhan211 marked this pull request as draft June 13, 2023 12:46
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 marked this pull request as ready for review June 13, 2023 12:55
Contributor

@alamb alamb left a comment

Thank you @jayzhan211 -- this looks good to me

Any thoughts @izveigor ?

@@ -602,6 +603,39 @@ fn coerce_arguments_for_signature(
.collect::<Result<Vec<_>>>()
}

fn coerce_arguments_for_fun(

👍

Contributor

@izveigor izveigor left a comment

LGTM. If you have more time, you can add more sql logic tests.

@alamb
Contributor

alamb commented Jun 13, 2023

> LGTM. If you have more time, you can add more sql logic tests.

Specifically, I think adding coverage for NULL and arrays would be helpful:

select make_array(null, 1.0)
select make_array(1.0, '2', null)
create table foo (x int, y double) as values (1, 2.0);
select make_array(x, y) from foo;
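
One way to reason about the NULL cases above is a pairwise rule in which a NULL-typed argument simply adopts the type of its neighbour. The sketch below assumes exactly that and nothing more; `ToyType`, `toy_coercion`, and `common_type` are hypothetical names, and the string case (`'2'`) is left out on purpose because this toy does not model how the real coercion treats strings.

```rust
// Toy stand-in for arrow's DataType with an explicit Null type (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyType {
    Null,
    Int64,
    Float64,
}

// Pairwise rule assumed for this sketch: equal types stay as they are,
// Null adopts the other side's type, and Int64 widens to Float64.
fn toy_coercion(lhs: ToyType, rhs: ToyType) -> Option<ToyType> {
    use ToyType::*;
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        (Null, other) | (other, Null) => Some(other),
        (Int64, Float64) | (Float64, Int64) => Some(Float64),
        // Every pair of toy types is handled above; this arm stands in for
        // the pairs a real rule would reject.
        _ => None,
    }
}

// Fold the pairwise rule over all argument types; None for an empty list.
fn common_type(types: &[ToyType]) -> Option<ToyType> {
    let (first, rest) = types.split_first()?;
    rest.iter().try_fold(*first, |acc, t| toy_coercion(acc, *t))
}
```

Under this assumption, the argument types of `make_array(null, 1.0)` resolve to `Float64`, and the position of the NULL argument does not change the result.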

@alamb
Contributor

alamb commented Jun 13, 2023

I plan to merge this PR tomorrow, unless @jayzhan211 would like more time to add tests

@jayzhan211
Contributor Author

Let me add some tests

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Contributor

@alamb alamb left a comment

Looks great -- thank you @jayzhan211

@alamb alamb merged commit e9fae98 into apache:main Jun 14, 2023
Labels
core Core DataFusion crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Cast between array elements
3 participants