Skip to content

Conversation

@alan910127
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

This change ensures that array functions handle NULL and incorrect argument types gracefully, preventing unexpected failures or internal errors showing up to the user.

What changes are included in this PR?

Added ArrayFunctionArgument::DataType(data_type) variant for array functions to define an argument of a specific type.

array_sort

  • Returns NULL when either desc or nulls_first is NULL.
  • There's no incorrect input type since every type can be coerced into Utf8

NULL inputs:

> select array_sort([1,2,3]);
+----------------------------------------------------+
| array_sort(make_array(Int64(1),Int64(2),Int64(3))) |
+----------------------------------------------------+
| [1, 2, 3]                                          |
+----------------------------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

> select array_sort([1,2,3], NULL);
+---------------------------------------------------------+
| array_sort(make_array(Int64(1),Int64(2),Int64(3)),NULL) |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

> select array_sort([1,2,3], NULL, 'NULLS FIRST');
+-----------------------------------------------------------------------------+
| array_sort(make_array(Int64(1),Int64(2),Int64(3)),NULL,Utf8("NULLS FIRST")) |
+-----------------------------------------------------------------------------+
| NULL                                                                        |
+-----------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

> select array_sort([1,2,3], 'DESC', NULL);
+----------------------------------------------------------------------+
| array_sort(make_array(Int64(1),Int64(2),Int64(3)),Utf8("DESC"),NULL) |
+----------------------------------------------------------------------+
| NULL                                                                 |
+----------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

> select array_sort([1,2,3], NULL, NULL);
+--------------------------------------------------------------+
| array_sort(make_array(Int64(1),Int64(2),Int64(3)),NULL,NULL) |
+--------------------------------------------------------------+
| NULL                                                         |
+--------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

Incorrect type of inputs:
N/A

array_replace, array_replace_n, array_replace_all

  • Treat NULLs as elements.
  • Internal coercion errors are not handled yet. Don't know if it should be handled in this PR

NULL inputs:

> select array_replace([1,2,3,NULL],NULL,4);
+--------------------------------------------------------------------------+
| array_replace(make_array(Int64(1),Int64(2),Int64(3),NULL),NULL,Int64(4)) |
+--------------------------------------------------------------------------+
| [1, 2, 3, 4]                                                             |
+--------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

> select array_replace([1,2,3,NULL],2,NULL);
+--------------------------------------------------------------------------+
| array_replace(make_array(Int64(1),Int64(2),Int64(3),NULL),Int64(2),NULL) |
+--------------------------------------------------------------------------+
| [1, NULL, 3, NULL]                                                       |
+--------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

incorrect type of inputs:

> select array_replace([1,2,3], 1, true);
Error during planning: Internal error: Function 'array_replace' does not support coercion from Int64 to Boolean.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker No function matches the given name and argument types 'array_replace(List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Int64, Boolean)'. You might need to add explicit type casts.
        Candidate functions:
        array_replace(array, element, element)

array_resize

  • If size is NULL, returns an empty array. If value is NULL, fill the new slots to be NULL.
  • Incorrect types of arguments returns plan errors instead of internal errors.

NULL inputs:

> select array_resize([1,2,3], NULL);
+-----------------------------------------------------------+
| array_resize(make_array(Int64(1),Int64(2),Int64(3)),NULL) |
+-----------------------------------------------------------+
| []                                                        |
+-----------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

> select array_resize([1,2,3], NULL, NULL);
+----------------------------------------------------------------+
| array_resize(make_array(Int64(1),Int64(2),Int64(3)),NULL,NULL) |
+----------------------------------------------------------------+
| []                                                             |
+----------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.

> select array_resize([1,2,3], 5, NULL);
+--------------------------------------------------------------------+
| array_resize(make_array(Int64(1),Int64(2),Int64(3)),Int64(5),NULL) |
+--------------------------------------------------------------------+
| [1, 2, 3, NULL, NULL]                                              |
+--------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

incorrect type of inputs:

> select array_resize([1,2,3], 'invalid', NULL);
Error during planning: Failed to coerce arguments to satisfy a call to 'array_resize' function: coercion from [List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Utf8, Null] to the signature OneOf([ArraySignature(Array { arguments: [Array, Index], array_coercion: Some(FixedSizedListToList) }), ArraySignature(Array { arguments: [Array, Index, Element], array_coercion: Some(FixedSizedListToList) })]) failed No function matches the given name and argument types 'array_resize(List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Utf8, Null)'. You might need to add explicit type casts.
        Candidate functions:
        array_resize(array, index)
        array_resize(array, index, element)
> select array_resize([1,2,3], 5, true);
Error during planning: Failed to coerce arguments to satisfy a call to 'array_resize' function: coercion from [List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Int64, Boolean] to the signature OneOf([ArraySignature(Array { arguments: [Array, Index], array_coercion: Some(FixedSizedListToList) }), ArraySignature(Array { arguments: [Array, Index, Element], array_coercion: Some(FixedSizedListToList) })]) failed No function matches the given name and argument types 'array_resize(List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Int64, Boolean)'. You might need to add explicit type casts.
        Candidate functions:
        array_resize(array, index)
        array_resize(array, index, element)

Are these changes tested?

Tests for array_sort with NULL and incorrect inputs are added in array.slt. Other functions are already tested via arrays_values in sqllogictest. If additional tests are needed, please let me know.

Are there any user-facing changes?

No.

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Feb 18, 2025
@alan910127 alan910127 force-pushed the fix/array-fn-null-handling branch from a108022 to 3f07b44 Compare February 18, 2025 06:35
@alan910127
Copy link
Contributor Author

alan910127 commented Feb 24, 2025

the tests failed because the error message didn’t match. It looks like the pipeline outputs more than we expected. I temporarily removed the error message check in array.slt to avoid mismatches, not sure if there’s a better approach. (but that does not explain why the tests didn't fail in fa3fb27)

@alamb
Copy link
Contributor

alamb commented Mar 3, 2025

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft March 3, 2025 21:41
@alan910127 alan910127 requested a review from jayzhan211 March 5, 2025 08:03
@alan910127 alan910127 marked this pull request as ready for review March 5, 2025 08:04
for i in 0..row_count {
if list_array.is_null(i) {
array_lengths.push(0);
arrays.push(new_null_array(list_array.value(i).data_type(), 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line fix the issue below.

for null case, you just need to make sure the offset append a single 0 and the null buffer append a null, that is why we array_lengths.push(0) and append null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, thank you so much! I think I did it wrong when dealing with array_sort(NULL). I've pushed a new version, please take a look to check if that's the correct implementation.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@alan910127
Copy link
Contributor Author

@jayzhan211 Thank you so much for all the help and guidance!

@jayzhan211 jayzhan211 merged commit 43ecd9b into apache:main Mar 6, 2025
24 checks passed
@jayzhan211
Copy link
Contributor

Thanks @alan910127

@alan910127 alan910127 deleted the fix/array-fn-null-handling branch March 6, 2025 00:40
danila-b pushed a commit to danila-b/datafusion that referenced this pull request Mar 8, 2025
…#14737)

* feat: arbitrary typed argument in array function

* fix: array_sort null handling

* fix: array_resize signature

* test: add array_sort sqllogictest for null and invalid types

* fix: don't match error message

* chore: use string instead of data type

* refactor: use new_null_array

* fix: pass null to array argument should return null

* fix: handle null argument for array in replace and resize

* fix: mismatched error message

* fix: incorrect number of rows returned

* test: update null tests

* fix: treat NULLs as lists directly to prevent extra handling

* fix: incorrect null pushing in array_sort
joroKr21 pushed a commit to coralogix/arrow-datafusion that referenced this pull request Mar 19, 2025
…#14737)

* feat: arbitrary typed argument in array function

* fix: array_sort null handling

* fix: array_resize signature

* test: add array_sort sqllogictest for null and invalid types

* fix: don't match error message

* chore: use string instead of data type

* refactor: use new_null_array

* fix: pass null to array argument should return null

* fix: handle null argument for array in replace and resize

* fix: mismatched error message

* fix: incorrect number of rows returned

* test: update null tests

* fix: treat NULLs as lists directly to prevent extra handling

* fix: incorrect null pushing in array_sort
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proper NULL handling in array functions

3 participants