-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Describe the bug
The following array functions do not properly handle NULL input, they either return an error or panic (non-exhaustive):
array_sortarray_replacearray_replace_allarray_replace_narray_resize
Discovered in #14289
To Reproduce
Scalar input
> SELECT array_sort([1,2,3], NULL);
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
> SELECT array_resize([1,2,3], NULL, 0);
Internal error: could not cast value to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Int64Type>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
> SELECT array_replace([1,2,3], NULL, NULL);
thread 'main' panicked at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-54.0.0/src/transform/mod.rs:418:13:
assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData
left: Int64
right: Null
stack backtrace:
0: rust_begin_unwind
at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
1: core::panicking::panic_fmt
at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:373:5
4: arrow_data::transform::MutableArrayData::with_capacities
at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-54.0.0/src/transform/mod.rs:418:13
5: datafusion_functions_nested::replace::general_replace
at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:303:23
6: datafusion_functions_nested::replace::array_replace_inner
at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:394:13
7: core::ops::function::Fn::call
at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:79:5
8: datafusion_functions_nested::utils::make_scalar_function::{{closure}}
at /home/joe/Projects/datafusion/datafusion/functions-nested/src/utils.rs:72:22
9: <datafusion_functions_nested::replace::ArrayReplace as datafusion_expr::udf::ScalarUDFImpl>::invoke_batch
at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:123:9
10: datafusion_expr::udf::ScalarUDFImpl::invoke_with_args
at /home/joe/Projects/datafusion/datafusion/expr/src/udf.rs:643:9
11: datafusion_expr::udf::ScalarUDF::invoke_with_args
at /home/joe/Projects/datafusion/datafusion/expr/src/udf.rs:237:9
12: <datafusion_physical_expr::scalar_function::ScalarFunctionExpr as datafusion_physical_expr_common::physical_expr::PhysicalExpr>::evaluate
at /home/joe/Projects/datafusion/datafusion/physical-expr/src/scalar_function.rs:195:22
13: datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator::evaluate_to_scalar
at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:639:29
14: <datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator as datafusion_common::tree_node::TreeNodeRewriter>::f_up
at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:529:30
15: datafusion_common::tree_node::TreeNode::rewrite::{{closure}}::{{closure}}
at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:183:13
16: datafusion_common::tree_node::Transformed<T>::transform_parent
at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:763:44
17: datafusion_common::tree_node::TreeNode::rewrite::{{closure}}
at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:28:9
18: stacker::maybe_grow
at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.17/src/lib.rs:55:9
19: datafusion_common::tree_node::TreeNode::rewrite
at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:177:50
20: datafusion_optimizer::simplify_expressions::expr_simplifier::ExprSimplifier<S>::simplify_with_cycle_count
at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:211:17
...
71: tokio::runtime::runtime::Runtime::block_on
at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/runtime.rs:340:13
72: datafusion_cli::main
at ./src/main.rs:131:5
73: core::ops::function::FnOnce::call_once
at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
killed
Process finished with exit code 9
Array Input
> CREATE TABLE t (a int[], b int, c int);
0 row(s) fetched.
Elapsed 0.004 seconds.
> INSERT INTO t VALUES ([1,2,3], 1, 2), (NULL, 1, 2), ([1,2,3], NULL, 2), ([1,2,3], 1, NULL);
+-------+
| count |
+-------+
| 4 |
+-------+
1 row(s) fetched.
Elapsed 0.006 seconds.
> SELECT array_sort(a, b, c) FROM t;
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
> SELECT array_resize(a, b, c) FROM t;
Internal error: could not cast value to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Int64Type>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Expected behavior
The functions should either return NULL or correctly handle the NULL input without panicking.
Additional context
I briefly investigated the functions and found the following:
array_sort
To fix this function we can add NullHandling::Propagate to theimpl ScalarUDFImpl for ArraySort. Then we also have to update array_sort_inner to return NULL on NULL inputs.
array_resize
I haven't looked into this one yet, but I think the solution is probably similar to array_sort.
array_replace
Note this applies to array_replace_all and array_replace_n.
This function actually does not want to propagate NULLs, it should be able to find NULL elements and replace with NULL elements. When the inputs are ArrayRefs then actually everything works correctly, it's only when the inputs are Scalar that things break. The issue is that the functions accepts DataType::Null instead of a null value for the list element type. Then we get type errors when trying to use the DataType::Null with a typed list.
I believe if we updated the signature from Signature::any(3, Volatility::Immutable) to something that includes type information then we would fix the panic. However, we face a familiar problem that there is no way to represent (list, element-type, element-type, i64) with the current Signature struct.