Skip to content

Proper NULL handling in array functions #14451

@jkosh44

Description

@jkosh44

Describe the bug

The following array functions do not properly handle NULL input, they either return an error or panic (non-exhaustive):

  • array_sort
  • array_replace
  • array_replace_all
  • array_replace_n
  • array_resize

Discovered in #14289

To Reproduce

Scalar input

> SELECT array_sort([1,2,3], NULL);
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

> SELECT array_resize([1,2,3], NULL, 0);
Internal error: could not cast value to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Int64Type>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

> SELECT array_replace([1,2,3], NULL, NULL);
thread 'main' panicked at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-54.0.0/src/transform/mod.rs:418:13:
assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData
  left: Int64
 right: Null
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:373:5
   4: arrow_data::transform::MutableArrayData::with_capacities
             at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-54.0.0/src/transform/mod.rs:418:13
   5: datafusion_functions_nested::replace::general_replace
             at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:303:23
   6: datafusion_functions_nested::replace::array_replace_inner
             at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:394:13
   7: core::ops::function::Fn::call
             at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:79:5
   8: datafusion_functions_nested::utils::make_scalar_function::{{closure}}
             at /home/joe/Projects/datafusion/datafusion/functions-nested/src/utils.rs:72:22
   9: <datafusion_functions_nested::replace::ArrayReplace as datafusion_expr::udf::ScalarUDFImpl>::invoke_batch
             at /home/joe/Projects/datafusion/datafusion/functions-nested/src/replace.rs:123:9
  10: datafusion_expr::udf::ScalarUDFImpl::invoke_with_args
             at /home/joe/Projects/datafusion/datafusion/expr/src/udf.rs:643:9
  11: datafusion_expr::udf::ScalarUDF::invoke_with_args
             at /home/joe/Projects/datafusion/datafusion/expr/src/udf.rs:237:9
  12: <datafusion_physical_expr::scalar_function::ScalarFunctionExpr as datafusion_physical_expr_common::physical_expr::PhysicalExpr>::evaluate
             at /home/joe/Projects/datafusion/datafusion/physical-expr/src/scalar_function.rs:195:22
  13: datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator::evaluate_to_scalar
             at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:639:29
  14: <datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator as datafusion_common::tree_node::TreeNodeRewriter>::f_up
             at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:529:30
  15: datafusion_common::tree_node::TreeNode::rewrite::{{closure}}::{{closure}}
             at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:183:13
  16: datafusion_common::tree_node::Transformed<T>::transform_parent
             at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:763:44
  17: datafusion_common::tree_node::TreeNode::rewrite::{{closure}}
             at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:28:9
  18: stacker::maybe_grow
             at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.17/src/lib.rs:55:9
  19: datafusion_common::tree_node::TreeNode::rewrite
             at /home/joe/Projects/datafusion/datafusion/common/src/tree_node.rs:177:50
  20: datafusion_optimizer::simplify_expressions::expr_simplifier::ExprSimplifier<S>::simplify_with_cycle_count
             at /home/joe/Projects/datafusion/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:211:17
  ...
  71: tokio::runtime::runtime::Runtime::block_on
             at /home/joe/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.43.0/src/runtime/runtime.rs:340:13
  72: datafusion_cli::main
             at ./src/main.rs:131:5
  73: core::ops::function::FnOnce::call_once
             at /home/joe/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
killed

Process finished with exit code 9

Array Input

> CREATE TABLE t (a int[], b int, c int);
0 row(s) fetched. 
Elapsed 0.004 seconds.

> INSERT INTO t VALUES ([1,2,3], 1, 2), (NULL, 1, 2), ([1,2,3], NULL, 2), ([1,2,3], 1, NULL);
+-------+
| count |
+-------+
| 4     |
+-------+
1 row(s) fetched. 
Elapsed 0.006 seconds.

> SELECT array_sort(a, b, c) FROM t;
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
> SELECT array_resize(a, b, c) FROM t;
Internal error: could not cast value to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Int64Type>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Expected behavior

The functions should either return NULL or correctly handle the NULL input without panicking.

Additional context

I briefly investigated the functions and found the following:

array_sort

To fix this function we can add NullHandling::Propagate to theimpl ScalarUDFImpl for ArraySort. Then we also have to update array_sort_inner to return NULL on NULL inputs.

array_resize

I haven't looked into this one yet, but I think the solution is probably similar to array_sort.

array_replace

Note this applies to array_replace_all and array_replace_n.

This function actually does not want to propagate NULLs, it should be able to find NULL elements and replace with NULL elements. When the inputs are ArrayRefs then actually everything works correctly, it's only when the inputs are Scalar that things break. The issue is that the functions accepts DataType::Null instead of a null value for the list element type. Then we get type errors when trying to use the DataType::Null with a typed list.

I believe if we updated the signature from Signature::any(3, Volatility::Immutable) to something that includes type information then we would fix the panic. However, we face a familiar problem that there is no way to represent (list, element-type, element-type, i64) with the current Signature struct.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions