Description
Describe the bug
After upgrading from DataFusion 47 to a newer version I started seeing schema mismatch errors caused by updated array type coercion logic that does not preserve nullability information for nested types.
SELECT offset[2]-offset[1] FROM rd;
Arrow error: Invalid argument error: column types must match schema types, expected List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0
To Reproduce
The following unit test can be used to verify this behavior.
assertion `left == right` failed
left: [[List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]]
right: [[List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })]]
stack backtrace:
fn test_get_valid_types_fixed_size_arrays() -> Result<()> {
let function = "fixed_size_arrays";
let signature = Signature::arrays(2, None, Volatility::Immutable);
let data_types = vec![
DataType::new_fixed_size_list(DataType::Int64, 3, false),
DataType::new_list(DataType::Int32, false),
];
assert_eq!(
get_valid_types(function, &signature.type_signature, &data_types)?,
vec![vec![
DataType::new_list(DataType::Int64, false),
DataType::new_list(DataType::Int64, false),
]]
);
Ok(())
}
This can also be observed by adding tracing to coerce_arguments_for_signature_with_scalar_udf. In the coerced type, data_type: Int32, nullable: false has changed to data_type: Int32, nullable: true.
/// Returns `expressions` coerced to types compatible with
/// `signature`, if possible.
///
/// See the module level documentation for more detail on coercion.
fn coerce_arguments_for_signature_with_scalar_udf(
expressions: Vec<Expr>,
schema: &DFSchema,
func: &ScalarUDF,
) -> Result<Vec<Expr>> {
if expressions.is_empty() {
return Ok(expressions);
}
let current_types = expressions
.iter()
.map(|e| e.get_type(schema))
.collect::<Result<Vec<_>>>()?;
let new_types = data_types_with_scalar_udf(&current_types, func)?;
println!("schema: {:?}", schema);
println!("current_types: {:?}", current_types);
println!("Coerced types: {:?}", new_types);
expressions
.into_iter()
.enumerate()
.map(|(i, expr)| expr.cast_to(&new_types[i], schema))
.collect()
}
schema: DFSchema { inner: Schema { fields: [Field { name: "offset", data_type: FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"content_computed_columns": "content_embedding,content_offset"} }, field_qualifiers: [Some(Bare { table: "rd" })], functional_dependencies: FunctionalDependencies { deps: [] } }
current_types: [FixedSizeList(Field { name: "item", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2), Int64]
Coerced types: [List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), Int64]
Expected behavior
No response
Additional context
The original (correct) behavior was changed by the following improvement:
#15149 (comment)
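To make the expected behavior concrete, here is a minimal, self-contained sketch of the difference between a coercion that keeps the element field and one that rebuilds it. It does not use Arrow or DataFusion: `Ty`, `ItemField`, `fixed_size_to_list`, and `fixed_size_to_list_lossy` are hypothetical stand-ins invented for illustration. The lossy variant is only an assumption about what the regression appears to do, judging by the traced output above. It is not taken from the DataFusion source.

```rust
// Simplified stand-ins for Arrow's `DataType` and `Field`.
// Hypothetical types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Int64,
    List(Box<ItemField>),
    FixedSizeList(Box<ItemField>, i32),
}

#[derive(Debug, Clone, PartialEq)]
struct ItemField {
    name: String,
    ty: Ty,
    nullable: bool,
}

/// Hypothetical helper: coerce a fixed-size list to a variable-size list,
/// reusing the element field unchanged so its `nullable` flag survives.
fn fixed_size_to_list(ty: &Ty) -> Ty {
    match ty {
        Ty::FixedSizeList(field, _len) => Ty::List(field.clone()),
        other => other.clone(),
    }
}

/// Hypothetical lossy variant (assumed shape of the regression): the element
/// field is rebuilt with `nullable: true`, discarding the original flag.
fn fixed_size_to_list_lossy(ty: &Ty) -> Ty {
    match ty {
        Ty::FixedSizeList(field, _len) => Ty::List(Box::new(ItemField {
            name: "item".to_string(),
            ty: field.ty.clone(),
            nullable: true,
        })),
        other => other.clone(),
    }
}
```

With an input modeled on the `offset` column above (a fixed-size list of non-nullable Int32 with length 2), `fixed_size_to_list` yields a list whose item is still non-nullable and therefore matches the schema, while `fixed_size_to_list_lossy` yields a nullable item, which is the mismatch reported in the error.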