-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: failed to create ValuesExec with non-nullable schema #8776
Conversation
@@ -71,7 +69,7 @@ impl ValuesExec { | |||
match r { | |||
Ok(ColumnarValue::Scalar(scalar)) => Ok(scalar), | |||
Ok(ColumnarValue::Array(a)) if a.len() == 1 => { | |||
Ok(ScalarValue::List(a)) | |||
ScalarValue::try_from_array(&a, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning ScalarValue::List(a)
makes the following query fail.
DataFusion CLI v34.0.0
❯ select * from (values(random()));
Internal error: Inconsistent types in ScalarValue::iter_to_array. Expected Float64, got List(0.5174512019067328).
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Related to pr #7629. I think it's a bug, so I fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @jonahgao
@@ -114,6 +114,14 @@ VALUES (1,2,3,4,5,6,7,8,9,10,11,12,13,NULL,'F',3.5) | |||
---- | |||
1 2 3 4 5 6 7 8 9 10 11 12 13 NULL F 3.5 | |||
|
|||
# Test non-literal expressions in VALUES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
.iter() | ||
.map(|field| new_null_array(field.data_type(), 1)) | ||
.collect::<Vec<_>>(), | ||
// we have this single row batch as a placeholder to satisfy evaluation argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this code to use a null array probably pre-dates the ability to create zero column record batches
Also thank you for:
Very much appreciated |
Thank you for reviewing @alamb . |
Which issue does this PR close?
Closes #8763.
Rationale for this change
In
ValuesExec::try_new
, we use the schema ofValuesExec
and null arrays to construct a placeholder batch.https://github.com/apache/arrow-datafusion/blob/dd4263f843e093c807d63edf73a571b1ba2669b5/datafusion/physical-plan/src/values.rs#L56-L64
This batch placeholder is used to evaluate the physical expressions in Values.
But users can define a schema containing non-nullable fields for
ValuesExec
just like issue #8763, which is in conflict with null arrays.In this case, an ArrowError ‘InvalidArgumentError("Column 'a' is declared as non-nullable but contains null values"))' will be raised.
Since
ValuesExec
has no input, I think the schema for this placeholder batch can be empty, rather than using the schema fromValuesExec
.What changes are included in this PR?
Use correct schema to build the placeholder batch in
ValuesExec::try_new
.Are these changes tested?
Yes
Are there any user-facing changes?
No