Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: failed to create ValuesExec with non-nullable schema #8776

Merged
merged 3 commits into from
Jan 9, 2024

Conversation

jonahgao
Copy link
Member

@jonahgao jonahgao commented Jan 7, 2024

Which issue does this PR close?

Closes #8763.

Rationale for this change

In ValuesExec::try_new, we use the schema of ValuesExec and null arrays to construct a placeholder batch.
https://github.com/apache/arrow-datafusion/blob/dd4263f843e093c807d63edf73a571b1ba2669b5/datafusion/physical-plan/src/values.rs#L56-L64
This batch placeholder is used to evaluate the physical expressions in Values.

But users can define a schema containing non-nullable fields for ValuesExec just like issue #8763, which is in conflict with null arrays.

let field = Field::new("a", DataType::Int32, false);
let schema = Schema::new(vec![field]);
let df_schema = DFSchema::try_from(schema.clone()).unwrap();
let values = vec![vec![Expr::Literal(ScalarValue::Int32(Some(1)))]];
let values_plan = LogicalPlan::Values(Values {
        schema: df_schema.clone().into(),
        values: values.clone(),
})

In this case, an ArrowError ‘InvalidArgumentError("Column 'a' is declared as non-nullable but contains null values"))' will be raised.

Since ValuesExec has no input, I think the schema for this placeholder batch can be empty, rather than using the schema from ValuesExec.

What changes are included in this PR?

Use correct schema to build the placeholder batch in ValuesExec::try_new.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 7, 2024
@@ -71,7 +69,7 @@ impl ValuesExec {
match r {
Ok(ColumnarValue::Scalar(scalar)) => Ok(scalar),
Ok(ColumnarValue::Array(a)) if a.len() == 1 => {
Ok(ScalarValue::List(a))
ScalarValue::try_from_array(&a, 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning ScalarValue::List(a) makes the following query fail.

DataFusion CLI v34.0.0
❯ select * from (values(random()));
Internal error: Inconsistent types in ScalarValue::iter_to_array. Expected Float64, got List(0.5174512019067328).
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Related to pr #7629. I think it's a bug, so I fixed it.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @jonahgao

@@ -114,6 +114,14 @@ VALUES (1,2,3,4,5,6,7,8,9,10,11,12,13,NULL,'F',3.5)
----
1 2 3 4 5 6 7 8 9 10 11 12 13 NULL F 3.5

# Test non-literal expressions in VALUES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

.iter()
.map(|field| new_null_array(field.data_type(), 1))
.collect::<Vec<_>>(),
// we have this single row batch as a placeholder to satisfy evaluation argument
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this code to use a null array probably pre-dates the ability to create zero column record batches

@alamb
Copy link
Contributor

alamb commented Jan 8, 2024

Also thank you for:

  1. The very clear PR description
  2. The drive by cleanups to comments
  3. The fix of the other bug you found (and the clear comments and tests) 🙏

Very much appreciated

@jonahgao
Copy link
Member Author

jonahgao commented Jan 9, 2024

Thank you for reviewing @alamb .

@alamb alamb merged commit 8050fca into apache:main Jan 9, 2024
22 checks passed
@jonahgao jonahgao deleted the fix_values_exec branch January 10, 2024 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot insert with DMLStatement into a table with non-nullable fields
2 participants