-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More properly handle nullability of types/literals in Substrait #10640
Conversation
f475c78
to
6a1c0cb
Compare
This isn't perfect; some things are still assumed to just always be nullable (e.g. Literal list elements). But it's still giving a closer match than just assuming everything is nullable.
6a1c0cb
to
76925d0
Compare
@@ -1686,7 +1691,7 @@ fn to_substrait_bounds(window_frame: &WindowFrame) -> Result<(Bound, Bound)> { | |||
)) | |||
} | |||
|
|||
fn to_substrait_literal(value: &ScalarValue) -> Result<Literal> { | |||
fn to_substrait_literal(value: &ScalarValue, nullable: bool) -> Result<Literal> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to ExprSchemable::nullable(), the nullability of a literal depends on its ScalarValue
, so I think the nullable
argument for this function could be omitted.
fn to_substrait_literal(value: &ScalarValue, nullable: bool) -> Result<Literal> {
if value.is_null() {
return Ok(Literal {
nullable: true,
type_variation_reference: DEFAULT_TYPE_REF,
literal_type: Some(LiteralType::Null(to_substrait_type(
&value.data_type(),
true,
)?)),
});
}
At the same time, we can also remove try_to_substrait_null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, thanks! That does indeed simplify things. Fixed in cbd2264
Co-authored-by: Jonah Gao <jonahgao@msn.com>
cbd2264
to
d9a38b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
…he#10640) * More properly handle nullability of types/literals in Substrait This isn't perfect; some things are still assumed to just always be nullable (e.g. Literal list elements). But it's still giving a closer match than just assuming everything is nullable. * Avoid cloning and creating DataFusionError Co-authored-by: Jonah Gao <jonahgao@msn.com> * simplify Literal/ScalarValue null handling --------- Co-authored-by: Jonah Gao <jonahgao@msn.com>
DataFusion (= Arrow) is quite strict about nullability, specifically, when using e.g. LogicalPlan::Values, the given schema must match the given literals exactly - including nullability. This is non-trivial to do when converting schema and literals separately. The existing implementation for from_substrait_literal already creates lists that are always nullable (see ScalarValue::new_list => array_into_list_array). This reverts part of apache#10640 to align from_substrait_type with that behavior. This is the error I was hitting: ``` ArrowError(InvalidArgumentError("column types must match schema types, expected List(Field { name: \"item\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: \"item\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0"), None) ```
…0874) * Ignore nullability of list elements when consuming Substrait DataFusion (= Arrow) is quite strict about nullability, specifically, when using e.g. LogicalPlan::Values, the given schema must match the given literals exactly - including nullability. This is non-trivial to do when converting schema and literals separately. The existing implementation for from_substrait_literal already creates lists that are always nullable (see ScalarValue::new_list => array_into_list_array). This reverts part of #10640 to align from_substrait_type with that behavior. This is the error I was hitting: ``` ArrowError(InvalidArgumentError("column types must match schema types, expected List(Field { name: \"item\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: \"item\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0"), None) ``` * use `Field::new_list_field` in `array_into_(large_)list_array` just for consistency, to reduce the places where "item" is written out * add a test for non-nullable lists
…he#10640) * More properly handle nullability of types/literals in Substrait This isn't perfect; some things are still assumed to just always be nullable (e.g. Literal list elements). But it's still giving a closer match than just assuming everything is nullable. * Avoid cloning and creating DataFusionError Co-authored-by: Jonah Gao <jonahgao@msn.com> * simplify Literal/ScalarValue null handling --------- Co-authored-by: Jonah Gao <jonahgao@msn.com>
…ache#10874) * Ignore nullability of list elements when consuming Substrait DataFusion (= Arrow) is quite strict about nullability, specifically, when using e.g. LogicalPlan::Values, the given schema must match the given literals exactly - including nullability. This is non-trivial to do when converting schema and literals separately. The existing implementation for from_substrait_literal already creates lists that are always nullable (see ScalarValue::new_list => array_into_list_array). This reverts part of apache#10640 to align from_substrait_type with that behavior. This is the error I was hitting: ``` ArrowError(InvalidArgumentError("column types must match schema types, expected List(Field { name: \"item\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: \"item\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0"), None) ``` * use `Field::new_list_field` in `array_into_(large_)list_array` just for consistency, to reduce the places where "item" is written out * add a test for non-nullable lists
Which issue does this PR close?
Extracted from #10531
Builds on top of #10622 so I'll rebase this once it's merged.
Rationale for this change
More closely propagate null-ability through Substrait conversions
What changes are included in this PR?
Handles the nullability of struct fields during Substrait conversions. This isn't perfect; some things are still assumed to just always be nullable (e.g. Literal list elements).
Are these changes tested?
Tested through round-trip unit tests
Are there any user-facing changes?
Nulls are handled more accurately