Description
Describe the bug
When merging two schemas using Schema::try_merge, where one side has a column that the other lacks, the merged schema keeps the nullability setting of the existing column. Semantically, this doesn't make sense: the merged field should have nullable set to true, since that is the implicit property of the schema that doesn't have the field.
Consequently, it's impossible to merge schemas, and therefore record batches, where one side has a field with nullable set to false and the other doesn't have that field at all.
To Reproduce
#[test]
fn test_schema_merge_nullability() {
    let merged = Schema::try_merge(vec![
        Schema::new(vec![
            Field::new("first_name", DataType::Utf8, false),
        ]),
        Schema::new(vec![
            Field::new("last_name", DataType::Utf8, false),
        ]),
    ])
    .unwrap();
    assert_eq!(
        merged,
        Schema::new(
            vec![
                Field::new("first_name", DataType::Utf8, true),
                Field::new("last_name", DataType::Utf8, true),
            ],
        )
    );
}
Expected behavior
The above test passes.
Additional context
This of course assumes that the whole intention of schema merging is to then merge record batches under the merged schema. If that assumption is wrong, we can solve this another way by writing our own schema merge functionality. However, I see no useful reason to merge schemas without also merging record batches, and if that is the intention, then I think this is a legitimate bug.
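For illustration, here is a minimal sketch of the merge semantics I'm proposing. It is not arrow code: SimpleField and merge_nullable_aware are hypothetical stand-ins, and data types and metadata are left out. It also assumes field names are unique within each schema. A field ends up nullable if any input marks it nullable, or if any input schema lacks it entirely.

```rust
// Hypothetical stand-in for a schema field; not the arrow `Field` type.
#[derive(Debug, Clone, PartialEq)]
struct SimpleField {
    name: String,
    nullable: bool,
}

// Sketch of the proposed semantics: a merged field is nullable if any input
// declares it nullable OR if at least one input schema does not contain it.
// Assumes field names are unique within each input schema.
fn merge_nullable_aware(schemas: &[Vec<SimpleField>]) -> Vec<SimpleField> {
    let mut merged: Vec<SimpleField> = Vec::new();
    // seen_in[i] counts how many input schemas contain merged[i].
    let mut seen_in: Vec<usize> = Vec::new();

    for schema in schemas {
        for field in schema {
            match merged.iter().position(|f| f.name == field.name) {
                Some(i) => {
                    // Field present on both sides: nullable wins.
                    merged[i].nullable |= field.nullable;
                    seen_in[i] += 1;
                }
                None => {
                    merged.push(field.clone());
                    seen_in.push(1);
                }
            }
        }
    }

    // Any field missing from at least one schema must accept nulls,
    // because batches from that schema have no values for it.
    for (field, count) in merged.iter_mut().zip(&seen_in) {
        if *count < schemas.len() {
            field.nullable = true;
        }
    }
    merged
}
```

With the two single-field schemas from the reproduction above, both first_name and last_name come out nullable under this sketch, while a non-nullable field that is present in every input stays non-nullable.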