Skip to content

Conversation

@vegarsti
Copy link
Contributor

Which issue does this PR close?

Closes #8840.

Rationale for this change

From the issue:

When merging two schemas using Schema::try_merge where one side doesn't have a column but the other does, then it keep the nullability setting of the preexisting column. However, this semantically doesn't make sense, the merged version of that would be the field with nullable being true, since that is the implicit property of the schema that doesn't have the field.

Consequently what this means is that it's impossible to merge schemas and therefore record batches where one side has a field that is nullable false, and the other doesn't have it at all.

What changes are included in this PR?

Adds a preserve_nullability argument to SchemaBuilder::try_merge which, when set to true will preserve the current behavior of using the field's nullability. When set to false, the nullability of the field will always be set to true.

Are these changes tested?

Yes, existing tests (some modified), and new test.

Are there any user-facing changes?

Yes. When merging schemas, any non-nullable fields not present in all schemas will now always be nullable.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 15, 2025
/// );
/// ```
pub fn try_merge(schemas: impl IntoIterator<Item = Self>) -> Result<Self, ArrowError> {
let schemas = schemas.into_iter().collect::<Vec<Self>>();
Copy link
Contributor Author

@vegarsti vegarsti Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This increases the memory consumption of this method. An alternative could be to change the function signature to schemas: &[Self]? But I haven't checked if that is feasible.

@vegarsti
Copy link
Contributor Author

I'll address the test failures, of course!

@alamb alamb changed the title Handle field nullability on schema merges Handle field nullability on schema merges with a missing column Nov 15, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @brancz and @vegarsti

The topic of schema merging and comparison comes up a lot in DataFuson - i wrote some notes here: #8840 (comment)

pub fn try_merge(
&mut self,
field: &FieldRef,
preserve_nullability: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a breaking API change, which means we would have to wait for the next major release to get it in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected schema merging nullability behavior

2 participants