Description
This is basically a follow-up to apache/arrow#27083, which was not migrated to this repository.
I think the defaults for the C++ implementation may have changed since that issue was posted (I have no idea where the Java implementation has this logic). The relevant C++ code:
- https://github.com/apache/arrow/blob/main/cpp/src/arrow/type.cc#L1419, which calls https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/vector.h#L48. I think the ability to decide how to deal with conflicts was lost over time...
Here are the existing docs: https://docs.rs/arrow/latest/arrow/array/struct.StructArray.html#method.column_by_name. That documentation comment should probably be updated...
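To make the behavior in question concrete, here is a minimal standalone sketch that does not use the arrow crate. It assumes, as I read the current doc comment, that a name lookup returns the first field whose name matches. The names `first_index_by_name`, `lookup_by_name`, and `Lookup` are hypothetical and exist only for illustration; the second function shows one possible way to surface duplicates rather than hide them.

```rust
/// Hypothetical result type for a lookup that does not hide duplicate names.
#[derive(Debug, PartialEq)]
enum Lookup {
    /// No field has the requested name.
    Missing,
    /// Exactly one field has the requested name.
    Unique(usize),
    /// Several fields share the requested name; all of their indices are kept.
    Ambiguous(Vec<usize>),
}

/// First-match lookup over a list of field names (assumed to mirror what
/// `column_by_name` does today): later duplicates are silently ignored.
fn first_index_by_name(names: &[&str], name: &str) -> Option<usize> {
    names.iter().position(|n| *n == name)
}

/// Alternative lookup that reports duplicates to the caller.
fn lookup_by_name(names: &[&str], name: &str) -> Lookup {
    let matches: Vec<usize> = names
        .iter()
        .enumerate()
        .filter(|(_, n)| **n == name)
        .map(|(i, _)| i)
        .collect();
    match matches.len() {
        0 => Lookup::Missing,
        1 => Lookup::Unique(matches[0]),
        _ => Lookup::Ambiguous(matches),
    }
}
```

With field names `["a", "b", "a"]`, the first function returns index 0 for `"a"` and gives no hint that index 2 exists, while the second returns `Ambiguous` with both indices and leaves the decision to the caller.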
In addition to this, I think there are several issues related to StructArray casting and schema evolution that have not taken this behavior into account. Struct casting might be fine because it only looks at the type of the field and not the name? But I can imagine that schema evolution becomes stranger when you can have a bunch of fields with the same name that have different types.
- Support StructArray in Cast Kernel #4908
- Support for "Schema evolution" / Schema Adapters #6735
- Add a way to map RecordBatch schema from one to another #5996
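The schema evolution concern can be sketched the same way. The snippet below is a hypothetical, standalone model of name-based field mapping (it is not how any of the issues above are implemented, and `map_by_name` is an invented name). Fields are modeled as `(name, type)` string pairs, and the mapping refuses to proceed when the source has duplicate names, since there is no obvious way to pick between same-named fields of different types.

```rust
use std::collections::HashMap;

/// For each target field, find the index of the source field with the same
/// name, or `None` if the source has no such field.
/// Returns an error if the source contains duplicate field names.
fn map_by_name(
    source: &[(&str, &str)],
    target: &[(&str, &str)],
) -> Result<Vec<Option<usize>>, String> {
    let mut index: HashMap<&str, usize> = HashMap::new();
    for (i, (name, data_type)) in source.iter().enumerate() {
        // `insert` returns the previous index if the name was already seen.
        if let Some(prev) = index.insert(*name, i) {
            return Err(format!(
                "duplicate source field `{}`: index {} has type {}, index {} has type {}",
                name, prev, source[prev].1, i, data_type
            ));
        }
    }
    Ok(target
        .iter()
        .map(|(name, _)| index.get(*name).copied())
        .collect())
}
```

A first-match policy would instead silently map every target field named `x` to the first source `x`, whatever its type, which is the kind of surprise described above.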
Also see the related discussion in Vortex, as we have a similar behavior now simply because this is what the Rust Arrow implementation does.