Description
Describe the bug
If I provide a schema like so:
{
  "namespace": "ns1",
  "name": "main",
  "type": "record",
  "fields": [
    {
      "name": "f1",
      "type": {
        "type": "record",
        "namespace": "ns2",
        "name": "record2",
        "fields": [
          {
            "name": "f1_1",
            "type": "string"
          }
        ]
      }
    }
  ]
}
The schema parser uses "record2" as the field name, even though that is the name of the nested record type; the field name should be "f1". As a result, conversion from Arrow schemas that don't contain the Avro schema JSON in their metadata will always fail, and in general the resulting schemas will not be applicable to the output RecordBatch.
To Reproduce
Create an Avro file with the above writer schema, then use the following Arrow schema to create a reader_schema:
Schema::new(vec![
    Field::new(
        "f1",
        DataType::Struct(
            vec![
                Field::new("f1_1", DataType::Utf8, false),
            ]
            .into(),
        ),
        false,
    )
])
.with_metadata(HashMap::from([
    (AVRO_NAMESPACE_METADATA_KEY.into(), "ns1".into()),
    (AVRO_NAME_METADATA_KEY.into(), "main".into()),
]));
(Use AvroSchema::try_from() etc.)
Reading then fails with a field-name mismatch: the writer schema correctly has the field "f1", but the newly created reader_schema has "record2" in its place.
Expected behavior
The field name ("f1") should be propagated correctly to the generated reader schema, independently of the name of the record type it holds.
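To make the expected naming rule concrete, here is a minimal, std-only sketch. It does not use the real arrow or arrow-avro types: `AvroType`, `AvroField`, `ArrowType`, `ArrowField`, `to_arrow_field`, `to_arrow_type`, and `example_schema` are all hypothetical stand-ins invented for this illustration. It only shows that a field's name and the name of the record type it holds are separate things, and that the converted field should carry the former.

```rust
// Hypothetical, simplified model of an Avro schema.
// These are NOT the arrow-avro crate's types; they exist only for illustration.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum AvroType {
    String,
    Record {
        name: String,              // type name, e.g. "record2"
        namespace: Option<String>, // e.g. "ns2"
        fields: Vec<AvroField>,
    },
}

#[derive(Debug, Clone, PartialEq)]
struct AvroField {
    name: String, // field name, e.g. "f1"
    ty: AvroType,
}

// Hypothetical, simplified stand-ins for an Arrow data type and field.
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Utf8,
    Struct(Vec<ArrowField>),
}

#[derive(Debug, Clone, PartialEq)]
struct ArrowField {
    name: String,
    ty: ArrowType,
}

// Convert one Avro field. The resulting name comes from the Avro *field*
// ("f1"), never from the record type the field holds ("record2").
fn to_arrow_field(field: &AvroField) -> ArrowField {
    ArrowField {
        name: field.name.clone(),
        ty: to_arrow_type(&field.ty),
    }
}

// Convert an Avro type. A record's own name and namespace do not become
// field names; only its child fields are carried over.
fn to_arrow_type(ty: &AvroType) -> ArrowType {
    match ty {
        AvroType::String => ArrowType::Utf8,
        AvroType::Record { fields, .. } => {
            ArrowType::Struct(fields.iter().map(to_arrow_field).collect())
        }
    }
}

// The schema from this report, expressed in the simplified model.
fn example_schema() -> AvroType {
    AvroType::Record {
        name: "main".to_string(),
        namespace: Some("ns1".to_string()),
        fields: vec![AvroField {
            name: "f1".to_string(),
            ty: AvroType::Record {
                name: "record2".to_string(),
                namespace: Some("ns2".to_string()),
                fields: vec![AvroField {
                    name: "f1_1".to_string(),
                    ty: AvroType::String,
                }],
            },
        }],
    }
}
```

Under these assumptions, `to_arrow_type(&example_schema())` yields a struct whose only child is named "f1" and whose nested child is named "f1_1"; "record2" never shows up as a field name.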
Additional context