Avro schema parser uses type name instead of field name [AVRO] #8928

@EmilyMatt

Describe the bug

If I provide a schema like so:

{
  "namespace": "ns1",
  "name": "main",
  "type": "record",
  "fields": [
    {
      "name": "f1",
      "type": {
        "type": "record",
        "namespace": "ns2",
        "name": "record2",
        "fields": [
          {
            "name": "f1_1",
            "type": "string"
          }
        ]
      }
    }
  ]
}

The schema parser uses "record2" as the field name, even though that is the type name; the field name should be "f1". As a result, conversion from an Arrow schema that does not contain the schema JSON in its metadata will always fail, and in general the resulting schema will not be applicable to the output RecordBatch.
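To make the distinction concrete, here is a minimal sketch of where the two names live in the schema above. The types `AvroType` and `AvroField` and the functions `example_schema` and `column_names` are hypothetical and simplified for illustration; they are not arrow-avro's actual data model or API.

```rust
// Hypothetical, simplified model of an Avro schema; not arrow-avro's real types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum AvroType {
    String,
    Record {
        namespace: String,
        name: String, // the *type* name, e.g. "record2"
        fields: Vec<AvroField>,
    },
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct AvroField {
    name: String, // the *field* name, e.g. "f1"
    ty: AvroType,
}

/// Builds the schema from the bug report.
#[allow(dead_code)]
fn example_schema() -> AvroType {
    AvroType::Record {
        namespace: "ns1".to_string(),
        name: "main".to_string(),
        fields: vec![AvroField {
            name: "f1".to_string(),
            ty: AvroType::Record {
                namespace: "ns2".to_string(),
                name: "record2".to_string(),
                fields: vec![AvroField {
                    name: "f1_1".to_string(),
                    ty: AvroType::String,
                }],
            },
        }],
    }
}

/// Column names for a record: taken from each field's own name,
/// never from the name of the field's type.
#[allow(dead_code)]
fn column_names(schema: &AvroType) -> Vec<String> {
    match schema {
        AvroType::Record { fields, .. } => fields.iter().map(|f| f.name.clone()).collect(),
        AvroType::String => Vec::new(),
    }
}
```

In this model, `column_names(&example_schema())` yields the single name "f1"; "record2" only ever appears as the name of the nested record type.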

To Reproduce

Create an Avro file with the above writer schema, then use the following Arrow schema to create a reader schema:

Schema::new(vec![Field::new(
    "f1",
    DataType::Struct(vec![Field::new("f1_1", DataType::Utf8, false)].into()),
    false,
)])
.with_metadata(HashMap::from([
    (AVRO_NAMESPACE_METADATA_KEY.into(), "ns1".into()),
    (AVRO_NAME_METADATA_KEY.into(), "main".into()),
]));

(Use AvroSchema::try_from() etc.)

Reading then fails with a field-name mismatch: the writer schema correctly has the field "f1", but the newly created reader schema has "record2" instead.
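The failure follows from Avro schema resolution matching record fields by name: a reader field with no default and no same-named writer field is an error. The sketch below illustrates that rule only; `first_unmatched` is a hypothetical helper, not part of arrow-avro, and it ignores aliases and defaults.

```rust
/// Hypothetical helper: returns the first reader field name that has no
/// same-named field in the writer schema, or `None` if every name matches.
/// Aliases and default values are deliberately not modeled here.
#[allow(dead_code)]
fn first_unmatched<'a>(writer: &[&str], reader: &[&'a str]) -> Option<&'a str> {
    reader
        .iter()
        .copied()
        .find(|r| !writer.iter().any(|w| *w == *r))
}
```

With the writer fields `["f1"]`, a reader built with `["record2"]` has an unmatched field, while a reader with `["f1"]` resolves cleanly.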

Expected behavior

The field name ("f1" in the example above) should be propagated correctly.

Additional context
