Allow JSON deserialization of StructArray from JSON List #6558

@jagill

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, to deserialize a StructArray from JSON, you need to use a JSON Object. For example, deserializing a Struct<a: i32, b: string> requires something like {"a": 1, "b": "c"}. The same is true of top-level RecordBatches. Some services, such as Presto and Trino, serialize ROW fields as lists, so the example above would be serialized as [1, "c"]. If you already know the schema, this is a more compact representation that reduces the data on the wire.

I would like the ability for arrow_json to deserialize these list-encoded structs and record batches, perhaps under an option flag.

Describe the solution you'd like

When a StructArrayDecoder encounters a [, it switches to a parsing mode that does not look for field names and requires a closing ] to complete the struct. It should return a parsing error if the number of entries in the list differs from the number of fields in the struct, or if any of the sub-parsers encounters the wrong type. This requires the fields of the struct to be in the same order as the entries of the JSON List, whereas the current object parsing can handle fields that appear in a different order.
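
To make the proposed behavior concrete, here is a minimal sketch of positional decoding for a single list-encoded struct. It uses only the standard library; `Token`, `FieldType`, `Value`, and `decode_list_struct` are hypothetical names invented for illustration and are not arrow_json's actual internals.

```rust
/// Hypothetical, simplified token stream (not arrow_json's real tape format).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    StartList,
    EndList,
    Int(i32),
    Str(String),
}

/// Hypothetical stand-in for the struct's field types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    Int32,
    Utf8,
}

/// Hypothetical decoded value for one field.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int32(i32),
    Utf8(String),
}

/// Decode one list-encoded struct. Values are matched to fields by
/// position, so no field names are read from the input.
fn decode_list_struct(tokens: &[Token], fields: &[FieldType]) -> Result<Vec<Value>, String> {
    let mut iter = tokens.iter();
    if iter.next() != Some(&Token::StartList) {
        return Err("expected '['".to_string());
    }
    let mut row = Vec::with_capacity(fields.len());
    for (idx, field) in fields.iter().enumerate() {
        let value = match (field, iter.next()) {
            (FieldType::Int32, Some(Token::Int(v))) => Value::Int32(*v),
            (FieldType::Utf8, Some(Token::Str(s))) => Value::Utf8(s.clone()),
            // The list closed before every field received a value.
            (_, Some(Token::EndList)) => {
                return Err(format!("expected {} entries, found {}", fields.len(), idx));
            }
            // A sub-parser saw the wrong type for this position.
            (_, Some(other)) => {
                return Err(format!("field {}: expected {:?}, found {:?}", idx, field, other));
            }
            (_, None) => return Err("unexpected end of input".to_string()),
        };
        row.push(value);
    }
    // After the last field, the only acceptable token is the closing ']'.
    match iter.next() {
        Some(Token::EndList) => Ok(row),
        Some(_) => Err(format!("expected {} entries, found more", fields.len())),
        None => Err("missing closing ']'".to_string()),
    }
}
```

With fields `[Int32, Utf8]`, the tokens for `[1, "c"]` decode to one row, while `[1]`, `[1, "c", 2]`, and `["c", 1]` all produce errors. A real implementation would presumably append into the child decoders rather than build a `Vec<Value>`.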

Describe alternatives you've considered

I currently parse the results with serde_json, recursively walk the JSON to convert Lists to Objects using the schema, re-serialize the top-level JSON, and then read it with arrow_json. This is not very efficient.
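
For reference, the list-to-object conversion step of this workaround looks roughly like the sketch below. To keep it dependency-free, `Json` is a hypothetical stand-in for `serde_json::Value` and `Schema` is a hypothetical stand-in for the Arrow schema; the real code operates on the actual serde_json and Arrow types.

```rust
/// Hypothetical std-only stand-in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Hypothetical stand-in for an Arrow schema node.
#[derive(Debug, Clone)]
enum Schema {
    Primitive,
    List(Box<Schema>),
    Struct(Vec<(String, Schema)>),
}

/// Recursively rewrite list-encoded structs as objects, using the schema
/// to tell a struct-as-list apart from a genuine list.
fn lists_to_objects(value: Json, schema: &Schema) -> Result<Json, String> {
    match (schema, value) {
        // A struct encoded as a list: pair each entry with its field name.
        (Schema::Struct(fields), Json::Array(items)) => {
            if items.len() != fields.len() {
                return Err(format!(
                    "expected {} entries, found {}",
                    fields.len(),
                    items.len()
                ));
            }
            let mut members = Vec::with_capacity(fields.len());
            for ((name, child), item) in fields.iter().zip(items) {
                members.push((name.clone(), lists_to_objects(item, child)?));
            }
            Ok(Json::Object(members))
        }
        // A genuine list: keep it a list, but convert its elements.
        (Schema::List(child), Json::Array(items)) => items
            .into_iter()
            .map(|item| lists_to_objects(item, child))
            .collect::<Result<Vec<_>, _>>()
            .map(Json::Array),
        // Nulls pass through for nested types.
        (Schema::Struct(_), Json::Null) | (Schema::List(_), Json::Null) => Ok(Json::Null),
        // Primitive values are left untouched.
        (Schema::Primitive, v) => Ok(v),
        (s, v) => Err(format!("schema {:?} does not match value {:?}", s, v)),
    }
}
```

The extra parse, tree rewrite, and re-serialization all happen before arrow_json ever sees the data, which is where the inefficiency comes from.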

Alternatively, I could reproduce a lower-quality copy of the version of arrow_json that deserialized using serde_json, either by writing serde_json decoders directly or by taking a serde_json::Value and populating the ArrayBuilders myself. This is a lot of code duplication.

Additional context

I don't personally need the ability to serialize a StructArray/RecordBatch into a List, although that would seem symmetrical.

I am happy to make an RFC PR implementing this functionality.

Metadata

Labels: arrow (changes to the arrow crate), enhancement (any new improvement worthy of an entry in the changelog)
