Have read_json serialize directly to Arrow arrays #55852

Open
@WillAyd

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Right now, if you use dtype_backend="pyarrow" with read_json, the data is still parsed into NumPy arrays first and only converted to Arrow-backed arrays afterwards.

Feature Description

I propose we vendor nanoarrow and use it directly to create Arrow arrays in JSONToObj.c.

I think this is also nice because pyarrow is currently limited to reading line-delimited JSON. read_json already covers more formats than that, so we have a unique advantage if we can just plug in nanoarrow to build the arrays.
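To make the format distinction concrete: line-delimited JSON puts one object on each line, whereas read_json also accepts whole-document layouts such as a single top-level array of records. The sketch below uses only the standard library to convert one into the other; the helper name `records_to_json_lines` is hypothetical and is not part of pandas or pyarrow.

```python
import json


def records_to_json_lines(document: str) -> str:
    """Convert a JSON array of objects into line-delimited JSON."""
    records = json.loads(document)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of objects")
    # One compact JSON object per line, with no enclosing array.
    return "\n".join(json.dumps(rec, separators=(",", ":")) for rec in records)


# A whole-document "records" layout, as read_json accepts today.
records_doc = '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]'
json_lines = records_to_json_lines(records_doc)
print(json_lines)
```

A pre-conversion step like this is the kind of workaround that would be unnecessary if the pandas parser produced Arrow arrays itself.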

Alternative Solutions

N/A

Additional Context

Before tackling this, #55102 would be very helpful.


Labels: Arrow (pyarrow functionality), IO JSON (read_json, to_json, json_normalize)
