Parallel NDJSON file reading #8502

Closed
@alamb

Description

Is your feature request related to a problem or challenge?

DataFusion can now automatically read CSV and Parquet files in parallel (see #6325 for CSV).

It would be great to do the same for "NDJSON" files -- namely files that contain multiple JSON objects, one per line.

Describe the solution you'd like

Basically implement what is described in #6325 for JSON -- and read a single large NDJSON (newline-delimited JSON) file in parallel.
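
To make the idea concrete, here is a minimal sketch of one way a single NDJSON file could be read in parallel. It assumes the file is split into byte ranges and each worker takes the lines that start inside its range. It is written in Python for brevity and is not DataFusion code; the names `split_ranges`, `read_range`, and `read_ndjson_parallel` are hypothetical.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, parts):
    """Divide [0, size) into at most `parts` contiguous byte ranges."""
    step = max(1, -(-size // parts))  # ceiling division
    return [(s, min(s + step, size)) for s in range(0, size, step)]


def read_range(path, start, end):
    """Parse every line whose first byte lies in [start, end)."""
    records = []
    with open(path, "rb") as f:
        if start > 0:
            # Back up one byte and skip to the next newline. If `start` is
            # already a line start this consumes only the preceding "\n";
            # otherwise it discards the tail of a line owned by the
            # previous range.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()  # may run past `end` to finish the line
            if not line:
                break
            if line.strip():  # tolerate blank lines
                records.append(json.loads(line))
    return records


def read_ndjson_parallel(path, parts=4):
    """Read all records, one worker per byte range, preserving file order."""
    ranges = split_ranges(os.path.getsize(path), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        chunks = list(pool.map(lambda r: read_range(path, *r), ranges))
    return [rec for chunk in chunks for rec in chunk]
```

Because each line is owned by exactly one range (the one containing its first byte), no record is read twice or dropped, and concatenating the per-range results in range order reproduces the order of the file.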

Describe alternatives you've considered

Some research may be required -- I am not sure if finding record boundaries is feasible
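
One observation that may help with the boundary question: JSON does not allow a raw newline inside a string (it must be written as the escape `\n`), so as long as each record is serialized on a single line, every newline byte in the file is a record separator. The sketch below illustrates this in Python; `next_record_start` is a hypothetical helper, not an existing API.

```python
import json


def next_record_start(data: bytes, offset: int) -> int:
    """Return the offset of the first record that starts at or after `offset`."""
    if offset == 0:
        return 0
    # A record starts right after a newline byte. Searching from offset - 1
    # means an offset that is already a record start is returned unchanged.
    newline = data.find(b"\n", offset - 1)
    return len(data) if newline == -1 else newline + 1


records = [{"id": 1, "note": "two\nlines"}, {"id": 2, "note": "plain"}]
data = "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")

# json.dumps writes the embedded newline as the two characters `\` and `n`,
# so the only raw newline bytes are the record separators.
assert data.count(b"\n") == len(records)

boundary = next_record_start(data, 5)  # offset 5 falls inside the first record
second = json.loads(data[boundary:].splitlines()[0])
```

This does not hold for pretty-printed JSON, where a single object spans several lines; such files would need a different strategy or would have to be read serially.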

Additional context

I found this while writing tests for #8451
