Support ndjson -- newline delimited json -- for streaming data. #9180

Closed
@okdistribute

Hey all,

I'm a developer on the dat project (git for data), and we are building a Python library to interact with the data store.

Everything in dat is streaming, and we use newline delimited json as the official transfer format between processes.

Take a look at the specification for newline delimited json here

Does pandas support this yet, and if not, would you consider adding a to_ndjson function to the existing output formats?

For example, the following table:

> df
    a        b          key                     version
0  True   False  ci4diwru70000x6xmmn19nba1        1
1  False  True   ci4diww6j0001x6xmbyp5o2f0        1

Would be converted to

> df.to_ndjson()
'{"a": true, "b": false, "key": "ci4diwru70000x6xmmn19nba1", "version": 1}\n{"a": false, "b": true, "key": "ci4diww6j0001x6xmbyp5o2f0", "version": 1}'
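To make the expected output concrete, here is a minimal sketch using only the standard library. It assumes the rows are already available as plain dicts; `to_ndjson` here is a hypothetical free function for illustration, not an existing pandas method. Note that JSON booleans are lowercase and string values are quoted.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dict rows as newline delimited JSON.

    Hypothetical helper: one JSON object per line, joined by "\n".
    """
    return "\n".join(json.dumps(row) for row in records)


# The two rows from the table above, written as plain dicts.
rows = [
    {"a": True, "b": False, "key": "ci4diwru70000x6xmmn19nba1", "version": 1},
    {"a": False, "b": True, "key": "ci4diww6j0001x6xmbyp5o2f0", "version": 1},
]

print(to_ndjson(rows))
```

Each line of the result parses on its own with `json.loads`, which is what lets a consumer process the stream one record at a time.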

For general streaming use cases, it might also be nice to consider other ways of supporting this format, such as a generator function that yields ndjson-able objects.
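A rough sketch of what the generator variant could look like, again with only the standard library. The names `iter_ndjson` and `write_ndjson` are made up for this example, and the rows are assumed to arrive as dicts from any iterable, so nothing has to be held in memory at once.

```python
import json
import sys


def iter_ndjson(records):
    """Lazily yield one newline-terminated JSON line per record (hypothetical helper)."""
    for row in records:
        yield json.dumps(row) + "\n"


def write_ndjson(records, stream=sys.stdout):
    """Write records to a file-like object one line at a time (hypothetical helper)."""
    for line in iter_ndjson(records):
        stream.write(line)


if __name__ == "__main__":
    # Rows can come from any iterable, including another generator.
    rows = ({"key": "row-%d" % i, "version": 1} for i in range(3))
    write_ndjson(rows)
```

Because each line is produced on demand, the same function works for piping to another process or writing to a socket or file.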
