
JSON table orient not roundtripping extension types #32037

Open
@navado

Description

Code Sample, a copy-pastable example if possible

import pandas
print(pandas._version.get_versions())
df = pandas.DataFrame(
    {
        1: ['v1','v2'],
        2: ['v3','v4']
    },dtype='string'
)

print(df.info())
for orient in ['table','split','records','values']:
    rdf = pandas.read_json(df.to_json(orient=orient), orient=orient)
    print(f'======{orient}======')
    print(rdf.info())

Problem description

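The string dtype is not preserved in a round trip through JSON serialization: every orient tried comes back with object columns, and with orient='table' the columns also come back with 0 non-null values. DataFrames containing string columns therefore cannot be reused transparently after a round trip.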
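To make the problem easier to see, here is a minimal sketch that inspects what each payload actually contains. It assumes string column names ("a", "b") rather than the integer labels in the sample above, and it only looks at top-level keys; the exact contents of the schema fields may differ between pandas versions, so no output is shown.

```python
import json

import pandas as pd

# Assumption: string column labels, chosen only to keep the example simple.
df = pd.DataFrame({"a": ["v1", "v2"], "b": ["v3", "v4"]}, dtype="string")

# orient="split" carries only labels and values; it has no slot for dtypes.
split_payload = json.loads(df.to_json(orient="split"))
print(sorted(split_payload))

# orient="table" adds a "schema" block that describes each field.
table_payload = json.loads(df.to_json(orient="table"))
for field in table_payload["schema"]["fields"]:
    print(field)
```

Only the table orient ships any type description alongside the data, which is why it is the natural candidate for round-tripping extension dtypes.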

Expected Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   1       2 non-null      string
 1   2       2 non-null      string
dtypes: string(2)
memory usage: 160.0 bytes
None

Actual output

======table======
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   1       0 non-null      object
 1   2       0 non-null      object
dtypes: object(2)
memory usage: 48.0+ bytes
None
======split======
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   1       2 non-null      object
 1   2       2 non-null      object
dtypes: object(2)
memory usage: 48.0+ bytes
None
======records======
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   1       2 non-null      object
 1   2       2 non-null      object
dtypes: object(2)
memory usage: 160.0+ bytes
None
======values======
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       2 non-null      object
 1   1       2 non-null      object
dtypes: object(2)
memory usage: 160.0+ bytes
None
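Until the round trip preserves extension dtypes, one possible workaround is to record the dtypes separately and cast after reading. This is a sketch, not pandas behavior: read_json_with_dtypes is a hypothetical helper, the column names are illustrative, and it assumes every recorded dtype string is accepted by astype.

```python
import io

import pandas as pd


def read_json_with_dtypes(payload, dtypes, orient="split"):
    """Hypothetical helper: read JSON, then cast columns to the recorded dtypes."""
    restored = pd.read_json(io.StringIO(payload), orient=orient)
    return restored.astype(dtypes)


df = pd.DataFrame({"a": ["v1", "v2"], "b": ["v3", "v4"]}, dtype="string")

# Record the dtype names next to the payload, e.g. {"a": "string", "b": "string"}.
dtypes = {col: str(dtype) for col, dtype in df.dtypes.items()}

restored = read_json_with_dtypes(df.to_json(orient="split"), dtypes)
print(restored.dtypes)
```

The dtype mapping has to travel with the JSON (for example as a sidecar file), so this does not make the round trip transparent; it only restores the dtypes explicitly.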

Output of pd.show_versions()

{'dirty': False, 'error': None, 'full-revisionid': '29d6b0232aab9576afa896ff5bab0b994760495a', 'version': '1.0.1'}

Metadata

Labels

Enhancement
ExtensionArray: Extending pandas with custom dtypes or arrays.
IO JSON: read_json, to_json, json_normalize
