[FEA] Ability to round-trip all pandas columns dtypes #14149

Open

Description

Is your feature request related to a problem? Please describe.
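With the current Column design and to_pandas API implementation, a cudf Series can only be converted to a NumPy dtype or a pandas nullable dtype. However, pandas also supports Arrow-backed dtypes, and as the session below shows, both the nullable and the Arrow-backed dtype come back as plain int64 after a round trip through cudf.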

In [1]: import pandas as pd

In [2]: np_series = pd.Series([1, 2, 3], dtype='int64')

In [3]: pd_series = pd.Series([1, 2, 3], dtype=pd.Int64Dtype())

In [4]: import pyarrow as pa

In [5]: arrow_series = pd.Series([1, 2, 3], dtype=pd.ArrowDtype(pa.int64()))

In [6]: np_series
Out[6]: 
0    1
1    2
2    3
dtype: int64

In [7]: pd_series
Out[7]: 
0    1
1    2
2    3
dtype: Int64

In [8]: arrow_series
Out[8]: 
0    1
1    2
2    3
dtype: int64[pyarrow]

In [9]: import cudf

In [10]: cudf.from_pandas(np_series).to_pandas()
Out[10]: 
0    1
1    2
2    3
dtype: int64

In [11]: cudf.from_pandas(pd_series).to_pandas()
Out[11]: 
0    1
1    2
2    3
dtype: int64

In [12]: cudf.from_pandas(arrow_series).to_pandas()
Out[12]: 
0    1
1    2
2    3
dtype: int64

Describe the solution you'd like
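I would like cudf to round-trip pandas data types: cudf.from_pandas(s).to_pandas() should return a Series with the same dtype as s, whether that is a NumPy dtype, a pandas nullable dtype, or an Arrow-backed dtype.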
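
Until something like this exists, one possible caller-side stopgap is to remember the original dtype and cast back after the round trip. The following is only a sketch: roundtrip_preserving_dtype and lossy_convert are hypothetical names, not cudf APIs, and lossy_convert is a pandas-only stand-in that mimics the dtype loss seen in Out[11] so the example runs without a GPU.

```python
import pandas as pd


def roundtrip_preserving_dtype(series, convert):
    """Run `convert` on `series`, then cast the result back to the original dtype.

    `convert` is any callable that takes and returns a pandas Series.
    With cudf installed it could be (assumption, not executed here):
        lambda s: cudf.from_pandas(s).to_pandas()
    """
    original_dtype = series.dtype  # e.g. Int64 or int64[pyarrow]
    result = convert(series)
    return result.astype(original_dtype)


def lossy_convert(s):
    # Hypothetical stand-in for the cudf round trip: drops the
    # extension dtype and returns a plain NumPy-backed int64 Series.
    return pd.Series(s.to_numpy(dtype="int64"), index=s.index, name=s.name)


pd_series = pd.Series([1, 2, 3], dtype=pd.Int64Dtype())
restored = roundtrip_preserving_dtype(pd_series, lossy_convert)
print(restored.dtype)
```

This only restores the dtype label on the pandas side; it cannot recover anything the conversion itself discarded, which is why native support in to_pandas would still be preferable.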

Labels

Python: Affects Python cuDF API.
cudf.pandas: Issues specific to cudf.pandas
feature request: New feature or request
