Is your feature request related to a problem? Please describe.
With the current Column design and to_pandas API implementation, a cudf Series can only be converted to a pandas Series with a NumPy dtype or a pandas nullable dtype. However, pandas also supports Arrow-backed dtypes, and the dtype that went in is not preserved on the way back:
In [1]: import pandas as pd
In [2]: np_series = pd.Series([1, 2, 3], dtype='int64')
In [3]: pd_series = pd.Series([1, 2, 3], dtype=pd.Int64Dtype())
In [4]: import pyarrow as pa
In [5]: arrow_series = pd.Series([1, 2, 3], dtype=pd.ArrowDtype(pa.int64()))

In [6]: np_series
Out[6]:
0    1
1    2
2    3
dtype: int64

In [7]: pd_series
Out[7]:
0    1
1    2
2    3
dtype: Int64

In [8]: arrow_series
Out[8]:
0    1
1    2
2    3
dtype: int64[pyarrow]

In [9]: import cudf

In [10]: cudf.from_pandas(np_series).to_pandas()
Out[10]:
0    1
1    2
2    3
dtype: int64

In [11]: cudf.from_pandas(pd_series).to_pandas()
Out[11]:
0    1
1    2
2    3
dtype: int64

In [12]: cudf.from_pandas(arrow_series).to_pandas()
Out[12]:
0    1
1    2
2    3
dtype: int64
Describe the solution you'd like
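I would like cudf to round-trip pandas data types: converting a pandas Series with from_pandas and back with to_pandas should return the dtype it started with, whether that is a NumPy dtype, a pandas nullable dtype, or an Arrow-backed dtype.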