[FEA] Run get_dummies on cudf Series #7031

Closed
@taureandyernv

Description

Is your feature request related to a problem? Please describe.
I'd like to run get_dummies on a Series. Calling cudf.get_dummies on a Series currently raises AttributeError: 'Series' object has no attribute 'select_dtypes'.

Describe the solution you'd like

import pandas as pd
import cudf
import numpy

df = cudf.DataFrame()
df['a'] = numpy.random.randint(0, 5, 20)
df['b'] = numpy.random.randint(0, 5, 20)
df['c'] = numpy.random.randint(0, 5, 20)
pdf = df.to_pandas()

print(pdf.head())
print(df.head())

pdf2 = pd.get_dummies(pdf['a'], prefix=['a_']) #this works
print(pdf2.head())
df2 = cudf.get_dummies(df['a'], prefix=['a_']) #this does NOT work currently
print(df2.head())

I'd like print(df2.head())'s output to be something like:

   ['a_']_0  ['a_']_1  ['a_']_2  ['a_']_3  ['a_']_4
0         0         1         0         0         0
1         0         0         0         1         0
2         0         0         0         1         0
3         1         0         0         0         0
4         0         0         0         1         0

instead of

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-30-d45df2a33ea1> in <module>
     14 pdf2 = pd.get_dummies(pdf['a'], prefix=['a_']) #this works
     15 print(pdf2.head())
---> 16 df2 = cudf.get_dummies(df['a'], prefix=['a_']) #this does NOT work
     17 print(df2.head())

/opt/conda-environments/rapids-stable/lib/python3.8/site-packages/cudf/core/reshape.py in get_dummies(df, prefix, prefix_sep, dummy_na, columns, cats, sparse, drop_first, dtype)
    564 
    565     if columns is None or len(columns) == 0:
--> 566         columns = df.select_dtypes(include=encode_fallback_dtypes).columns
    567 
    568     def length_check(obj, name):

AttributeError: 'Series' object has no attribute 'select_dtypes'
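
A side note on the column names in the desired output above: labels such as `['a_']_0` appear because pandas, when given a Series, formats whatever is passed as `prefix` directly into each column name, so a list gets stringified. The small pandas-only sketch below (with made-up sample data, not the random data from the report) compares a list prefix with a plain string prefix; presumably a string prefix is what most callers would want from a Series-aware `cudf.get_dummies` as well.

```python
import pandas as pd

# Small deterministic sample (an assumption for illustration only).
s = pd.Series([1, 3, 3, 0, 3], name="a")

# A list prefix is stringified into the labels, as in the report.
list_prefix = pd.get_dummies(s, prefix=["a_"])

# A plain string prefix gives the more conventional labels.
str_prefix = pd.get_dummies(s, prefix="a")

print(list(list_prefix.columns))
print(list(str_prefix.columns))
```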

Describe alternatives you've considered
The only current alternative is to convert to pandas.
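
Another possible workaround, sketched here under assumptions, is to wrap the Series in a one-column DataFrame and name that column explicitly through `columns`, so that the `select_dtypes` branch shown in the traceback is never reached. `series_get_dummies` is a hypothetical helper name, not part of cuDF, and this has not been verified against the cuDF version in the report. The import falls back to pandas so the sketch still runs on a machine without a GPU.

```python
try:
    import cudf as xdf  # GPU path; needs a RAPIDS install
except ImportError:
    import pandas as xdf  # CPU fallback so the sketch runs anywhere


def series_get_dummies(series, prefix=None):
    """Hypothetical helper: one-hot encode a Series via a one-column DataFrame."""
    # Give unnamed Series a placeholder column name.
    name = series.name if series.name is not None else "value"
    frame = series.to_frame(name=name)
    # Passing `columns` explicitly skips the dtype-based column selection.
    return xdf.get_dummies(frame, columns=[name], prefix=prefix)


s = xdf.Series([1, 3, 3, 0, 3], name="a")
dummies = series_get_dummies(s, prefix="a")
print(dummies.head())
```

With the pandas fallback this yields the columns `a_0`, `a_1` and `a_3`; on cuDF the result should stay on the GPU, which avoids the round trip through `to_pandas()`.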

Labels

Python (Affects Python cuDF API.), feature request (New feature or request)
