Closed
opened on Dec 17, 2020
Is your feature request related to a problem? Please describe.
I'd like to run get_dummies on a Series. When trying to perform get_dummies on a Series, cudf raises an AttributeError: 'Series' object has no attribute 'select_dtypes'.
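For context, select_dtypes is a DataFrame-only method in pandas as well, yet pd.get_dummies still accepts a Series. Below is a minimal sketch that checks this on the pandas side only, so it needs no GPU. The sample values are made up for illustration, and it says nothing about how cudf should implement the fix.

```python
import pandas as pd

# Made-up sample data, standing in for one column of the frame
s = pd.Series([1, 0, 3, 3], name="a")

# select_dtypes exists on DataFrame but not on Series
print(hasattr(s, "select_dtypes"))             # False
print(hasattr(s.to_frame(), "select_dtypes"))  # True

# pandas nevertheless one-hot encodes the Series directly
dummies = pd.get_dummies(s, prefix="a")
print(list(dummies.columns))                   # ['a_0', 'a_1', 'a_3']
```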
Describe the solution you'd like
import pandas as pd
import cudf
import numpy
df = cudf.DataFrame()
df['a'] = numpy.random.randint(0, 5, 20)
df['b'] = numpy.random.randint(0, 5, 20)
df['c'] = numpy.random.randint(0, 5, 20)
pdf = df.to_pandas()
print(pdf.head())
print(df.head())
pdf2 = pd.get_dummies(pdf['a'], prefix=['a_']) #this works
print(pdf2.head())
df2 = cudf.get_dummies(df['a'], prefix=['a_']) #this does NOT work currently
print(df2.head())
I'd like print(df2.head())'s output to be something like:
   ['a_']_0  ['a_']_1  ['a_']_2  ['a_']_3  ['a_']_4
0         0         1         0         0         0
1         0         0         0         1         0
2         0         0         0         1         0
3         1         0         0         0         0
4         0         0         0         1         0
instead of
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-d45df2a33ea1> in <module>
14 pdf2 = pd.get_dummies(pdf['a'], prefix=['a_']) #this works
15 print(pdf2.head())
---> 16 df2 = cudf.get_dummies(df['a'], prefix=['a_']) #this does NOT work
17 print(df2.head())
/opt/conda-environments/rapids-stable/lib/python3.8/site-packages/cudf/core/reshape.py in get_dummies(df, prefix, prefix_sep, dummy_na, columns, cats, sparse, drop_first, dtype)
564
565 if columns is None or len(columns) == 0:
--> 566 columns = df.select_dtypes(include=encode_fallback_dtypes).columns
567
568 def length_check(obj, name):
AttributeError: 'Series' object has no attribute 'select_dtypes'
Describe alternatives you've considered
The only current alternative is to convert to pandas.
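The pandas round trip can be sketched roughly as follows. get_dummies_via_pandas is a hypothetical helper name and not part of cudf. It assumes the input is either a cudf Series (which has to_pandas()) or a plain pandas Series. The data is copied to the host and back, so this is a stopgap rather than a GPU-side solution.

```python
import pandas as pd


def get_dummies_via_pandas(series, **kwargs):
    """One-hot encode a Series by going through pandas.

    Hypothetical helper for illustration; not part of cudf.
    Extra keyword arguments are passed on to pd.get_dummies.
    """
    # A plain pandas Series has no to_pandas(); encode it directly.
    if not hasattr(series, "to_pandas"):
        return pd.get_dummies(series, **kwargs)

    # Assumed to be a cudf Series: copy to host, encode, copy back to the GPU.
    import cudf  # imported lazily so the pandas path works without RAPIDS

    return cudf.from_pandas(pd.get_dummies(series.to_pandas(), **kwargs))
```

With the frame from the example above, usage would be df2 = get_dummies_via_pandas(df['a'], prefix='a'), which yields columns named a_0, a_1, and so on for the values present.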