Sparse dataframe fails when astype() is called with dictionary #26227

Closed
@stefansimik

Description

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Create a dense DataFrame
df = pd.DataFrame({'A': [1.0, 2.0, np.nan, 3.0], 'B': [100, 200, 300, 400]})

# Column-to-dtype conversion mapping
convert_cols_dict = {'A': np.float32, 'B': np.int32}

# WORKS: conversion on the dense DataFrame succeeds
df_optimized = df.astype(convert_cols_dict)

# FAILS: the same conversion on the sparse DataFrame raises KeyError
df_sparse = df.to_sparse()
df_sparse_optimized = df_sparse.astype(convert_cols_dict)

Problem description

Column dtypes cannot be converted with a dictionary when working with a sparse DataFrame.
Calling .astype() with a column-to-dtype mapping fails with this error:

---------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-62-fcedc5114473> in <module>
      1 # Conversion on sparse_dataframe fails
      2 df_sparse = df.to_sparse()
----> 3 df_sparse_optimized = df_sparse.astype(convert_cols_dict)
      4 df_sparse_optimized

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in astype(self, dtype)
    349 
    350     def astype(self, dtype):
--> 351         return self._apply_columns(lambda x: x.astype(dtype))
    352 
    353     def copy(self, deep=True):

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in _apply_columns(self, func)
    342 
    343         new_data = {col: func(series)
--> 344                     for col, series in compat.iteritems(self)}
    345 
    346         return self._constructor(

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in <dictcomp>(.0)
    342 
    343         new_data = {col: func(series)
--> 344                     for col, series in compat.iteritems(self)}
    345 
    346         return self._constructor(

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in <lambda>(x)
    349 
    350     def astype(self, dtype):
--> 351         return self._apply_columns(lambda x: x.astype(dtype))
    352 
    353     def copy(self, deep=True):

D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5659             if self.ndim == 1:  # i.e. Series
   5660                 if len(dtype) > 1 or self.name not in dtype:
-> 5661                     raise KeyError('Only the Series name can be used for '
   5662                                    'the key in Series dtype mappings.')
   5663                 new_type = dtype[self.name]

KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'
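
The traceback suggests that the sparse frame's astype() forwards the whole dictionary to each column's Series.astype(), which rejects any mapping containing keys other than the Series' own name. A minimal sketch of that behaviour using a plain (dense) Series, so it does not depend on to_sparse(); the variable names col_a, error_message and converted are illustrative only:

```python
import numpy as np
import pandas as pd

convert_cols_dict = {'A': np.float32, 'B': np.int32}

# A single column, as each column would be seen inside _apply_columns
col_a = pd.Series([1.0, 2.0, np.nan, 3.0], name='A')

# Passing the full frame-level mapping to a Series raises KeyError,
# because the mapping contains a key ('B') that is not the Series name
try:
    col_a.astype(convert_cols_dict)
    error_message = None
except KeyError as exc:
    error_message = exc.args[0]
print(error_message)

# Passing only this column's entry is accepted
converted = col_a.astype({'A': convert_cols_dict['A']})
print(converted.dtype)
```

This only illustrates the Series-level check shown in the last traceback frame; it is not a reproduction of the sparse code path itself.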

Expected Output

astype() with a dictionary should work on a sparse DataFrame the same way it does on a dense DataFrame: each column is converted to the dtype given for it in the mapping.
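
Until that works, one possible workaround is to look up the dtype per column and convert each column on its own. The helper below is a hypothetical sketch, not part of pandas: astype_per_column is an invented name, and it assumes that column access returns a Series and that the frame's constructor accepts a dict of Series. It is shown here on a dense DataFrame and has not been verified against SparseDataFrame in pandas 0.24.1.

```python
import numpy as np
import pandas as pd


def astype_per_column(frame, dtype_map):
    """Convert each column with its own dtype from dtype_map (hypothetical helper)."""
    # Reject keys that do not name a column (a design choice of this sketch)
    unknown = set(dtype_map) - set(frame.columns)
    if unknown:
        raise KeyError("Not column names: {}".format(sorted(unknown)))

    # Convert column by column; columns missing from dtype_map are kept as-is
    converted = {
        col: frame[col].astype(dtype_map[col]) if col in dtype_map else frame[col]
        for col in frame.columns
    }
    # Rebuild with the same class, index and column order as the input
    return type(frame)(converted, index=frame.index, columns=frame.columns)


df = pd.DataFrame({'A': [1.0, 2.0, np.nan, 3.0], 'B': [100, 200, 300, 400]})
result = astype_per_column(df, {'A': np.float32, 'B': np.int32})
print(result.dtypes)
```

Because each Series receives a single dtype rather than a mapping, the Series-level key check from the traceback is never reached.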

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 19.0.1
setuptools: 40.8.0
Cython: None
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Labels: Dtype Conversions (unexpected or buggy dtype conversions), Sparse (sparse data type)
