Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

# Create DataFrame
df = pd.DataFrame({'A': [1.0, 2.0, np.nan, 3.0], 'B': [100, 200, 300, 400]})
# Conversion mapping
convert_cols_dict = {'A': np.float32, 'B': np.int32}
# WORKS: conversion on a dense DataFrame succeeds
df_optimized = df.astype(convert_cols_dict)
# FAILS: conversion on a sparse DataFrame raises KeyError
df_sparse = df.to_sparse()
df_sparse_optimized = df_sparse.astype(convert_cols_dict)
Problem description
Column types cannot be converted with a per-column dtype mapping when working with a sparse DataFrame.
Calling .astype(...) with the mapping fails with this error:
---------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-62-fcedc5114473> in <module>
1 # Conversion on sparse_dataframe fails
2 df_sparse = df.to_sparse()
----> 3 df_sparse_optimized = df_sparse.astype(convert_cols_dict)
4 df_sparse_optimized
D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in astype(self, dtype)
349
350 def astype(self, dtype):
--> 351 return self._apply_columns(lambda x: x.astype(dtype))
352
353 def copy(self, deep=True):
D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in _apply_columns(self, func)
342
343 new_data = {col: func(series)
--> 344 for col, series in compat.iteritems(self)}
345
346 return self._constructor(
D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in <dictcomp>(.0)
342
343 new_data = {col: func(series)
--> 344 for col, series in compat.iteritems(self)}
345
346 return self._constructor(
D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\sparse\frame.py in <lambda>(x)
349
350 def astype(self, dtype):
--> 351 return self._apply_columns(lambda x: x.astype(dtype))
352
353 def copy(self, deep=True):
D:\data_science\bin\anaconda3\envs\nbo2\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
5659 if self.ndim == 1: # i.e. Series
5660 if len(dtype) > 1 or self.name not in dtype:
-> 5661 raise KeyError('Only the Series name can be used for '
5662 'the key in Series dtype mappings.')
5663 new_type = dtype[self.name]
KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'
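The frames above suggest the cause: the sparse `astype` passes the whole mapping to every column through `lambda x: x.astype(dtype)`, and `Series.astype` rejects any mapping that has more than one key or does not contain the Series name. The sketch below reproduces that check on a plain dense Series, so it does not need `to_sparse()`. The variable names `col_a`, `error_message` and `col_a_f32` are illustrative only.

```python
import numpy as np
import pandas as pd

convert_cols_dict = {'A': np.float32, 'B': np.int32}

# A single column, as each column is seen inside _apply_columns
col_a = pd.Series([1.0, 2.0, np.nan, 3.0], name='A')

# Passing the full two-key mapping to a Series triggers the same KeyError
try:
    col_a.astype(convert_cols_dict)
    error_message = None
except KeyError as exc:
    error_message = exc.args[0]

print(error_message)

# A mapping that contains only the Series' own name is accepted
col_a_f32 = col_a.astype({'A': np.float32})
print(col_a_f32.dtype)
```

This matches the condition shown at line 5660 of `generic.py` in the traceback: `len(dtype) > 1 or self.name not in dtype`.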
Expected Output
What works on a dense DataFrame should also work on a sparse DataFrame: .astype(...) with a per-column dtype mapping should convert each column to its mapped dtype.
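A possible workaround in the meantime is to convert one column at a time, so that each Series receives a single dtype rather than the whole mapping. This is a minimal sketch. `astype_by_column` is a hypothetical helper, not a pandas API, and it is exercised here on a dense DataFrame only. Whether column assignment behaves the same way on the 0.24 `SparseDataFrame` is an assumption that has not been verified.

```python
import numpy as np
import pandas as pd


def astype_by_column(frame, dtype_map):
    """Cast each column named in dtype_map to its own dtype (hypothetical helper)."""
    converted = frame.copy()
    for col, dtype in dtype_map.items():
        # Each Series gets one dtype, so the Series-level mapping check is never hit
        converted[col] = frame[col].astype(dtype)
    return converted


df = pd.DataFrame({'A': [1.0, 2.0, np.nan, 3.0], 'B': [100, 200, 300, 400]})
convert_cols_dict = {'A': np.float32, 'B': np.int32}

df_converted = astype_by_column(df, convert_cols_dict)
print(df_converted.dtypes)
```

Columns that are missing from the mapping are left unchanged, which mirrors how `DataFrame.astype` treats a partial mapping on a dense frame.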
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.1
pytest: None
pip: 19.0.1
setuptools: 40.8.0
Cython: None
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None