DEPR: DataFrame dropna accepting multiple axes #20987

Closed
@kunalgosar

Description

Code Sample, a copy-pastable example if possible

This is the relevant code extract from pandas source:

if isinstance(axis, (tuple, list)):
    result = self
    for ax in axis:
        result = result.dropna(how=how, thresh=thresh, subset=subset, axis=ax)

This is the output from the function call:

In [7]: df
Out[7]: 
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5

In [8]: df.dropna(axis=[0, 1], subset=['A', 'C'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-4c4a476386f3> in <module>()
----> 1 df.dropna(axis=[0, 1], subset=['A', 'C'])

~/dev/pandas/pandas/core/frame.py in dropna(self, axis, how, thresh, subset, inplace)
   4265             for ax in axis:
   4266                 result = result.dropna(how=how, thresh=thresh, subset=subset,
-> 4267                                        axis=ax)
   4268         else:
   4269             axis = self._get_axis_number(axis)

~/dev/pandas/pandas/core/frame.py in dropna(self, axis, how, thresh, subset, inplace)
   4276                 check = indices == -1
   4277                 if check.any():
-> 4278                     raise KeyError(list(np.compress(check, subset)))
   4279                 agg_obj = self.take(indices, axis=agg_axis)
   4280 

KeyError: ['A', 'C']

Problem description

subset selects labels from the "other" axis: column labels when dropping rows (axis=0), and row labels when dropping columns (axis=1). Passing the same subset of labels to the call for each axis therefore does not make sense. Unless every subset label is present on both axes, the function will always raise a KeyError.
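To make the "other axis" behaviour concrete, here is a minimal sketch that rebuilds the frame from the example above and calls dropna one axis at a time. It deliberately avoids passing a list to axis, since that form is what this issue proposes to deprecate; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as the df shown in the example above.
df = pd.DataFrame(
    {
        "A": [np.nan, 3.0, np.nan],
        "B": [2.0, 4.0, np.nan],
        "C": [np.nan, np.nan, np.nan],
        "D": [0, 1, 5],
    }
)

# axis=0 drops rows, so subset names the *columns* to inspect.
rows_kept = df.dropna(axis=0, subset=["A", "C"])

# axis=1 drops columns, so subset names the *row labels* to inspect.
cols_kept = df.dropna(axis=1, subset=[0, 1])

# Column labels are not row labels, so reusing them with axis=1 fails.
try:
    df.dropna(axis=1, subset=["A", "C"])
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(rows_kept.shape, list(cols_kept.columns), raised_key_error)
```

The last call is the second iteration of the loop in the source extract, which is where the KeyError in the traceback comes from.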

I would expect subset to accept a list of subsets, one per axis, when axis is list-like. If this is agreeable, I'm happy to submit a PR with this change.
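As a rough illustration of that proposal, the sketch below pairs each axis with its own subset and applies dropna sequentially. The helper name dropna_per_axis and its signature are hypothetical; this is not pandas API, and as the issue title indicates, the discussion moved toward deprecating multiple axes instead.

```python
import numpy as np
import pandas as pd


def dropna_per_axis(df, axes, subsets, how="any"):
    """Hypothetical helper: apply dropna once per axis, each with its own subset."""
    if len(axes) != len(subsets):
        raise ValueError("axes and subsets must have the same length")
    result = df
    for ax, sub in zip(axes, subsets):
        # sub holds labels from the other axis; None means "inspect all labels".
        result = result.dropna(axis=ax, how=how, subset=sub)
    return result


df = pd.DataFrame(
    {
        "A": [np.nan, 3.0, np.nan],
        "B": [2.0, 4.0, np.nan],
        "C": [np.nan, np.nan, np.nan],
        "D": [0, 1, 5],
    }
)

# Drop rows with NaN in column A, then drop columns with NaN in row 1.
cleaned = dropna_per_axis(df, axes=[0, 1], subsets=[["A"], [1]])
print(cleaned)
```

Because the calls run in order, the second subset must refer to labels that survive the first pass; here row 1 is the only row left when the column pass runs.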

Output of pd.show_versions()

INSTALLED VERSIONS

commit: bd4332f
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0rc2+13.gbd4332f4b
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Deprecate (Functionality to remove in pandas)
