Filtering by exclusion of duplicate rows does not preserve column list for an empty dataframe #25184

Closed
@anatoly-scherbakov

Description

Code Sample, a copy-pastable example if possible

import pandas as pd

x_df = pd.DataFrame(columns=['a', 'b'])
series = x_df.duplicated(subset=['a'])

list(x_df[~series])

# Pandas 0.23.4 returns the expected output: ['a', 'b']
# Pandas 0.24.1 returns: []

Problem description

We have been using this approach to remove duplicate rows from a dataframe, where rows are compared by one column only. It worked perfectly until we upgraded Pandas to the latest version and found that, if the original dataframe is empty, the column list is lost in the resulting dataframe.
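For comparison, the same one-column deduplication can be written with `DataFrame.drop_duplicates(subset=...)`, which does not go through boolean-mask indexing. The sketch below is illustrative only: the non-empty frame `df` and its values are made up for the example, and whether the empty case behaves identically on every pandas release is an assumption that has not been checked against 0.24.1.

```python
import pandas as pd

# Non-empty frame (made-up data): both spellings keep the first row
# for each distinct value of column 'a'.
df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'y', 'z']})
via_mask = df[~df.duplicated(subset=['a'])]
via_method = df.drop_duplicates(subset=['a'])
assert via_mask.equals(via_method)

# Empty frame: the case from this report, without the boolean mask.
empty_df = pd.DataFrame(columns=['a', 'b'])
deduped = empty_df.drop_duplicates(subset=['a'])
print(list(deduped.columns))
```

Both forms default to keeping the first occurrence, so switching between them should not change which rows survive.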

Expected Output

We would expect the column list to be preserved.
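Until the behaviour is restored, one possible stopgap is to guard the empty case explicitly. This is a minimal sketch, not a fix for the underlying regression; `dedupe_by_column` is a hypothetical helper name introduced here for illustration.

```python
import pandas as pd


def dedupe_by_column(df, column):
    """Hypothetical helper: keep the first row for each value in `column`."""
    if df.empty:
        # Nothing to deduplicate; return a copy so the column list survives
        # regardless of how the boolean mask behaves on an empty frame.
        return df.copy()
    return df[~df.duplicated(subset=[column])]


x_df = pd.DataFrame(columns=['a', 'b'])
result = dedupe_by_column(x_df, 'a')

# The expectation stated in this report: columns are preserved.
assert list(result) == ['a', 'b']
assert len(result) == 0
```

The final assertions double as a small regression check that can be dropped into a test suite.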

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-44-generic
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: 0.22
numpy: 1.12.1
scipy: None
pyarrow: None
xarray: None
IPython: 4.2.0
sphinx: 1.4.4
patsy: None
dateutil: 2.5.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.14
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 3.7.3
bs4: 4.5.3
html5lib: 1.

Metadata

Labels

Indexing: Related to indexing on series/frames, not to indexes themselves
Regression: Functionality that used to work in a prior pandas version
