(edit by @TomAugspurger)
Current output:
In [33]: pd.DataFrame(index=[0, 1]).nunique()
Out[33]:
Empty DataFrame
Columns: []
Index: [0, 1]
Expected output is an empty Series:
Out[34]: Series([], dtype: float64)
Not sure what the expected dtype of that Series should be... probably object.
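For comparison, here is a minimal sketch of the expected semantics: a Series indexed by the frame's columns, which is simply empty when there are no columns. nunique_per_column() is a hypothetical helper, not a pandas API. Returning int64 (rather than object or float64 as discussed above) is an assumption made because the values are counts.

```python
import pandas as pd


def nunique_per_column(df, dropna=True):
    """Hypothetical helper (not part of pandas): distinct values per column.

    Always returns a Series indexed by df.columns, including when df has
    no columns, so the result can never pick up the row index.
    The int64 dtype is an assumption, chosen because the values are counts.
    """
    # iloc by position also copes with duplicate column names
    counts = [df.iloc[:, i].nunique(dropna=dropna) for i in range(df.shape[1])]
    return pd.Series(counts, index=df.columns, dtype="int64")


print(nunique_per_column(pd.DataFrame(index=[0, 1])))
```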
Original post below:
Code Sample, a copy-pastable example if possible
With pandas 0.20.3:
import pandas as pd
# create a DataFrame with 3 rows
df = pd.DataFrame({'a': ['A', 'B', 'C']})
# count unique values in each column, excluding 'a'
unique = df.loc[:, (df.columns != 'a')].nunique()
# this results in an empty Series, the index is also empty
unique.index.tolist()
>>> []
# and
unique[unique == 1].index.tolist()
>>> []
With pandas 0.23.3:
# create a DataFrame with 3 rows
df = pd.DataFrame({'a': ['A', 'B', 'C']})
# count unique values in each column, excluding 'a'
unique = df.loc[:, (df.columns != 'a')].nunique()
# the result is empty, but its index is not: it is the row index of df
unique.index.tolist()
>>> [0, 1, 2]
Also:
unique[unique == 1].index.tolist()
>>> [0, 1, 2]
Note:
# if the column selection is not empty, nunique() behaves as expected:
df = pd.DataFrame({'a': ['A','B','C'], 'b': [1,1,1]})
unique = df.loc[:, (df.columns != 'a')].nunique()
unique[unique == 1]
>>> b    1
>>> dtype: int64
# and
unique[unique == 1].index.tolist()
>>> ['b']
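To see which behavior a given installation has, one can check whether the result of nunique() is a Series indexed by the selected columns. The check below is a hypothetical sketch, not a pandas API. According to this report it should return False on 0.23.3 for the zero-column selection and True for the non-empty selection.

```python
import pandas as pd


def nunique_is_column_indexed(df):
    """Hypothetical check: does df.nunique() return a Series indexed by df.columns?"""
    result = df.nunique()
    return isinstance(result, pd.Series) and list(result.index) == list(df.columns)


df = pd.DataFrame({'a': ['A', 'B', 'C']})
subset = df.loc[:, df.columns != 'a']  # three rows, zero columns
print(nunique_is_column_indexed(subset))
```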
Problem description
This change in behavior is surprising and looks like a bug: nunique() should return a Series indexed by the DataFrame's columns (empty here, since no columns are selected), but in 0.23.3 the result instead carries the DataFrame's row index.
This is likely related to:
I am reporting this because in my use case I use this list to drop columns, but I end up with column names that do not exist in the DataFrame.
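The use case above (dropping columns whose values are all identical) can be written so that it never depends on the index of the DataFrame-level nunique() result. This is a minimal sketch, assuming column names are unique. drop_constant_columns() and its keep parameter are hypothetical, not part of pandas. DataFrame.drop(columns=...) needs pandas 0.21 or later.

```python
import pandas as pd


def drop_constant_columns(df, keep=('a',)):
    """Hypothetical helper: drop columns that have exactly one distinct value.

    Columns named in `keep` are never dropped. Candidate names come straight
    from df.columns, so every name passed to drop() exists in df.
    Assumes column names are unique (df[c] must return a Series).
    """
    candidates = [c for c in df.columns if c not in keep]
    constant = [c for c in candidates if df[c].nunique() == 1]
    return df.drop(columns=constant)  # an empty list drops nothing


# no candidate columns: the frame comes back with only 'a'
print(drop_constant_columns(pd.DataFrame({'a': ['A', 'B', 'C']})))
```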
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 17.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.3
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: None
sphinx: 1.5.5
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None