Can't iterate a DataFrameGroupBy object if the group by key contains NaN #14170

Closed
@chengguangnan

Description

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4], 'b':[None, None, None, None]})

chunks = df.groupby('a')
assert len([g for g, chunk in chunks]) == 4

chunks = df.groupby(['a', 'b'])

# fails here: iterating yields nothing, so the list is empty
assert len([g for g, chunk in chunks]) == 4
# however, chunks.groups shows that there are 4 groups:
# {(1, nan): [0], (2, nan): [1], (3, nan): [2], (4, nan): [3]}


# but if we call fillna('') first, the same assertion passes
chunks = df.fillna('').groupby(['a', 'b'])
assert len([g for g, chunk in chunks]) == 4



Expected Output

I don't see why having NaN in the group-by key should make the groupby fail. I expect iterating over chunks to yield 4 groups, especially since chunks.groups does show them grouped like this: {(1, nan): [0], (2, nan): [1], (3, nan): [2], (4, nan): [3]}
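The fillna('') workaround from the code sample can be generalized so that the original rows, NaN included, come back from the iteration. The following is only a minimal sketch: `groups_including_nan_keys` and its `sentinel` argument are hypothetical names, not part of pandas, and it assumes the DataFrame index is unique and that the sentinel string never occurs as a real key value.

```python
import pandas as pd


def groups_including_nan_keys(df, keys, sentinel="<missing>"):
    """Hypothetical helper (not a pandas API): list (key, rows) pairs,
    keeping rows whose group key is NaN/None."""
    keyed = df.copy()
    for k in keys:
        col = keyed[k]
        if col.isna().any():
            # Cast to object so the string sentinel fits any column dtype,
            # then replace only the missing entries.
            keyed[k] = col.astype(object).where(col.notna(), sentinel)
    # Group on the filled keys, but return the untouched original rows.
    # Assumes a unique index, so df.loc[...] selects exactly the group's rows.
    return [(key, df.loc[chunk.index]) for key, chunk in keyed.groupby(keys)]


df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [None, None, None, None]})
groups = groups_including_nan_keys(df, ['a', 'b'])
```

Here the group keys carry the sentinel in place of NaN, while each returned chunk still has the original missing values in column b. Later pandas releases (1.1 and up) also accept `dropna=False` in `groupby`, which may make a helper like this unnecessary there; that option is not available in the 0.17.1 version reported below.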

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: 
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.2
setuptools: 24.0.0
Cython: 0.23.4
numpy: 1.11.1
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 1.0b8
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None
