REGR: GroupBy.indices no longer includes unobserved categories #38642

Closed
@jorisvandenbossche

Does anybody know if this was an intentional change? (I couldn't find anything about it in the whatsnew.)

In [9]: pd.__version__ 
Out[9]: '1.0.5'

In [10]: df = pd.DataFrame({"key": pd.Categorical(["b"]*5, categories=["a", "b", "c", "d"]), "col": range(5)}) 

In [11]: gb = df.groupby("key")   

In [12]: list(gb.indices)  
Out[12]: ['a', 'b', 'c', 'd']

vs

In [1]: pd.__version__
Out[1]: '1.3.0.dev0+92.ga2d10ba88a'

In [2]: df = pd.DataFrame({"key": pd.Categorical(["b"]*5, categories=["a", "b", "c", "d"]), "col": range(5)})

In [3]: gb = df.groupby("key")

In [4]: list(gb.indices)
Out[4]: ['b']

This already changed in pandas 1.1, so it is not a recent change.

The consequence is that iterating over gb and iterating over gb.indices are no longer consistent with each other.
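To check the inconsistency on a given pandas version, a minimal sketch along these lines may help. It reuses the frame from the report, passes `observed=False` explicitly (an assumption made here so that the default does not matter across versions), and compares the keys seen when iterating over the groupby with the keys of `gb.indices`. The `full_indices` dictionary at the end is a hypothetical workaround, not something pandas provides: it fills in an empty position array for every category missing from `gb.indices`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "key": pd.Categorical(["b"] * 5, categories=["a", "b", "c", "d"]),
        "col": range(5),
    }
)

# observed=False is passed explicitly so the result does not depend on the
# version-specific default for categorical groupers.
gb = df.groupby("key", observed=False)

# Keys seen when iterating over the groupby vs. keys of gb.indices.
iterated_keys = [key for key, _ in gb]
indices_keys = list(gb.indices)

# Any key present on one side only points at the inconsistency.
only_in_iteration = set(iterated_keys) - set(indices_keys)
only_in_indices = set(indices_keys) - set(iterated_keys)
print("only when iterating:", sorted(only_in_iteration))
print("only in gb.indices:", sorted(only_in_indices))

# Hypothetical workaround: rebuild a mapping that covers every category,
# using an empty position array for categories absent from gb.indices.
empty = np.array([], dtype=np.intp)
full_indices = {
    cat: gb.indices.get(cat, empty) for cat in df["key"].cat.categories
}
print(list(full_indices))
```

Both difference sets being empty would mean the two views agree on the installed version; which keys show up in each depends on the pandas release, so no expected output is given for those two lines.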

cc @mroeschke @rhshadrach

Labels: Groupby; Regression (functionality that used to work in a prior pandas version)