ENH: Add an option to "nunique" to skip columns that contain unhashable values #62940

@wjandrea

Description

Feature Type

  • Adding new functionality to pandas
  • Changing existing functionality in pandas
  • Removing existing functionality in pandas

Problem Description

I wanted to get the number of unique values for each column in my df with df.nunique(), but some columns contained unhashable values like lists, so I got TypeError: unhashable type: 'list'. It would be nice if df.nunique() could skip columns like that and put NaN for them instead of raising.
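For reference, the failure can be reproduced with a small frame like the one below. This is only a minimal sketch: the column names and values are illustrative, and the `error_message` variable is introduced here just to capture the exception text.

```python
import pandas as pd

# Illustrative frame: 'C' holds hashable ints, 'D' holds lists (unhashable).
df = pd.DataFrame({
    'C': [5, 5, 6, 7],
    'D': [[], [], [], None],
})

try:
    df.nunique()
    error_message = None
except TypeError as exc:
    # Counting unique values hashes each element, which fails for lists.
    error_message = str(exc)

print(error_message)
```

A single list-valued column is enough to make the whole call fail, even though 'C' on its own could be counted without trouble.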

I got around the problem myself like this:

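def nunique_if_hashable(s: pd.Series) -> float:
    """Return the unique count for s, or NaN if s holds unhashable values."""
    try:
        return s.nunique()
    except TypeError:
        # Raised when a value such as a list cannot be hashed.
        return np.nan

# Apply column by column; unhashable columns come back as NaN.
df.apply(nunique_if_hashable)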

With a result like this:

A    0.0
B    1.0
C    3.0
D    NaN
dtype: float64

Since D contains at least one list, and lists aren't hashable, it's skipped.

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan] * 4,
    'B': [1] * 4,
    'C': [5, 5, 6, 7],
    'D': [[], [], [], None]})

Feature Description

I'm imagining a parameter like, say, skip_unhashable: bool = False that, when set to True, would do the equivalent of the above:

>>> df.nunique(skip_unhashable=True)
A      0
B      1
C      3
D    NaN
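To make the proposed behaviour concrete, here is a rough sketch of a frame-level wrapper. Note that `skip_unhashable` is not an existing pandas parameter, and `nunique_frame` is a hypothetical helper name used only for illustration; the sketch simply wraps the per-column helper shown earlier and forwards the existing `dropna` option.

```python
import numpy as np
import pandas as pd


def nunique_frame(df: pd.DataFrame, skip_unhashable: bool = False,
                  dropna: bool = True) -> pd.Series:
    """Hypothetical stand-in for df.nunique(skip_unhashable=...)."""
    if not skip_unhashable:
        # Current behaviour: raises TypeError on unhashable values.
        return df.nunique(dropna=dropna)

    def _count(s: pd.Series) -> float:
        try:
            return s.nunique(dropna=dropna)
        except TypeError:
            # Column contains unhashable values, so report NaN instead.
            return np.nan

    return df.apply(_count)


df = pd.DataFrame({
    'A': [np.nan] * 4,
    'B': [1] * 4,
    'C': [5, 5, 6, 7],
    'D': [[], [], [], None]})

result = nunique_frame(df, skip_unhashable=True)
print(result)
```

In this sketch the result is a float Series whenever a column is skipped, because NaN forces the counts to float. Whether a real implementation should return floats or a nullable integer dtype is an open design question.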

Alternative Solutions

The helper function I wrote above isn't so bad. It's not crucial to put this functionality in pandas; it would just be nice.

Additional Context

I loaded this df from JSON.
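As an illustration of how list-valued columns can arise from JSON, the sketch below builds a frame from a small JSON document. The payload and the use of `json.loads` with the `pd.DataFrame` constructor are assumptions for the example; the actual file and loading call are not shown in this report.

```python
import json

import pandas as pd

# Hypothetical JSON payload: 'D' holds arrays, which decode to Python lists.
raw = '[{"C": 5, "D": []}, {"C": 5, "D": []}, {"C": 6, "D": []}, {"C": 7, "D": null}]'

records = json.loads(raw)
df_json = pd.DataFrame(records)

# JSON arrays become list objects stored in an object-dtype column.
first_value = df_json['D'].iloc[0]
print(type(first_value))
```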

Labels: Enhancement, Needs Triage
