pandas.DataFrame.duplicated to allow take_all #6511

Closed
@socheon

Description

When working with external data, I often see rows that violate a primary key. Currently, I cannot easily select all of the violating rows. For example, suppose I have a massive file with some inconsistent data:

datecol,valuecol
...
2014-01-01,12
2014-01-01,13
2014-01-02,10
...

In this use case, it would be good if we could do df[df.duplicated('datecol', take_all=True)] to get the bad rows directly:

2014-01-01,12
2014-01-01,13
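
For reference, here is a minimal sketch of how the same selection might be done without the proposed take_all flag, which is only a suggestion in this issue and not an existing argument. It assumes a pandas release in which duplicated accepts the keep parameter, where keep=False marks every row of a duplicated group. A groupby-based fallback is included for comparison. The inline CSV text is just the sample above with the elided rows left out.

```python
import io

import pandas as pd

# Sample data from the issue (the "..." rows are omitted, not reconstructed).
csv_text = """datecol,valuecol
2014-01-01,12
2014-01-01,13
2014-01-02,10
"""
df = pd.read_csv(io.StringIO(csv_text))

# keep=False flags every row whose key occurs more than once,
# instead of skipping the first (or last) occurrence.
bad_rows = df[df.duplicated("datecol", keep=False)]

# Fallback that does not rely on `keep`: count the rows per key
# and select the keys that occur more than once.
group_sizes = df.groupby("datecol")["valuecol"].transform("size")
bad_rows_groupby = df[group_sizes > 1]

print(bad_rows)
```

Both selections keep the original row order and index, so the result can be inspected or written back out for cleanup.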


Labels

Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff)
Indexing (related to indexing on series/frames, not to indexes themselves)
Numeric Operations (arithmetic, comparison, and logical operations)
