Behavior of pandas.DataFrame.duplicated()

While trying to get a handle on duplicated records I stumbled upon [this](http://stackoverflow.com/questions/26244309/how-to-analyze-all-duplicate-entries-in-this-pandas-dataframe), which led me to conclude that `.duplicated(take_last=True)` seems to take the first of the duplicate rows and `.duplicated(take_last=False)` takes the last rows.
Here is an illustration:

```
import pandas as pd
data = { 'key1':[1,2,3,1,2,3,2,2,2],
      'key2':[2,2,1,2,2,2,2,2,2],
      'dup':['d1_1','d2_1', 'n_d','d1_2','d2_2', 'n_d','d2_3','d2_4','d2_5']}
df = pd.DataFrame(data,columns=['key1','key2','dup'])
print(df)
   key1  key2   dup
0     1     2  d1_1
1     2     2  d2_1
2     3     1   n_d
3     1     2  d1_2
4     2     2  d2_2
5     3     2   n_d
6     2     2  d2_3
7     2     2  d2_4
8     2     2  d2_5
```

Now, with `take_last=False` it would be fair to expect the first row of each duplicate group (`d1_1` and `d2_1`) to be in the output, but this is not the case:

```
c1 = df.duplicated(['key1', 'key2'], take_last=False)
df[c1]
   key1  key2   dup
3     1     2  d1_2
4     2     2  d2_2
6     2     2  d2_3
7     2     2  d2_4
8     2     2  d2_5
```

And `take_last=True` outputs the first rows:

```
c2 = df.duplicated(['key1', 'key2'], take_last=True)
df[c2]
   key1  key2   dup
0     1     2  d1_1
1     2     2  d2_1
4     2     2  d2_2
6     2     2  d2_3
7     2     2  d2_4
```
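One possible reading that is consistent with both outputs above is that `.duplicated()` flags every row of a duplicate group except the one it keeps: the first row when `take_last=False`, the last row when `take_last=True`. The pure-Python sketch below reproduces that reading on the same `(key1, key2)` pairs. `duplicated_mask` is a hypothetical helper written only for illustration; it is not how pandas implements this.

```python
def duplicated_mask(keys, take_last=False):
    """Flag every occurrence of a key except the one that is kept.

    Hypothetical helper for illustration only; not pandas' implementation.
    """
    # Walk backwards when the last occurrence is the one to keep.
    order = range(len(keys) - 1, -1, -1) if take_last else range(len(keys))
    seen = set()
    mask = [False] * len(keys)
    for i in order:
        if keys[i] in seen:
            mask[i] = True      # an occurrence of this key was already kept
        else:
            seen.add(keys[i])   # this occurrence is kept, so it is not flagged
    return mask

# (key1, key2) pairs from the example frame, in index order
keys = [(1, 2), (2, 2), (3, 1), (1, 2), (2, 2), (3, 2), (2, 2), (2, 2), (2, 2)]

flag_after_first = [i for i, f in enumerate(duplicated_mask(keys)) if f]
flag_before_last = [i for i, f in enumerate(duplicated_mask(keys, take_last=True)) if f]
print(flag_after_first)  # [3, 4, 6, 7, 8]
print(flag_before_last)  # [0, 1, 4, 6, 7]
```

Under this reading the flag marks the rows that would be dropped, not the rows that are "taken", which would explain why the first occurrences are missing when `take_last=False`.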

Unless I am misunderstanding the doc:

```
take_last : boolean, default False
    Take the last observed row in a row. Defaults to the first row
```

it does feel that `.duplicated()` could be improved by fixing this behavior.
Additionally, it could gain one more parameter:

```
take_all : boolean, default False
    Take all observed rows. Overrides take_last
```

or alternatively a keyword parameter:

```
take : 'last', 'first', 'all', default 'last'
    Sets which observed duplicated rows to take. Default: take last observed rows.
```

Right now, getting all observed duplicates requires combining the two conditions above: `c1 | c2`.
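To make explicit which rows that union selects on the example data, here is a minimal pure-Python sketch. It assumes "all observed duplicates" means every row whose `(key1, key2)` pair occurs more than once. It does not call pandas, and `keys`, `counts` and `all_dups` are names introduced only for illustration.

```python
from collections import Counter

# (key1, key2) pairs from the example frame, in index order
keys = [(1, 2), (2, 2), (3, 1), (1, 2), (2, 2), (3, 2), (2, 2), (2, 2), (2, 2)]

# Count how often each key pair occurs, then keep the index of every row
# whose pair occurs more than once.
counts = Counter(keys)
all_dups = [i for i, k in enumerate(keys) if counts[k] > 1]
print(all_dups)  # [0, 1, 3, 4, 6, 7, 8]
```

These indices are the union of the rows shown for `c1` and `c2` above.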
This was done with pandas 0.14.1.
Thank you.


Behavior of pandas.DataFrame.duplicated() #8505
