Closed
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- Modin version: 0.8.0
- Python version: 3.8
Describe the problem
The result of DataFrame.duplicated differs between Modin and pandas for DataFrames that have more than 32 rows.
Source code / logs
import pandas
import modin.pandas as pd
data = [[5, 's0'], [3, 's1'], [3, 's2'], [5, 's0'], [6, 's5'],
[5, 's0'], [3, 's1'], [3, 's2'], [5, 's0'], [6, 's5'],
[5, 's0'], [3, 's1'], [3, 's2'], [5, 's0'], [6, 's5'],
[5, 's0'], [3, 's1'], [3, 's2'], [5, 's0'], [6, 's5'],
[5, 's0'], [3, 's1'], [3, 's2'], [5, 's0'], [6, 's5'],
[5, 's0'], [3, 's1'], [3, 's2'], [5, 's0'], [6, 's5'],
[5, 's0'], [3, 's1'], [3, 's2']]
pdf = pandas.DataFrame(data).duplicated()
mdf = pd.DataFrame(data).duplicated()
print(f'pandas res\n {pdf}')
print(f'modin res\n {mdf}')
Result
pandas res
...
32 True
dtype: bool
modin res
...
32 False
dtype: bool
It looks like the function is applied to each partition separately, so duplicates that span partitions are not detected.
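To illustrate that suspicion, here is a minimal sketch using plain pandas only. It is not Modin's actual implementation: the split into rows 0-31 and row 32 is an assumption chosen to mimic the reported output, and the names `parts` and `per_partition_res` are hypothetical.

```python
import pandas

# Same 33-row pattern as in the report: five rows repeated six times, then three more.
block = [[5, 's0'], [3, 's1'], [3, 's2'], [5, 's0'], [6, 's5']]
data = block * 6 + block[:3]
df = pandas.DataFrame(data)

# Reference result: duplicates are detected across the whole frame.
global_res = df.duplicated()

# Hypothetical partitioning (an assumption, not Modin's real layout):
# rows 0-31 in one partition, row 32 alone in a second one.
parts = [df.iloc[:32], df.iloc[32:]]

# Calling duplicated() per partition and concatenating misses duplicates
# whose first occurrence lives in a different partition.
per_partition_res = pandas.concat([p.duplicated() for p in parts])

print(global_res.iloc[32])
print(per_partition_res.iloc[32])
```

Under this assumed split, the two results disagree only at row 32, which is the same mismatch shown in the report.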
Metadata
Labels: bug (Something isn't working)