Closed
Labels
bug 🦗 Something isn't working
Description
Centos-release-7-5.1804.5.el7.centos.x86_64
Modin 0.8.1.1
Python 3.8.3 (default, Jul 2 2020, 16:21:59) [GCC 7.3.0] :: Anaconda, Inc. on linux
The following snippet takes a few seconds in pandas but hours in Modin.
# import pandas as pd  # uncomment (and comment out the Modin import) to compare with pandas
import modin.pandas as pd
import time

# Two cards, two days each, two transactions per (card_id, day) group.
data = [
    [1, 1, 1],
    [1, 1, 2],
    [1, 2, 3],
    [1, 2, 4],
    [2, 1, 10],
    [2, 1, 20],
    [2, 2, 30],
    [2, 2, 40],
]
df = pd.DataFrame(data, columns=['card_id', 'day', 'amount'])
# Repeat the 8 rows 1,000,000 times to get an 8,000,000-row frame.
df = pd.concat([df for _ in range(1000000)])

start = time.time()
# Broadcast each per-group statistic back onto every row of its group.
df['count'] = df.groupby(['card_id', 'day'])["amount"].transform('count')
df['sum'] = df.groupby(['card_id', 'day'])["amount"].transform('sum')
df['std'] = df.groupby(['card_id', 'day'])["amount"].transform('std')
df['min'] = df.groupby(['card_id', 'day'])["amount"].transform('min')
df['max'] = df.groupby(['card_id', 'day'])["amount"].transform('max')
end = time.time()
print("{0} seconds".format(end - start))
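A possible workaround, sketched here with plain pandas on the original 8 rows: compute all five statistics in a single groupby aggregation and merge them back onto the rows, instead of calling transform five times. The helper name add_group_stats is hypothetical and is not part of pandas or Modin. Whether this is actually faster under Modin 0.8.1.1 is an assumption that has not been measured here. Unlike the column assignments above, the merge returns a new frame with a fresh 0..n-1 index.

```python
import pandas as pd  # assumption: plain pandas; swap in modin.pandas to compare timings


def add_group_stats(df, keys, value, funcs):
    """Hypothetical helper: aggregate once per group, then broadcast back to rows."""
    # One groupby pass: one row per group, one column per statistic.
    stats = df.groupby(keys)[value].agg(funcs).reset_index()
    # A left merge keeps the row order of df and attaches the group statistics.
    return df.merge(stats, on=keys, how="left")


data = [
    [1, 1, 1],
    [1, 1, 2],
    [1, 2, 3],
    [1, 2, 4],
    [2, 1, 10],
    [2, 1, 20],
    [2, 2, 30],
    [2, 2, 40],
]
df = pd.DataFrame(data, columns=["card_id", "day", "amount"])

out = add_group_stats(
    df, ["card_id", "day"], "amount", ["count", "sum", "std", "min", "max"]
)
print(out)
```

On this small frame the result carries the same count, sum, std, min and max values per row as the transform-based version, which makes it easy to check the two approaches against each other before scaling up.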