Centos-release-7-5.1804.5.el7.centos.x86_64
Modin 0.8.1.1
Python 3.8.3 (default, Jul 2 2020, 16:21:59) [GCC 7.3.0] :: Anaconda, Inc. on linux
The following takes a few seconds in pandas but hours in Modin.
#import pandas as pd
import modin.pandas as pd
import time

data = [
    [1, 1, 1],
    [1, 1, 2],
    [1, 2, 3],
    [1, 2, 4],
    [2, 1, 10],
    [2, 1, 20],
    [2, 2, 30],
    [2, 2, 40],
]
df = pd.DataFrame(data, columns = ['card_id', 'day', 'amount'])
df = pd.concat([df for _ in range(1000000)])

start = time.time()
df['count'] = df.groupby(['card_id', 'day'])["amount"].transform('count')
df['sum'] = df.groupby(['card_id', 'day'])["amount"].transform('sum')
df['std'] = df.groupby(['card_id', 'day'])["amount"].transform('std')
df['min'] = df.groupby(['card_id', 'day'])["amount"].transform('min')
df['max'] = df.groupby(['card_id', 'day'])["amount"].transform('max')
end = time.time()
print("{0} seconds".format((end - start)))
Hi @davidjhp, thanks for posting! Please take a look at this reply. Why are you creating the df this way? Is that really necessary?
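For comparison, here is a minimal sketch of building the same rows with a single constructor call instead of concatenating a million small frames. It assumes plain pandas so it runs anywhere (swap the import for `modin.pandas` to test Modin), and `n_repeats` is an illustrative name that is kept small here; the issue uses 1000000. Note that this yields a fresh 0..N-1 index, whereas the `pd.concat` version repeats the index 0..7.

```python
import pandas as pd  # assumption: replace with `import modin.pandas as pd` to test Modin

data = [
    [1, 1, 1],
    [1, 1, 2],
    [1, 2, 3],
    [1, 2, 4],
    [2, 1, 10],
    [2, 1, 20],
    [2, 2, 30],
    [2, 2, 40],
]

n_repeats = 1000  # illustrative; the issue repeats the block 1000000 times

# Repeat the Python list first, then build the frame once.
# Row order matches the concat version; only the index differs.
df = pd.DataFrame(data * n_repeats, columns=["card_id", "day", "amount"])
```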
This seems like a benchmarking attempt, but @davidjhp might be missing some key understanding of how Modin works.
In general, this type of dataframe creation is extremely unusual. Does the same issue occur if you write the data out to a CSV file first?
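A rough sketch of that check: write the rows to a CSV, read them back, and time the same five `transform` calls. It assumes plain pandas so it runs anywhere; to test Modin, change only the import. The file name `cards.csv` and the small `n_repeats` are placeholders, not values from the report.

```python
import os
import tempfile
import time

import pandas as pd  # assumption: replace with `import modin.pandas as pd` to test Modin

data = [
    [1, 1, 1], [1, 1, 2], [1, 2, 3], [1, 2, 4],
    [2, 1, 10], [2, 1, 20], [2, 2, 30], [2, 2, 40],
]
n_repeats = 1000  # illustrative; the issue repeats the block 1000000 times

# Write the data out once (hypothetical file name in a temp directory).
csv_path = os.path.join(tempfile.mkdtemp(), "cards.csv")
pd.DataFrame(data * n_repeats, columns=["card_id", "day", "amount"]).to_csv(
    csv_path, index=False
)

# Load through read_csv instead of building the frame with concat.
df = pd.read_csv(csv_path)

start = time.time()
for stat in ["count", "sum", "std", "min", "max"]:
    # Same per-group broadcast as in the original report, one column per statistic.
    df[stat] = df.groupby(["card_id", "day"])["amount"].transform(stat)
end = time.time()
print("{0} seconds".format(end - start))
```

If the timing gap disappears with this version, that would point at the `pd.concat` construction rather than at `groupby(...).transform` itself.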