
Rerun pandas 2E9 benchmark from dev #823

Closed
mattdowle opened this issue Sep 21, 2014 · 4 comments

@mattdowle
Member

Jeff has pushed a fix to master for the memory allocation problem:
pandas-dev/pandas#8252

Over email he confirmed that I need to change randChar and randFloat as well, but not the .reshape(3,-1).T, copy=False or the two np.concatenate calls; i.e., the original DataFrame() call should work with the new dev version plus the changes to randChar and randFloat.
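For context, a minimal sketch of what the randChar/randFloat data-generation helpers might look like. The names come from the benchmark script referenced above, but the implementations below are my own assumptions: draw a fixed pool of group values, then index into it with np.random.choice so the columns have a controlled number of distinct groups.

```python
import numpy as np
import pandas as pd

def rand_char(prefix, num_groups, n):
    # Hypothetical reconstruction: build num_groups distinct string labels
    # (e.g. "id001", "id002", ...) and sample n of them with replacement.
    pool = np.array([prefix % i for i in range(num_groups)])
    return pool[np.random.choice(num_groups, n)]

def rand_float(num_groups, n):
    # Hypothetical reconstruction: sample n floats drawn from a pool of
    # num_groups rounded uniform values, mirroring the grouped-key pattern.
    pool = np.round(np.random.uniform(0, 100, num_groups), 6)
    return pool[np.random.choice(num_groups, n)]

# Small-scale illustration (the real benchmark used up to 2E9 rows).
n = 1_000
df = pd.DataFrame({
    "id1": rand_char("id%03d", 10, n),
    "v1": rand_float(10, n),
})
```

At benchmark scale the point of the pandas fix was that constructing the DataFrame from these arrays should no longer over-allocate memory.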

@arunsrinivasan
Member

Related to #2.

@jangorecki jangorecki self-assigned this Jun 1, 2018
@mattdowle
Member Author

More than covered by https://h2oai.github.io/db-benchmark/, which now runs regularly and compares the latest dev versions, including 2E9, thanks to @jangorecki.

@jangorecki
Member

jangorecki commented Jan 22, 2019

This issue asks precisely for 2E9, which was failing due to a bug in pandas. In db-benchmark we don't test the 2E9 size; the maximum is 1E9, which is not available for pandas anyway due to memory constraints. So I would say this issue is still unresolved, since we want to test the bug fix from 2014.
After the dplyr and pandas upgrades (which should happen very soon) I will do a one-time run of the benchmark on an AWS 255GB machine.
The question is: should I run

  • A: exactly the code from 2014 (3 solutions, fewer cases, much faster; an apples-to-apples comparison)
  • B: the current db-benchmark (7 solutions, more cases, slower; an apples-to-old-apples comparison)

@mattdowle
Member Author

mattdowle commented Jan 22, 2019

Oops, I remember now. Yes, agreed. I'd say option A then.
Moving this issue to h2oai/db-benchmark#71, since the results should be placed on db-benchmark. Perhaps not on the front page, because they wouldn't be run regularly, but linked from there somehow.
