Use dataframe.groupby instead of iterating #61
Reproduces issue: cheind#49
Select non-empty elements before making hypothesis ID an index. This avoids "non-empty take from an empty axes" error.
DataFrame.loc[] may return DataFrame or Series.
Add corresponding test.
Speeds up construction of linear assignment problem.
Add benchmark using pytest-benchmark.
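As a minimal illustration of the `DataFrame.loc[]` note above, the sketch below shows how the return type depends on how many rows carry the requested label. The table, its column names, and the `HId` index are made up for this example and are not taken from the project's code.

```python
import pandas as pd

# Hypothetical event table indexed by hypothesis ID (HId); illustrative data only.
events = pd.DataFrame(
    {"Type": ["MATCH", "MATCH", "SWITCH"], "OId": [1, 2, 1]},
    index=pd.Index([10, 20, 20], name="HId"),
)

single = events.loc[10]          # label occurs once: a Series (one row)
multiple = events.loc[20]        # label occurs twice: a DataFrame
always_frame = events.loc[[10]]  # list of labels: always a DataFrame

print(type(single).__name__, type(multiple).__name__, type(always_frame).__name__)
```

Passing a list of labels is one way to get a consistent return type when downstream code expects a DataFrame.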
Sorry about the large number of commits. When the pull request was merged, the commits were flattened, and I only pulled this collapsed commit after doing the work. If you like, I can try to rebase it.
Opening new pull request with rebased branch.
Very cool, Jack! You are driving this project :) I guess we should think about making you a maintainer, since the time I can spend professionally on this project has become quite limited. Interested?
Thanks! Yep, I think I would be able to do that. Let's talk via email.
I noticed that iteratively selecting rows from the dataframe was a serious bottleneck.
It looks like someone was already investigating this. I have removed the use of the cached analysis and the lines which computed timings.
I isolated the code for extracting counts and added a benchmark (and a dependency on pytest-benchmark).
Before:
After (time in ms not s):