-
Notifications
You must be signed in to change notification settings - Fork 85
perf: faster reading of files with many branches #1448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ci is failing due to numpy 2.3 and awkward. Perhaps we need a quick awkward patch release. |
I just found out that another big improvement could be made by changing the way uproot5/src/uproot/behaviors/TBranch.py Lines 1960 to 1984 in 61835f1
|
After the last commit it's actually back to the speed of Uproot3! |
Here are the results from the line profiler for
As you can see, most of the time is spent on computing keys, but the real issue is that In the meeting we found that this was done in #116, so I'll look into that. |
I implemented some simpler approach to compute indices instead of having some post creation hook. Let me know what you think. Also, I reverted to using the getter when the branch name is not alphanumeric. So I think this PR is ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ariostas - looks great! Thanks for fixing it!
This is ready for review again, but there's a bunch of tests failing for an unrelated reason. |
5ff1382
to
6d55f75
Compare
@ariostas - Great! Thanks! Please, merge it if you finished. Thanks! |
This PR fixes #1442 (at least partially). It's not clear to me why this getter is needed, but it has a huge performance cost, so I switched to using the branch names directly.
This change gives it a huge boost in performance for trees with many branches, although it is still significantly slower than Uproot3.
For comparison, I generated a file with 1000 branches, as done in #1442 and the output of
went from
to
@ianna do you know why this getter could have been needed in the first place? All the tests seem to pass, so I'm not sure why it was needed.