Calculate candidate string versions only once in get_applicable_candidates
#12664
eb7c1eb to 3ddec7f
Found one further minor improvement. Other than that, this logic is quite sensitive because of the complicated way prereleases work in the version spec, so I wasn't able to find any other big improvements. Marking ready for review.
Updated now that Python 3.13 is passing CI.
e0bcee2 to d845ad9
# types. This way we'll use a str as a common data interchange
# format. If we stop using the pkg_resources provided specifier
# and start using our own, we can drop the cast to str().
candidates_and_versions = [(c, str(c.version)) for c in candidates]
Do we know how much memory overhead this may induce in a large install? I agree this block can likely be further optimised since it is basically filtering on one list.
> Do we know how much memory overhead this may induce in a large install?
I ran memray and there was no noticeable memory overhead. Peak memory for a dry-run install of apache-airflow[all]==2.9.2 on Python 3.12 was 354 MB, and memory usage was dominated by building a list of pages of all candidates (I'm going to open a separate issue on that).
> I agree this block can likely be further optimised since it is basically filtering on one list.
I tried making it simpler, but found that the behavior of pre-releases made it problematic. You can't filter each version individually: a list containing only a pre-release will allow that pre-release, but a list containing one final version and a pre-release will not allow that pre-release unless allow_prereleases=True.
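To illustrate the pre-release point, here is a toy model of the rule as I described it. It is not packaging's implementation: is_prerelease and filter_versions are hypothetical helpers, and the regex is a crude stand-in for real PEP 440 parsing.

```python
import re

# Crude, hypothetical pre-release check; real PEP 440 parsing is more involved.
_PRE = re.compile(r"(a|b|rc|dev)\d*", re.IGNORECASE)


def is_prerelease(version: str) -> bool:
    return _PRE.search(version) is not None


def filter_versions(versions, allow_prereleases=False):
    """Toy model of the rule above, not the real specifier filter."""
    versions = list(versions)
    if allow_prereleases:
        return versions
    finals = [v for v in versions if not is_prerelease(v)]
    # Pre-releases are only a fallback when no final release is present.
    return finals if finals else versions


whole_list = filter_versions(["1.0", "2.0rc1"])
# Filtering one version at a time gives a different answer, because each
# single-element list falls back to accepting its own pre-release.
one_by_one = [v for v in ["1.0", "2.0rc1"] if filter_versions([v])]

print(whole_list)  # ['1.0']
print(one_by_one)  # ['1.0', '2.0rc1']
```

The result for one version depends on which other versions are in the list, which is why the filter has to run over the whole list at once.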
d845ad9 to 811ab0b
I'm putting this on the 24.2 milestone, let me know if that's an overreach and I will remove it.
Thanks @notatallshaw!
This is a minor performance improvement; I measured it at about 1% fairly consistently across the different resolutions I've tried.
I was looking at the "After call graph" in #12663 and noticed that get_applicable_candidates was taking 16% of run time, and even though it was only called 921 times, it calls other methods hundreds of thousands of times.
The only obvious thing I spotted, though, is that it effectively calculates [c.version for c in candidates] twice, which can be seen in this part of the call graph:
Highlighted call graph
I suspect, though, that this function has room for further significant optimization, so I will leave this as a draft for now, think on it, and take any suggestions before marking it ready for review.
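As a rough sketch of the idea (not the actual pip code), the string version of each candidate can be computed once and reused both for filtering and for selecting the matching candidates. CountingVersion, Candidate, get_applicable and the version_filter callable are hypothetical names used only for illustration; version_filter stands in for whatever specifier filter is applied.

```python
from dataclasses import dataclass


class CountingVersion:
    """Hypothetical version object that counts how often it is stringified."""

    calls = 0

    def __init__(self, text):
        self.text = text

    def __str__(self):
        CountingVersion.calls += 1
        return self.text


@dataclass
class Candidate:  # hypothetical stand-in for a candidate object
    name: str
    version: CountingVersion


def get_applicable(candidates, version_filter):
    # Stringify each version exactly once and keep it next to its candidate.
    candidates_and_versions = [(c, str(c.version)) for c in candidates]
    # Filter the whole list of version strings in a single pass.
    allowed = set(version_filter(v for _, v in candidates_and_versions))
    # Reuse the cached strings instead of calling str() again.
    return [c for c, v in candidates_and_versions if v in allowed]


candidates = [Candidate("pkg", CountingVersion(v)) for v in ("1.0", "1.1", "2.0")]
result = get_applicable(candidates, lambda vs: (v for v in vs if v.startswith("1.")))

print([c.version.text for c in result])  # ['1.0', '1.1']
print(CountingVersion.calls)  # 3
```

With three candidates, str() runs three times in total rather than once per candidate for each of the two passes.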