-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache wheels that pip wheel built locally #6853
Conversation
Too sleepy to think through the implications of this right now, but this definitely needs tests. |
What would probably help is some sort of definition of what
Of course, I'll add one (unless some reason for not changing this pops up). |
Here is the commit that introduced "autobuilding": 08acb66 Also, note in a comment to the commit that the docstring's |
433d92c
to
6198823
Compare
Thanks for the pointer. So as I understand it, It should then be safe to cache in both cases I added a test. I used log parsing as test assertion, as it's the closest I could find as being useful to test this. |
Thanks for filing the PR. I don't think the implementation is as simple as what you provided though because pip/src/pip/_internal/wheel.py Lines 774 to 825 in 908e203
So we need to be very careful. Also, in the original commit introducing the |
Yes, we need to be careful indeed. Digging a bit more into this, it is correct that |
Yes, I think so. At least that's how it was when I checked the original commit that introduced
|
My investigation confirm this. So my questioning is moving from understanding what Also I note And The more I read this the more I think it would be good to untangle the caching logic from the destination of the built wheel. |
Yes, that is a key question. Won't you more or less want to respect what You may want to look at the original commit for further background / insight. For example, it has this text: 08acb66#diff-2695f32c4432acd141c3dbe7e7e3a6b0R54-R59 As well as this: 08acb66#diff-2695f32c4432acd141c3dbe7e7e3a6b0R679-R680
Certainly, perhaps. Though I do think it would be best IMO to keep individual PR's limited in scope so that each PR stays manageable. (It's easy for things to grow out of hand as one starts to improve surrounding code.) Incidentally, in the wheel use case, will it help your use case to use the ephemeral cache, or is only the regular cache useful? |
In the case where |
Not really, as I want (rather I'd find it more coherent) for
Maybe I can start with that to clarify the reasoning a bit. |
Yes, I was agreeing with that. What I meant was, in cases where pip install uses the ephemeral cache, is it ever useful in the pip wheel case (e.g. in any of your use cases) when pip wheel uses the ephemeral cache? Or are only the cases where pip wheel uses the regular cache of practical interest to you? |
So far, yes. I found a problem with this PR as it stands now. Since It works correctly in master because the caller function (build) knows that it will never use the cache in the case of pip wheel (i.e. So if we agree that pip wheel and pip install should behave the same way wrt caching, I'm wondering if we should not change |
Also, I would like to add more tests for caching (for this and #6640). The only example I found to find the cache directory is: pip/tests/functional/test_install.py Lines 1179 to 1185 in 76a8954
Is there a better approach? |
Hm, maybe not. I'm mislead again by the method name. It not only decides which cache to use, but also to skip building, and that part is different for pip wheel and pip install. |
Yes, this is what I was getting at when I said this above:
Regarding this--
Yes, I was confused for a moment by this, too, which is why, after reflecting again, I said this:
The reason is that, for the wheel case, you want to be able to tell
There are also some unit tests of |
Here's a thought in addition to my comment here re: adding a |
I pushed the fix and further clarification of I still need to expand the test, but if you want to have look at it already, your early feedback is welcome. @cjerdonek I need to think more about the after_build suggestion. Do you think this must go in this PR or could it be left for another refactoring PR? |
I have the feeling that the if logic can be made a lot simpler. It still seems unnecessarily complicated to me. Since all of the |
Yes, I was thinking about that too. |
src/pip/_internal/wheel.py
Outdated
return None | ||
|
||
if "binary" not in format_control.get_allowed_formats( | ||
canonicalize_name(req.name)): | ||
if must_build: | ||
# --no-binary has no effect as a pip wheel option | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of returning True
here, why wouldn't you instead want to skip this clause and continue on (so make the if check if not must_build and "binary" not in format_control ...
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it nice to have the symmetry with other cases, and have a place to put that little comment about pip wheel --no-binary
which I thought useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm saying is that it seems like this can cause the wrong return value. If there is a condition later on that can result in a return value of False
, the way the code is here short-circuits that (and in particular means that --no-binary
is affecting the return value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, if you do the should_build()
approach, this issue will go away because the code will be structured differently. (It will wind up being equivalent to what I was suggesting you do here.)
if req.is_wheel: | ||
if not should_unpack: | ||
if must_build: | ||
logger.info( | ||
'Skipping %s, due to already being wheel.', req.name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to decide if this log message really needs to be specific to one of the two cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, any opinion about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or remove the message completely? In my use cases (i.e. pip wheels -r requirements.txt
), it's not really useful to know that some requirements were wheels in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be the topic of another PR?
I think we would then need 3 methods?
At first glance it's not necessarily simpler. |
This isn't the only way. You can put the editable check in |
Another option is for |
@pradyunsg Thanks for your interest in this PR. Sorry if I was not clear. It was hard to understand and it is still a bit hard to explain :) Let me try to summarize. We have two cases in this area: With this PR, the behavior of Good luck with your talk. |
I wouldn't say it's big (just moving a couple chunks of code), but no, it doesn't have to be done as part of this PR. I will review again when I'm able to set aside some time. Hopefully soon. |
Okay, thanks for your patience, @sbidoul. Coming back to this, I'll start with some comments on def should_build(
req, # type: InstallRequirement
format_control, # type: FormatControl
need_wheel, # type: bool
):
if req.constraint:
return False
if req.is_wheel:
if need_wheel:
logger.info(
'Skipping %s, due to already being wheel.', req.name,
)
return False
if need_wheel:
return True
if req.editable or not req.source_dir:
return False
if "binary" not in format_control.get_allowed_formats(
canonicalize_name(req.name)):
logger.info(
"Skipping bdist_wheel for %s, due to binaries "
"being disabled for it.", req.name,
)
return False
return True The other thing to notice is that I've changed the Once this function is updated, I can take another look.. |
Instead of building to the target wheel directory, build to cache instead and then copy the result to the target directory. This more closely matches what pip install does and makes the cache more effective.
@cjerdonek the early return in should_build makes a lot of sense, done. I personally came to appreciate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional comments (not a complete review though).
src/pip/_internal/wheel.py
Outdated
if "binary" not in format_control.get_allowed_formats( | ||
canonicalize_name(req.name)): | ||
# TODO explain this | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned the duplication of these three conditions in your comments, which I think we should aim to avoid. I think an elegant way to address this can be to include this code at the beginning:
if need_wheel and not should_build(
req,
need_wheel=False,
format_control=format_control,
):
return True
This is basically a translation into code this English sentence that you wrote:
And in conditions where pip wheel builds while pip install would not have built, it uses the ephemeral cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I'm not convinced. I see benefit to have should_build and should_cache independent: it's easier to reason about them. For instance if you look at the comment for if req.editable
, there is a different reason to not build and to not cache editables. If we reintroduce a should_build
test in should_cache
, I think we pay a big price in readability to remove some maybe redundant tests that may be only accidentally the same.
Of course the fact that I don't fully understand the "no source_dir" and "no binary" parts does not help :) Yet I suspect the reasoning for caching and building in those cases could be different too.
Let me know what you think after reading the latest version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjerdonek I gave one more thought to this and I now think this test need wheel and not should_build
it's actually not equivalent. In case need_wheel
is False it would skip the test and may end up, e.g. caching an editable in install mode. Tss what a rabbit hole I put myself in :)
So for now I'm quite happy with the current version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave one more thought to this and I now think this test need wheel and not should_build it's actually not equivalent. In case need_wheel is False it would skip the test and may end up, e.g. caching an editable in install mode.
I don't see any issue.. What I suggested was logically the same by design. In install mode with an editable, should_build()
would have returned False
, so should_use_ephemeral_cache()
would never have been called. To reiterate my comment, I don't think we should copy-paste all those lines verbatim. If it turns out that the logic needs to be different at some point in the future, we can always reconsider (but until then, YAGNI, as they say). If you'd prefer not to call should_build()
with different arguments inside should_cache()
, another possibility would be to refactor those copy-pasted lines into a helper function that is called from both functions. You can also document that should_cache()
should only be called when should_build()
is known to be true.
For readbility: no need to ask if we can cache if we have no cache.
I see there is quite of work ongoing in this area of the code (by @chrahunt in particular). |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
Sorry, I thought I commented on this. I would rebase, I think our changes are very complementary (and I definitely like the split up of Someone will get to this, but it can vary how long that takes. If you want more of a guarantee of forward progress and have a little time I would separate out the refactoring changes into a few smaller PRs (literally no PR is too small), so those changes would be easier to evaluate in a single sitting. |
@chrahunt ok, thanks. I'll get back to this in a couple of weeks. |
Ok, it turns out the couple of weeks became a couple of months, but I'm back to this in #7262. I therefore close this PR to continue in smaller increments. |
Instead of building to the target wheel directory, build to cache and then copy the result to the target directory. This more closely matches what pip install does and makes the cache more effective.
Closes #6852