Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite direct URL reinstallation logic #10564

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

uranusjr
Copy link
Member

@uranusjr uranusjr commented Oct 9, 2021

Fix #5780.

The logic is almost #5780 (comment), but I dropped the last part about non-PEP-508 paths since it turns out to not be actually viable without major refactoring (pip does not currently pass the original argument format to the resolver; all requirements are URLs when they reach this part).

@pradyunsg It also turns out this has some effect on #10543; with this implementation, sdist will not be automatically reinstalled, but --upgrade (not --force-reinstall as described in the deprecation message) can actually be used to override that behaviour. But since we have removed that deprecation warning, should we continue to keep the behaviour of always reinstalling sdists (this implementation currently does not), and/or show that deprecation message again?

@pypa-bot pypa-bot added the needs rebase or merge PR has conflicts with current master label Oct 9, 2021
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Oct 9, 2021
@pypa pypa deleted a comment from pypa-bot Oct 9, 2021
@uranusjr uranusjr changed the title Add utilities to check PEP 610 to incoming link Rewrite direct URL reinstallation logic Oct 9, 2021
@uranusjr uranusjr added this to the 22.0 milestone Oct 9, 2021
@uranusjr uranusjr force-pushed the explicit-url-upgrade branch 2 times, most recently from ad9cff7 to 59faab5 Compare October 9, 2021 21:08
@pradyunsg
Copy link
Member

should we continue to keep the behaviour of always reinstalling sdists

I think we should reinstall the sdist, when specified explicitly as a "local path".

@uranusjr
Copy link
Member Author

uranusjr commented Oct 9, 2021

when specified explicitly as a "local path"

Does this include file:///.../...tar.gz and project @ file://...? The resolver cannot really distinguish those against a "bare" path.

@pradyunsg
Copy link
Member

Both of those should have direct urls set, while the local path should not?

@pradyunsg
Copy link
Member

❯ pip install ./dist/pip-21.3.dev0.tar.gz
Processing ./dist/pip-21.3.dev0.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing wheel metadata (pyproject.toml) ... done
❯

Yea, as implemented, the behaviour in the case we're discussing is... suboptimal.

@pradyunsg
Copy link
Member

Both of those should have direct urls set, while the local path should not?

Meh, I don't know.

PEP 610 doesn't directly answer this question (duh!) because it only defines things in terms of PEP 440 / PEP 508. The question of "are local paths supposed to be treated the exact same as PEP 508/440 URL requirements" is... not something I'm touching at 00:30 on a Sunday. :)

@pradyunsg
Copy link
Member

pradyunsg commented Oct 9, 2021

Does this include file:///.../...tar.gz and project @ file://...? The resolver cannot really distinguish those against a "bare" path.

I think it's reasonable to always reinstall from file:// URLs. If we're excluding any case, I'd exclude the PEP 508 usage when there's also direct_url stuff.

You can check for that by reaching into the underlying Requirement on the InstallRequirement, and checking if that has a URL set on it. That should be a matter of tweaking the is_wheel conditional to something like:

if cand_link.is_file and (cand_link.is_wheel or ireq.req.url is None):
    return ireq

@sbidoul
Copy link
Member

sbidoul commented Oct 10, 2021

As I mentioned elsewhere before (can't find where now), it is not at all obvious to me that we should do something different based on whether the user put a file:// or not in front of a path to a local artifact.

I'm wondering if things are now mature enough to finish this matrix that I posted before and then make a test suite out of it.

@uranusjr
Copy link
Member Author

I think I agree with Stéphane, treating file: specially does not stand particular well with me. The rationale I kept the local wheel case is

  1. Installation from wheel is quite fast anyway (both local sdist are directory still have considerable overhead even with in-tree-build).
  2. This can be used as a solution for cases where the user wants to force-reinstall a in-development package, but not any of its dependencies. --force-reinstall would reinstall all dependencies, which can be pretty slow, and --upgrade needs to perform resolution that's unnecessary for the user case. If local wheels are treated differently, the user can work around this by simply pre-build the wheel and tell pip to install that.

@uranusjr uranusjr force-pushed the explicit-url-upgrade branch 2 times, most recently from 5387547 to 59ff7ed Compare November 18, 2021 10:48
@uranusjr
Copy link
Member Author

uranusjr commented Nov 18, 2021

OK, so I revised the aforementioned condition. Now all three of

  • req @ file://url/to/local/location
  • a/path/to/something
  • file://url/to/local/location/without/pep508

are automatically reinstalled.

@uranusjr uranusjr force-pushed the explicit-url-upgrade branch 2 times, most recently from aaa5d94 to 85faa80 Compare November 18, 2021 11:43
@uranusjr
Copy link
Member Author

I think this is ready!

@uranusjr uranusjr marked this pull request as ready for review November 18, 2021 14:48
@pradyunsg
Copy link
Member

Haven't done a proper clone-and-play-with-it routine, but this PR feels like it's missing a bunch of tests for the reinstall behaviour that this is going to provide users.

@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Nov 20, 2021
@uranusjr
Copy link
Member Author

Any thoughts on how this can be best tested? Since we’re always re-installing file: URLs, we need to somehow serve wheels for testing via HTTP. Is there existing mechanism for this?

@pradyunsg
Copy link
Member

pradyunsg commented Nov 21, 2021

We have a mock_server fixture, that creates and serves an index server. There's a test with that in test_install.py already.

You can use that, with http and https, with custom responses set up.

@sbidoul
Copy link
Member

sbidoul commented Jun 22, 2022

@uranusjr I spent a couple of hours experimenting with your suggestion to leverage --upgrade in uranusjr#2

I'm also preparing a comparison of what this does vs pip 22.1 (and possibly the legacy resolver). It will take time though, so in the meantime you may want to have a look and see if it goes in the right direction for you.

@alkasm
Copy link

alkasm commented Nov 2, 2022

Light bump to see where peeps are at on this effort.

Now --upgrade can trigger a non-URL requirement to upgrade an existing
URL-installed distribution. Some logic is also tweaked so legacy
editable installations (which lack direct_url.json) is also considered
URL-specified and treated the same as non-editable direct URL and modern
(PEP 660) editable installations.
@AvnerCohen
Copy link
Contributor

@uranusjr

Given this simple setup.py:

from setuptools import setup

setup(
    name="testing-git",
    install_requires=[
        "requests @ git+ssh://git@github.com/psf/requests.git@15585909c3dd3014e4083961c8a404709450151c"
    ]
)

I'd expect a second pip install . to be very fast, basically, local SHA test. nothing more.

With latest pip version (23.0) installation resulting time is over 20 seconds and more importantly, the log clearly shows:

Running command git clone --filter=blob:none --quiet ..
..

So a full clone still occurs.

With your patched version uransurjr/pip/explicit-url-upgrade It actually takes longer on my machine, but again, outside of the absolute time, I am failing to understand why is a full git clone needed here?

@uranusjr
Copy link
Member Author

Are you testing it right? I just tried and a second pip install . finishes almost instantly. The first one obviously needs to clone because it needs to install that Git repository.

@sbidoul
Copy link
Member

sbidoul commented Jan 31, 2023

@uranusjr just mentioning this is still on my radar, but I need to set aside enough contiguous hours to dive into this again.

@AvnerCohen
Copy link
Contributor

AvnerCohen commented Jan 31, 2023

@uranusjr Are you testing it right? is hard for me to answer, I have never done this before.

$> pip -V
pip 23.0.dev0. .... 
$> time pip install . --quiet  3.95s user 1.26s system 25% cpu 20.303 total
pip install . --quiet  3.95s user 1.26s system 25% cpu 20.303 total

If there is anything wrong I am done, would love to help testing here.
(if relevant, I am working on OSX Venture, 13.1)

@uranusjr
Copy link
Member Author

I don’t know, I simply don’t see what you see. Maybe it would be easier to communicate if you can provide something like a script or Dockerfile for reproduction.

@sbidoul
Copy link
Member

sbidoul commented Mar 26, 2023

@uranusjr I finally found some time to come back to this.

I created a dedicated test harness with my view of the expected re-installation behaviour.

Here is the current result, comparing pip 23.0.1 with this PR: report.html.gz

I see the following areas where we still need refinement and/or more discussion:

  1. Reinstallation from index over a direct URLs installed package (including editables) and vice-versa: I think it is fine to not reinstall by default if the version match, but I think we need a way to force a reinstall that is more targeted than --force-reinstall which reinstalls everything.

Basically, what I'm after is a mechanism to efficiently install the output of pip freeze and make sure the direct URLs in the frozen requirements are installed, and also make sure that version-specified requirements are reinstalled from index when already installed from a direct URL.

I think overloading --upgrade as you suggested above would be a reasonable way to achieve that.

  1. Reinstallation of archive URLS (file:// or https://). I still have mixed feelings about that.

My current thinking is that it would be reasonable not to reinstall if the version match, but also overload --upgrade to force a reinstallation of the provided artifact.

  1. For better performance, we should not reinstall a VCS URL with a commit hash ref, if the same commit is already installed (as determined by the commit_id field in direct_url.json)

Let me know your thoughts.

@uranusjr
Copy link
Member Author

Reinstallation of archive URLS (file:// or https://).
My current thinking is that it would be reasonable not to reinstall if the version match, but also overload --upgrade to force a reinstallation of the provided artifact.

How about if we also add a warning if the hashes don’t match? That would help users debug the situation.

For better performance, we should not reinstall a VCS URL with a commit hash ref, if the same commit is already installed (as determined by the commit_id field in direct_url.json)

Not re-installing immutable refs by default (but always re-install typically mutable ones) seems reasonable. And --upgrade can also be overloaded here?

@alkasm
Copy link

alkasm commented Apr 1, 2023

but always re-install typically mutable ones

If this is the route chosen, would like to request an escape hatch if possible which doesn't require pinning a hash, as that doesn't work well for library dependencies.

Worth noting that this eager-by-default is not the current behavior of pip with respect to dependencies from PyPI.

@uranusjr
Copy link
Member Author

uranusjr commented Apr 8, 2023

I think this problem fundamentally represents how mutable dependencies are too loose by design and used in very different scenarios that require different treatments. From past reports, these are mainly used for development, where people tend to want auto upgrades. So what I would personally suggest is to not use them if you are depending on those URLs in production; either use constraints (that pin to immutable hashes), or produce versioned wheels instead.

@alkasm
Copy link

alkasm commented Apr 8, 2023

From past reports, these are mainly used for development

This isn't the case for me. I could provide some more color on my use-case, but I'm not really looking for alternatives. This is a bug in pip that was introduced with the new resolver, or a feature that was removed and no longer possible, whichever way you want to look at it. At the very least, some way to ask pip to not reach out to the internet when a package is already installed, like it does with every other dependency specification except VCS URLs, seems like something that could be supported directly with a flag if not the default behavior. This would solve my issues, but also isn't specific to my use-case.

A branch is mutable in the same way that a non-pinned PyPI dependency is, but you get different feature sets with these specifications. I understand that the metadata being available via PyPI makes these dependency specifications very different in implementation, but from a user PoV, they have very similar semantics.

@andersk
Copy link
Contributor

andersk commented Apr 9, 2023

Zulip uses Pip in both production and development; it has been stuck on Pip 20.3 with --use-deprecated=legacy-resolver for years because of this issue. We pin all our transitive dependencies to match what we tested in CI, and don’t want auto-upgrades (we do regular upgrades on our own terms by re-generating the pins with pip-tools). Our use case matches the one described by @sbidoul:

Basically, what I'm after is a mechanism to efficiently install the output of pip freeze and make sure the direct URLs in the frozen requirements are installed, and also make sure that version-specified requirements are reinstalled from index when already installed from a direct URL.

This seems like a reasonable thing to expect a package manager to be able to do.

@uranusjr
Copy link
Member Author

Basically, what I'm after is a mechanism to efficiently install the output of pip freeze and make sure the direct URLs in the frozen requirements are installed, and also make sure that version-specified requirements are reinstalled from index when already installed from a direct URL.

This seems like a reasonable thing to expect a package manager to be able to do.

Since pip freeze pins a VCS dependency to its revision ID, re-applying the output to the same environment would not induce network requests following the proposed scheme. And if you also supply a URL-based requirement (that does not pin to revision), the two URLs would cause a conflict. So the two of you are already asking different things, unless I misunderstand either of your use cases.

@uranusjr
Copy link
Member Author

Trying to clear out things. After the proposal is implemented, pip would:

  • Pin a URL requirement to the hash revision in freeze (already does)
  • Conflict (and refuse to install) if it seems a package is required via two different URLs
  • Reach to network if a package is required via a mutable VCS URL

So if you

  1. Create an environment with a git+ requirement
  2. Generate a freeze from the environment
  3. pip install -r (or -c + top-level packages) the freeze output to the environment

No network requirements will be induced because the commit ID in the freeze output matches the already-installed distribution.

However, it would not work if you also supply a requirement to the same package using a mutable VCS URL, since the mutable URL would differ from the URL from freeze and cause a conflict. So if a package wants to depend on the URL-required package, you must use a version-based requirement instead, and enforce the URL with the freeze output.

@htcrane
Copy link

htcrane commented Jan 23, 2024

@uranusjr curious if this has gotten more attention since the last comment. I ask because pip's VCS documentation does not mention the logic of when an installed package from a VCS will be upgraded nor is there a satisfying answer on stack overflow.

With the current release, will a package installed with MyProject @ git+https://git.example.com/MyProject always upgrade or only upgrade when a new commit is present in the main branch? Also, what about installed with a git ref, specifically a version tag: MyProject @ git+https://git.example.com/MyProject.git@v1.0. If installed in such a way, will it upgrade only if the tag is incremented to a higher version?

@uranusjr
Copy link
Member Author

I feel the logic is done, the blocker is not me.

@htcrane
Copy link

htcrane commented Jan 23, 2024

@sbidoul

@sbidoul
Copy link
Member

sbidoul commented Jan 23, 2024

I'm not blocking anything, just commenting ;)

I've quickly reviewed the report I had attached to #10564 (comment). I think my main concern is about the scenarios where this PR reinstalls with --upgrade, when it does not currently reinstall with pip main.

For instance:

image

The rest can be considered improvements that I'd find nice to address at the same time but could also be addressed in followups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Updating remote links with new URLs for PEP508 functionallity
9 participants