Pip should provide a way to specify index lookup order #8224

Open
stefanoborini opened this issue May 12, 2020 · 10 comments
Labels
resolution: needs standard Should be agreed as a standard before implementation state: needs discussion This needs some more discussion

Comments

@stefanoborini

stefanoborini commented May 12, 2020

What's the problem this feature will solve?

Currently, --extra-index-url will always operate after the pypi url, no matter what.
This has already been debated at length in #3454 and #5045, where it is hinted that even specifying --index-url does not give control over the order.

In these issues, the accepted solution is to use devpi, or just to use a non-taken name on pypi. However, both these solutions are workarounds:

  1. Not all of us can use devpi. I personally rely on Artifactory with PyPI support, and in large corporate environments you can't just install whatever you want.
  2. If I were to use a name that is not taken on PyPI, my service would break as soon as someone registered that name on PyPI and published versions higher than mine, effectively taking over my installation. This is not only annoying, but also a security problem.
  3. If I were to register the name on PyPI (which is not possible unless you push something to it, possibly a fake package), I could leak internal information about my company's process through the names I reserve.
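
To make point 2 concrete, here is a minimal sketch of the takeover scenario. It assumes a simplified model in which candidates from every configured index are pooled and the highest version wins, whichever index served it. The index names and version numbers are hypothetical, and the version parser only handles plain numeric releases.

```python
# Simplified model: all indexes are consulted, candidates are pooled,
# and the highest version wins regardless of where it came from.

def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2). Plain numeric releases only."""
    return tuple(int(part) for part in text.split("."))


def pick_best(candidates):
    """candidates: list of (index_name, version_string) pairs."""
    return max(candidates, key=lambda c: parse_version(c[1]))


# Hypothetical private package, served only by the internal index.
internal_only = [("internal", "1.4.0"), ("internal", "1.5.0")]
chosen_before = pick_best(internal_only)

# Later, someone registers the same name on PyPI with a higher version.
after_registration = internal_only + [("pypi", "99.0.0")]
chosen_after = pick_best(after_registration)

print(chosen_before)  # ('internal', '1.5.0')
print(chosen_after)   # ('pypi', '99.0.0')
```

Nothing in this model lets the internal index win once a higher version exists elsewhere, which is the gap this issue is about.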

Describe the solution you'd like

Pip should have an additional option to specify the exact order in which package indexes are consulted during lookup. This would preserve backward compatibility while solving the above issues.

Alternative Solutions

Workarounds are suboptimal, fragile, potentially a security issue, and rely on solutions that might not be implementable in a large corporate environment.

Additional context

See above posted issues.

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label May 12, 2020
@uranusjr
Member

uranusjr commented May 12, 2020

I’m not convinced with the “not be implementable” part, at least I don’t see anyone making the argument in either of the two issues you linked. --index-url always comes first, and in a corporate setting you can overwrite that, so pip never visits PyPI (and, if you want, add PyPI as an --extra-index-url so pip visits PyPI only if it does not find a package on your own index). I don’t really think the solutions are workarounds, either. Do you have a concrete example? The ordering of --index-url and --extra-index-url is very explicitly specified; they are as robust as any alternative option that has been proposed.
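
For reference, the setup described above could be assembled as follows. This is only a sketch: the internal index URL and the package name are hypothetical, and the helper `pip_install_args` is an illustration, not part of pip. It shows how the two options are passed and makes no claim about the order in which pip then consults the indexes, which is the point under dispute in this thread.

```python
import shlex


def pip_install_args(requirement, index_url, extra_index_urls=()):
    """Build a `pip install` command line with a primary and extra indexes."""
    args = ["pip", "install", "--index-url", index_url]
    for url in extra_index_urls:
        args += ["--extra-index-url", url]
    args.append(requirement)
    return args


# Hypothetical corporate index as --index-url, PyPI as --extra-index-url.
args = pip_install_args(
    "corp-billing",
    index_url="https://pypi.internal.example/simple",
    extra_index_urls=["https://pypi.org/simple"],
)

# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(args))
```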

@pfmoore
Member

pfmoore commented May 12, 2020

Simply re-opening the same request isn't going to change the conclusion. Python packages are intended to be unique by name, so if you use a name that is taken by another package, you will get conflicts.

I understand the issue around using a name that is subsequently taken, but this is not something that should be solved at the pip level. If you want to have a means for "reserving" names or a namespace, that is a standardisation issue across all packaging tools, and in particular should be something covered by PyPI. So this needs to be raised as a proposal for a packaging interoperability PEP.

There is already some discussion on this on Discourse. I'd suggest that if you want to raise this topic, you'd need to have a discussion there - but be aware that this is potentially a very complex issue.

(One relatively simple proposal might be to have PyPI "reserve" a certain prefix for local use, similar to the way some IP addresses are guaranteed available locally. That might address the "how do I name my private packages?" problem without getting sucked into global name registry issues).
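
A rough sketch of what such a reserved prefix could look like from a tool's point of view. The prefix `private-` is purely hypothetical (no such reservation exists in any standard). The name normalization follows the PEP 503 rule of collapsing runs of `-`, `_` and `.` and lowercasing.

```python
import re

# Hypothetical prefix that an index would refuse to register publicly.
RESERVED_PREFIX = "private-"


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_reserved_for_local_use(name, prefix=RESERVED_PREFIX):
    """True if the normalized name falls under the reserved prefix."""
    return normalize(name).startswith(prefix)


print(is_reserved_for_local_use("Private_Billing"))  # True
print(is_reserved_for_local_use("requests"))         # False
```

Checking the normalized name matters here, because `Private_Billing` and `private-billing` refer to the same project.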

But Python's packaging is unlikely ever to change the principle that every package named foo, regardless of source, is expected to be the same project. That's built into far too much of the infrastructure to change.

I'll leave this issue open for a short while, as I don't want to close it without giving the OP time to respond, but I'm strongly of the view that there's nothing pip can or will do here, without a standard to back any change.

@pfmoore pfmoore added resolution: wrong project Should be reported elsewhere state: needs discussion This needs some more discussion labels May 12, 2020
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label May 12, 2020
@pfmoore
Member

pfmoore commented May 12, 2020

(If anyone can find better labels, feel free to change - "wrong project" isn't ideal, but there isn't a "needs a standard" label...)

Edit: Never mind, I added one 🙂

@pfmoore pfmoore added resolution: needs standard Should be agreed as a standard before implementation and removed resolution: wrong project Should be reported elsewhere labels May 12, 2020
@stefanoborini
Author

> I’m not convinced with the “not be implementable” part, at least I don’t see anyone making the argument in either of the two issues you linked. --index-url always comes first

Issue #5045 explicitly says that this is not the case.

@stefanoborini
Author

stefanoborini commented May 12, 2020

> I understand the issue around using a name that is subsequently taken, but this is not something that should be solved at the pip level. If you want to have a means for "reserving" names or a namespace, that is a standardisation issue across all packaging tools, and in particular should be something covered by PyPI. So this needs to be raised as a proposal for a packaging interoperability PEP.

The problem is that packages should be namespaced somehow.

> (One relatively simple proposal might be to have PyPI "reserve" a certain prefix for local use, similar to the way some IP addresses are guaranteed available locally. That might address the "how do I name my private packages?" problem without getting sucked into global name registry issues).

That would be very nice indeed.

> I'll leave this issue open for a short while, as I don't want to close it without giving the OP time to respond, but I'm strongly of the view that there's nothing pip can or will do here, without a standard to back any change.

The point is that the assumption that names are unique is fine, but it is conditional on my choice of the order in which indexes are considered, and of which one has the highest priority, when resolving a given name.

@mboisson

mboisson commented Aug 9, 2024

@pfmoore, since there are flags such as --only-binary <format_control>, --no-binary <format_control>, --implementation <implementation>, and --abi <abi>, which allow you to control pip so that it only considers source/wheels matching specific conditions, do you think a flag such as --only-localversion or something similar would be reasonable?

To address this issue, we have started tagging all of our wheels with a local version specifier. This makes pip prefer our wheel over an equivalent upstream version, but when a new upstream version comes out, it takes over.

A --only-localversion <format_control> option, which would make pip consider only wheels with a local version tag for the packages listed, would give people dealing with this issue a way to address it without the burden of running their own index.
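
A small sketch of both the current behaviour and the proposed filter, under simplifying assumptions: versions are plain numeric releases with an optional `+local` suffix, and a version with a local segment sorts just above the same version without one. The `only_local` parameter is a stand-in for the proposed option, which does not exist in pip.

```python
def sort_key(version):
    """Order by release number first, then prefer a local segment on ties."""
    public, _, local = version.partition("+")
    release = tuple(int(part) for part in public.split("."))
    return (release, local != "")


def best(versions, only_local=False):
    """Pick the preferred version; only_local models the proposed filter."""
    if only_local:
        versions = [v for v in versions if "+" in v]
    return max(versions, key=sort_key)


# The local tag wins against the equivalent upstream version...
print(best(["1.2.0", "1.2.0+corp"]))                           # 1.2.0+corp
# ...until a newer upstream release appears.
print(best(["1.2.0", "1.2.0+corp", "1.3.0"]))                  # 1.3.0
# With the proposed filter, only locally tagged builds are considered.
print(best(["1.2.0", "1.2.0+corp", "1.3.0"], only_local=True))  # 1.2.0+corp
```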

@notatallshaw
Member

Is this different from #8606?

As I mention in #8606 (comment), we now have a real-world example, in uv, of a pip-like API that by default guarantees an ordering of index lookups. uv has had to implement --index-strategy because there are real-world use cases where that default does not work for users.
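
The following sketch models three lookup strategies to illustrate where a strict ordering can fail. The strategy names follow my understanding of uv's documented `--index-strategy` values; the code is a simplified model with hypothetical data, not uv's implementation, and it only supports a minimum-version constraint on plain numeric versions.

```python
def parse_version(text):
    return tuple(int(part) for part in text.split("."))


def resolve(name, minimum, indexes, strategy):
    """indexes: ordered list of (index_name, {package: [versions]})."""

    def acceptable(versions):
        floor = parse_version(minimum)
        return [v for v in versions if parse_version(v) >= floor]

    if strategy == "first-index":
        # Stop at the first index that knows the name at all.
        for index_name, packages in indexes:
            if name in packages:
                ok = acceptable(packages[name])
                return (index_name, max(ok, key=parse_version)) if ok else None
        return None
    if strategy == "unsafe-first-match":
        # Move on to the next index if this one has no acceptable version.
        for index_name, packages in indexes:
            ok = acceptable(packages.get(name, []))
            if ok:
                return (index_name, max(ok, key=parse_version))
        return None
    if strategy == "unsafe-best-match":
        # Pool every index and take the highest acceptable version.
        pool = [
            (index_name, version)
            for index_name, packages in indexes
            for version in acceptable(packages.get(name, []))
        ]
        return max(pool, key=lambda c: parse_version(c[1])) if pool else None
    raise ValueError(f"unknown strategy: {strategy}")


indexes = [
    ("internal", {"mylib": ["1.0.0"]}),
    ("pypi", {"mylib": ["1.0.0", "2.0.0"]}),
]

print(resolve("mylib", "1.0.0", indexes, "first-index"))         # ('internal', '1.0.0')
print(resolve("mylib", "1.0.0", indexes, "unsafe-best-match"))   # ('pypi', '2.0.0')
print(resolve("mylib", "1.5.0", indexes, "first-index"))         # None
print(resolve("mylib", "1.5.0", indexes, "unsafe-first-match"))  # ('pypi', '2.0.0')
```

The third call shows the kind of case where a strict first-index rule leaves a requirement unsatisfiable even though another index could satisfy it.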

@mboisson

mboisson commented Aug 9, 2024

It seems to be a very similar (identical?) issue, yes. My proposal was meant to be less heavy than changing the index strategy for all packages: a more tailored approach that could still address the core of the issue (the impossibility of telling pip not to look at PyPI for specific packages).

@pfmoore
Member

pfmoore commented Aug 9, 2024

> @pfmoore, since there are flags such as --only-binary <format_control>, --no-binary <format_control>, --implementation <implementation>, and --abi <abi>, which allow you to control pip so that it only considers source/wheels matching specific conditions, do you think a flag such as --only-localversion or something similar would be reasonable?

Honestly, no I don't. I think we already have far too many such flags, and I think adding more is not only going to add confusion, but also suggests there's an underlying problem.

Rather than simply adding another flag any time someone comes up with a new problem, we should look deeper. That's a lot harder, but ultimately will give better results. Specifically, if anyone has the interest in addressing this (I don't, myself) they should consider the following structure:

  1. Ways of limiting the set of files pip sees. This is what --only-binary and --no-binary do, as well as being how constraint files work (broadly).
  2. Ways of affecting how pip chooses the "best" file from a set of otherwise equally acceptable files. This is where --prefer-binary lives, along with the sort of index priority ideas people are coming up with in this thread.
  3. Ways of overriding the environment pip is installing to. This is what --abi, --implementation, and --platform do. These are quite different, as they need to affect more than just what files pip selects. I only list them here because you mentioned them alongside --{only,no}-binary and I wanted to call out that they are very different.
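
The first two categories can be sketched as two separate hooks: filters that remove candidates from view, and a preference key that ranks whatever remains. This is an illustrative model only; the `Candidate` type and function names are hypothetical, and category 3 (environment overrides) is deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    version: tuple   # e.g. (2, 0)
    is_wheel: bool   # False means a source distribution


# Category 1: filters limit the set of files that are seen at all.
def only_binary(candidate):
    return candidate.is_wheel


# Category 2: preferences rank otherwise acceptable files.
def default_preference(candidate):
    return (candidate.version, candidate.is_wheel)   # newest first


def prefer_binary(candidate):
    return (candidate.is_wheel, candidate.version)   # wheels first


def choose(candidates, filters=(), preference=default_preference):
    visible = [c for c in candidates if all(f(c) for f in filters)]
    return max(visible, key=preference) if visible else None


candidates = [Candidate((1, 0), True), Candidate((2, 0), False)]

print(choose(candidates))                            # the 2.0 sdist
print(choose(candidates, preference=prefer_binary))  # the 1.0 wheel
print(choose(candidates, filters=[only_binary]))     # the 1.0 wheel
```

In this framing, an index priority would be one more preference key, while an option like the proposed --only-localversion would be one more filter.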

I'll also note that all of this sort of "what pip sees" control can be done right now, using an index proxy like simpleindex. So in theory, we don't "need" any of these options (specifically option types 1 and 2). But clearly running an index proxy isn't very user friendly, as is demonstrated by the fact that we still get requests for this to be added to pip. But IMO, we should be comparing any set of options provided in pip with the approach of running a proxy, and only offer ones that have a clear use case, explicitly stating that using a proxy is our supported solution for all other cases. It's worth pointing out #11771, which is relevant here, as well.
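
As a rough illustration of the kind of control a proxy offers, here is a sketch of per-project routing: each project name is matched against ordered patterns, and only the first matching upstream is consulted. The patterns and URLs are hypothetical, and this is not simpleindex's configuration format, only a model of the idea.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: first matching pattern wins.
ROUTES = [
    ("corp-*", "https://pypi.internal.example/simple"),
    ("*", "https://pypi.org/simple"),
]


def route(project, routes=ROUTES):
    """Return the single upstream index to use for a project name."""
    for pattern, url in routes:
        if fnmatchcase(project, pattern):
            return url
    return None


print(route("corp-billing"))  # https://pypi.internal.example/simple
print(route("requests"))      # https://pypi.org/simple
```

With routing like this, a `corp-*` project is never looked up on PyPI, so a later public registration of the same name has no effect.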

All of this (as well as #8606) relies on someone caring enough to create a PR for it, though. At the moment, no-one who is actually affected by these issues seems to be interested in doing that, unfortunately 🙁

@mboisson

mboisson commented Aug 9, 2024

> But clearly running an index proxy isn't very user friendly

Not only is it not user friendly, it can simply be forbidden by network/security rules... Running a proxy server is not something that is isolated to a given user's process; it is a whole-node server. It is a security issue on shared infrastructure.


5 participants