Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-think index priority design #9610

Closed
1 task done
bennuttall opened this issue Feb 14, 2021 · 7 comments
Closed
1 task done

Re-think index priority design #9610

bennuttall opened this issue Feb 14, 2021 · 7 comments
Labels
C: finder PackageFinder and index related code state: needs discussion This needs some more discussion type: enhancement Improvements to functionality

Comments

@bennuttall
Copy link

pip version

21.x

Python version

3.7 / 3.8

OS

ubuntu / debian / raspberry pi os

Additional information

I run an alternative package index for raspberry pi users at piwheels.org and I'd noticed a sharp and sustained increase in downloads from piwheels over the last week or so - and started to investigate the download logs. It seems that we're now serving many more pure python wheels than before, which only usually happens when a package doesn't provide a wheel on PyPI, or they explicitly target piwheels as the primary index.

Description

prior to pip 21, given a choice between two "equal" files from two indexes, pip would choose to download the file from the primary index. since pip 21, it chooses to download the file from the extra index.

In the case of requests, if the user runs pip install requests with their extra index set to piwheels, pip discards all files which are not the latest version, and is left with:

  • sdist from pypi
  • wheel from pypi (pip <21 chooses this)
  • wheel from piwheels (pip >=21 chooses this)

Expected behavior

Since I haven't seen this behaviour change in the changelog, it seems like it might be unintentional, perhaps a side-effect of the new resolver.

How to Reproduce

pip3 wheel requests --no-deps --no-cache-dir --disable-pip-version-check --extra-index-url https://www.piwheels.org/simple -v

This command will download from PyPI in pip 20 and from piwheels in pip 21.

Output

See attached file

Code of Conduct

  • I agree to follow the PSF Code of Conduct

I emailed @brainwane about this and they suggested to post it here and let @pradyunsg know.

requests.txt

@bennuttall bennuttall added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Feb 14, 2021
@pfmoore
Copy link
Member

pfmoore commented Feb 14, 2021

Since I haven't seen this behaviour change in the changelog, it seems like it might be unintentional

It's undefined which index gets used if both serve the same file, so both behaviours are correct. We didn't intentionally change the order, but as it's not guaranteed, we don't consider the change to be a regression or bug.

There have been a number of comments on this change in behaviour, so it's possible that we might want to look at making it possible to control the order, but we'd treat it as a feature request rather than a bug.

To better understand the requirement here, can you clarify why you host duplicates of pure Python wheels that exist on PyPI? I assume that the problem for you is that the change in behaviour is affecting your bandwidth usage - could you address this by simply not hosting files that are on PyPI as well? (I'm not saying you must do that and we don't intend to support you if you don't, just trying to understand why you made the decision you did).

@bennuttall
Copy link
Author

To better understand the requirement here, can you clarify why you host duplicates of pure Python wheels that exist on PyPI?

We build wheels of everything on PyPI. Sometimes the wheels are pure python, sometimes they're platform wheels. Either way, they go in the index.

I assume that the problem for you is that the change in behaviour is affecting your bandwidth usage

It's not a problem for us - this isn't a complaint - I'm just seeking clarification on the desired behaviour.

There have been a number of comments on this change in behaviour, so it's possible that we might want to look at making it possible to control the order, but we'd treat it as a feature request rather than a bug.

I feel like it should be deterministic which takes precedence. I'd assumed one being labelled "index" and one being "extra-index" indicated precedence - and that assumption was confirmed in earlier versions of pip. It made sense to me that what I'd considered the primary index took precedence.

@uranusjr
Copy link
Member

uranusjr commented Feb 14, 2021

Indeed it would be beneficial to the users (in edge cases, at least) if the behaviour is explicitly documented and has a test case attached. Expectability would also mean the user can make use of the behaviour when bandwidth is an issue (e.g. in a setting where Internet connection is limited). But on the other hand, simply documenting and guaranteeing a behaviour may not a good idea either, since it encourages the user to abuse the behaviour and break the same-name-same-file policy. So some design work would be required correctly work this out.

@uranusjr uranusjr added C: finder PackageFinder and index related code type: enhancement Improvements to functionality and removed S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Feb 14, 2021
@uranusjr
Copy link
Member

uranusjr commented Feb 14, 2021

I think I’m going to re-purpose this into a feature/clarification request around the design of index discovery.

@uranusjr uranusjr added the state: needs discussion This needs some more discussion label Feb 14, 2021
@uranusjr uranusjr changed the title pip v21.x favours additional indexes Re-think index priority design Feb 14, 2021
@vanschelven
Copy link
Contributor

The existence of #9612 is relevant to the discussion of the present issue also.

I'd assumed one being labelled "index" and one being "extra-index" indicated precedence - and that assumption was confirmed in earlier versions of pip. It made sense to me that what I'd considered the primary index took precedence.

This is what I assumed and then verified also... in the light of the other issue, this confusion is more than a source of slight confusion, it is in fact a part of the reason the mentioned attack exists.

@bennuttall
Copy link
Author

It's definitely related, but I think the distinction is that one is about the order of precedence between different versions, and the other is about the order of precedence between files with the same version and "equal" distribution definition (i.e. py2.py3-none-any.whl equals py2.py3-none-any.whl).

--extra-index-url allows people to use an alternative index which mirrors the packages/versions in PyPI but e.g. provides wheels for their platform - and these indexes are not susceptible to the security implications raised in #9612.

@pradyunsg
Copy link
Member

Bundling into #8606.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C: finder PackageFinder and index related code state: needs discussion This needs some more discussion type: enhancement Improvements to functionality
Projects
None yet
Development

No branches or pull requests

5 participants