
Pip keeps downloading (often multiple times) stuff that it does not really need #9978

Closed
1 task done
callegar opened this issue May 13, 2021 · 10 comments
Labels
resolution: needs standard Should be agreed as a standard before implementation

Comments

@callegar

Description

On an up-to-date virtualenv, trying to update spyder and jupyterlab using

pip install -U --upgrade-strategy=eager jupyterlab spyder

I keep seeing pip download, over and over, packages that are not needed because they are already installed:

  • idna
  • Sphinx
  • flake8
  • autopep8
  • Jinja2

In many cases the packages are downloaded multiple times at multiple versions:

  • Sphinx-4.0.1, Sphinx-4.0.0, Sphinx-4.0.1 (again), Sphinx-4.0.0 (again)
  • flake8-3.9.2, flake8-3.9.1, flake8-3.9.0
  • Jinja2-2.11.3, Jinja2-2.11.0
  • ... and more

I believe that a similar eagerness to download unneeded packages was present before and had been fixed.

Expected behavior

pip should only download what it actually needs.

pip version

21.1.1

Python version

3.8.5

OS

Ubuntu Linux 20.04

How to Reproduce

See description.

Output

No response

Code of Conduct

@callegar callegar added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels May 13, 2021
@pfmoore
Member

pfmoore commented May 13, 2021

Doesn't the fact that you're using --upgrade-strategy=eager mean that you've explicitly given pip permission to download extra versions of already-installed packages, to check whether they are possible upgrades? If you don't want pip to do that, leave out that option.
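To make the difference between the two strategies concrete, here is a toy model of which packages `pip install -U` would consider upgrading. The function name and the `deps`/`satisfied` inputs are hypothetical; this is not pip's code. With the default `only-if-needed`, an installed dependency is left alone if it already satisfies the requirements placed on it; with `eager`, every dependency becomes an upgrade candidate, and pip needs metadata for each candidate it considers.

```python
def packages_to_upgrade(requested, deps, satisfied, strategy):
    """Toy model of which packages `pip install -U` would try to upgrade.

    requested: names given on the command line (always upgraded with -U).
    deps:      hypothetical mapping name -> list of direct dependency names.
    satisfied: hypothetical mapping name -> True if the installed version
               already meets the requirements placed on it.
    strategy:  "only-if-needed" (pip's default) or "eager".
    """
    if strategy not in ("only-if-needed", "eager"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    to_upgrade, seen = set(), set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Requested packages are always upgraded; with "eager", every
        # dependency is too, even when the installed version is fine.
        if name in requested or strategy == "eager" or not satisfied.get(name, False):
            to_upgrade.add(name)
        stack.extend(deps.get(name, []))
    return to_upgrade
```

For a hypothetical graph where `app` depends on `lib` and `tool`, `lib` depends on `dep`, and only `tool` is out of date, `only-if-needed` selects `{"app", "tool"}`, while `eager` selects all four packages.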

@callegar
Author

callegar commented May 13, 2021

Shouldn't pip be able to see whether a package is a possible upgrade from its metadata, without having to download the whole package?

Furthermore, I am pretty sure that this was not happening, then started happening, then got fixed, and is now starting to happen again. I guess the earlier occurrence was bug #9516, and I believe a fix was provided by pull #9522.

@pfmoore
Member

pfmoore commented May 13, 2021

Unfortunately, the only way pip can get the metadata is by downloading the package...
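The reason is that, for a wheel, the dependency list lives inside the archive itself: a wheel is a zip file whose `<name>-<version>.dist-info/METADATA` member holds the core metadata, including the `Requires-Dist` entries. The sketch below reads that file from an in-memory wheel; the `demo` package and both helper names are hypothetical.

```python
import io
import zipfile
from email.parser import Parser


def read_wheel_metadata(wheel_bytes):
    """Return (name, version, requires_dist) from a wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        # Core metadata sits in <name>-<version>.dist-info/METADATA at top level.
        candidates = [n for n in zf.namelist()
                      if n.endswith(".dist-info/METADATA") and n.count("/") == 1]
        if len(candidates) != 1:
            raise ValueError("expected exactly one .dist-info/METADATA")
        text = zf.read(candidates[0]).decode("utf-8")
    # METADATA uses email-style headers, so the stdlib parser can read it.
    msg = Parser().parsestr(text)
    return msg["Name"], msg["Version"], msg.get_all("Requires-Dist") or []


def make_demo_wheel():
    """Build a tiny hypothetical wheel in memory for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo/__init__.py", "")
        zf.writestr("demo-1.0.dist-info/METADATA",
                    "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n"
                    "Requires-Dist: Jinja2>=2.10\nRequires-Dist: idna\n")
    return buf.getvalue()
```

`read_wheel_metadata(make_demo_wheel())` returns `("demo", "1.0", ["Jinja2>=2.10", "idna"])`, but only after the whole archive is in hand, which is the cost being discussed.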

We're tweaking the resolution algorithm based on user feedback, which means that the order in which we explore the space of possible solutions has been changing. There's no "do everything as fast as possible" setting we can apply here, only options that prioritise common problems over uncommon ones when it comes to fast resolution times. We've not heard many reports from people using --upgrade-strategy=eager, so it's quite possible that it's not a common scenario, which might explain why it got worse as we improved more common cases. (That's just speculation, though; saying anything more precise would need more analysis of this specific case, which I don't have time for right now.)
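The pattern in the report (several versions of one package fetched, and the same file fetched more than once) falls out of any search that only learns about conflicts after fetching metadata. Below is a deliberately naive chronological backtracking resolver; it is not pip's resolvelib-based algorithm, and the packages `app`, `lib`, `plugin` and their constraints are hypothetical. `plugin` only works with `lib < 4`, but the resolver discovers this only after reading `plugin`'s metadata.

```python
def resolve(pending, chosen, index, fetch_metadata):
    """Naive chronological backtracking (not pip's actual strategy).

    pending:        list of (name, predicate) requirements still to check.
    chosen:         dict name -> version pinned so far (mutated during search).
    index:          dict name -> available versions, newest first.
    fetch_metadata: callable (name, version) -> dict dep_name -> predicate;
                    in real life this is the expensive download.
    """
    if not pending:
        return dict(chosen)
    (name, ok), rest = pending[0], pending[1:]
    if name in chosen:
        # Already pinned: the pin either satisfies this requirement or we fail.
        return resolve(rest, chosen, index, fetch_metadata) if ok(chosen[name]) else None
    for version in index[name]:
        if not ok(version):
            continue
        deps = fetch_metadata(name, version)
        chosen[name] = version
        result = resolve(rest + list(deps.items()), chosen, index, fetch_metadata)
        if result is not None:
            return result
        del chosen[name]  # backtrack and try the next (older) version
    return None


def any_version(v):
    return True


def below_4(v):
    return v < (4, 0, 0)


# Hypothetical index and dependency metadata.
INDEX = {"app": [(1, 0, 0)],
         "lib": [(4, 0, 1), (4, 0, 0), (3, 5, 4)],
         "plugin": [(1, 1, 0)]}
DEPS = {("app", (1, 0, 0)): {"lib": any_version, "plugin": any_version},
        ("plugin", (1, 1, 0)): {"lib": below_4}}

fetch_log = []


def fetch_metadata(name, version):
    fetch_log.append((name, version))  # record every "download"
    return DEPS.get((name, version), {})


solution = resolve([("app", any_version)], {}, INDEX, fetch_metadata)
```

The search settles on `lib` 3.5.4, but only after 7 metadata fetches: `lib` at three versions, and `plugin` 1.1.0 three times. Changing the order in which requirements are visited changes which candidates get fetched, which is the kind of trade-off described above.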

@callegar
Author

callegar commented May 13, 2021

Unfortunately, the only way pip can get the metadata is by downloading the package...

That seems rather painful. Does it mean that if someone manages to publish a package on PyPI that is mostly data, say 1 GB, and I need it, I may end up downloading it multiple times whenever I try to do an eager upgrade of other packages? It also seems to add a lot of complexity to the design of package managers like pip. Doesn't pypi extract the metadata in any way? That seems necessary at least for its own website.

@uranusjr
Member

No, pip caches downloads, so you only hit the network once. (That should be the case for you as well: pip may fetch the package multiple times but only actually download it once.)
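The "fetch many times, download once" behaviour can be pictured with a URL-keyed cache. This in-memory class is a hypothetical illustration only; pip's real cache is on disk and persists across runs.

```python
class CachingDownloader:
    """Toy URL-keyed download cache (illustration only, not pip's code)."""

    def __init__(self, download):
        self._download = download  # callable url -> bytes: the network fetch
        self._cache = {}
        self.network_hits = 0

    def get(self, url):
        # Only the first request for a URL goes to the network.
        if url not in self._cache:
            self.network_hits += 1
            self._cache[url] = self._download(url)
        return self._cache[url]


fetcher = CachingDownloader(lambda url: b"wheel bytes for " + url.encode())
url = "https://files.example.invalid/lib-4.0.1-py3-none-any.whl"  # hypothetical URL
for _ in range(3):
    data = fetcher.get(url)
```

After three `get` calls for the same URL, `fetcher.network_hits` is 1. The trade-off raised in the next comment is that such a cache stores whole archives, not just their metadata.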

@callegar
Author

callegar commented May 13, 2021

Unfortunately, I can't cache here. pip caches the whole package, not just the metadata. Plus it caches per virtualenv, so the space usage gets multiplied by the number of virtualenvs.

@uranusjr
Member

pip does not cache per virtual environment but per user on your machine.
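On Linux, the per-user location can be sketched as follows, assuming pip's defaults with no configuration file: `PIP_CACHE_DIR` overrides everything, otherwise pip uses `$XDG_CACHE_HOME/pip`, falling back to `~/.cache/pip`. The function name is hypothetical, and macOS and Windows use different paths. Note that the active virtualenv plays no part.

```python
import posixpath


def default_pip_cache_dir(environ, home):
    """Sketch of pip's per-user cache directory on Linux (assumed defaults)."""
    override = environ.get("PIP_CACHE_DIR")
    if override:
        return override
    # XDG base directory convention; an empty value means "use the default".
    xdg = environ.get("XDG_CACHE_HOME", "").strip()
    base = xdg or posixpath.join(home, ".cache")
    return posixpath.join(base, "pip")
```

For example, `default_pip_cache_dir({}, "/home/u")` gives `/home/u/.cache/pip`. To see the real location on a given machine, `pip cache dir` prints it.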

@callegar
Author

That sounds better, but it is still impossible on a travel laptop with a very limited SSD, where the pip cache easily grows many times larger than the rest of my home directory. And because I use it while traveling, network traffic can be slow and costly. I guess I will advocate for a metadata-only cache in another report.
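A metadata-only cache, as proposed here, would keep just the small METADATA text per project and version rather than whole archives. The class below is a hypothetical sketch of that idea, not an existing pip feature. It keys entries by the PEP 503 normalized project name (versions are used as given, without normalization) and parses `Requires-Dist` on demand.

```python
import re
from email.parser import Parser


def normalize(name):
    # PEP 503 project-name normalization, so "Demo_Pkg" and "demo-pkg" match.
    return re.sub(r"[-_.]+", "-", name).lower()


class MetadataCache:
    """Hypothetical metadata-only cache: stores the METADATA text of each
    (project, version) instead of the whole archive."""

    def __init__(self):
        self._store = {}

    def put(self, name, version, metadata_text):
        self._store[(normalize(name), version)] = metadata_text

    def requires_dist(self, name, version):
        text = self._store.get((normalize(name), version))
        if text is None:
            return None  # cache miss: the metadata would have to be fetched
        return Parser().parsestr(text).get_all("Requires-Dist") or []


cache = MetadataCache()
cache.put("Demo_Pkg", "1.0",
          "Metadata-Version: 2.1\nName: Demo_Pkg\nVersion: 1.0\nRequires-Dist: idna\n")
```

Here `cache.requires_dist("demo-pkg", "1.0")` returns `["idna"]`, and an uncached version returns `None`. Such a cache only avoids downloads if the index can serve metadata separately, which is what the warehouse issue below is about.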

@piotr-dobrogost
Copy link

@callegar

Doesn't pypi extract the metadata in any way?

See pypi/warehouse#8254

@uranusjr
Member

I’ll close this for now, since there really isn’t anything pip can do at this time. Hopefully that proposal gets through and is implemented, which would improve the distribution download profile.

@uranusjr uranusjr added resolution: needs standard Should be agreed as a standard before implementation and removed S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels May 18, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 27, 2021

4 participants