
Local HTTP cache is confusing for package maintainers #5670

Closed
@di

Description


I've seen this issue raised multiple times in different formats and mediums. Most recently on #pypa today, this question was asked:

Tried this twice today, is it unusual? New releases to PyPi take 10+ minutes before they can be installed via pip.

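The reason for this is that pip uses CacheControl to maintain a local HTTP cache, and it requests project pages with a "Cache-Control: max-age=600" header. Any copy of a project page cached within the last 10 minutes is therefore served from the local cache instead of being re-requested from PyPI: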

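```python
# Excerpt from pip's project-page fetching code: `session` is pip's
# PipSession (a requests session with a CacheControl adapter) and
# `url` is the project page URL.
resp = session.get(
    url,
    headers={
        "Accept": "text/html",
        # Accept a cached copy of the page up to 600 s (10 minutes) old
        "Cache-Control": "max-age=600",
    },
)
```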
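Under HTTP caching semantics, a max-age request directive means the client will not accept a stored response older than that many seconds. The sketch below illustrates that rule in isolation. CachedResponse and is_fresh are hypothetical names, and the age calculation is deliberately simplified: CacheControl itself also takes the response's Date header and the server's own caching headers into account.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    """Hypothetical cache entry: the page body plus when it was stored."""
    body: str
    stored_at: float  # seconds (any clock), time the response was cached


def is_fresh(entry: CachedResponse, now: float, request_max_age: int = 600) -> bool:
    """Return True if a cache may serve `entry` for a request carrying
    Cache-Control: max-age=<request_max_age>.

    Simplified assumption: the entry's age is the time elapsed since it was
    stored, and it is rejected once that age exceeds the requested max-age.
    """
    age = now - entry.stored_at
    return age <= request_max_age


page = CachedResponse(body="<html>my_package 1.0</html>", stored_at=1_000.0)
print(is_fresh(page, now=1_000.0 + 300))  # True: 5 minutes old, served from cache
print(is_fresh(page, now=1_000.0 + 601))  # False: over 10 minutes old, re-fetched
```

Any install attempt that lands inside that 600-second window sees the page as it was when it was first cached, regardless of what has been uploaded since.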

Unfortunately, for a lot of package authors/maintainers, this is a common workflow:

  1. They have recently (< 10 minutes ago) run pip install my_package
  2. They publish a new version of my_package
  3. They then immediately run pip install my_package to try and install the new version
  4. pip says it can't find the new version (because the project page cached in step 1 hasn't expired yet)
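The four steps above can be replayed with a toy model. This is a minimal sketch under stated assumptions: ToyIndex stands in for PyPI, ToyCachingClient stands in for pip's cached session, both are hypothetical, and a fake clock replaces real waiting.

```python
import time


class ToyIndex:
    """Hypothetical stand-in for PyPI: maps project names to released versions."""

    def __init__(self):
        self.releases = {"my_package": ["1.0"]}

    def project_page(self, name):
        # Return a snapshot (copy) so later uploads don't alter cached pages.
        return list(self.releases[name])


class ToyCachingClient:
    """Hypothetical client that reuses a project page for up to `max_age` seconds."""

    def __init__(self, index, max_age=600, clock=time.monotonic):
        self.index = index
        self.max_age = max_age
        self.clock = clock
        self.cache = {}  # name -> (stored_at, versions)

    def available_versions(self, name):
        now = self.clock()
        if name in self.cache:
            stored_at, versions = self.cache[name]
            if now - stored_at <= self.max_age:
                return versions  # served from cache; the index is not contacted
        versions = self.index.project_page(name)
        self.cache[name] = (now, versions)
        return versions


# Replay the workflow with a fake clock (in seconds).
fake_now = [0.0]
index = ToyIndex()
client = ToyCachingClient(index, clock=lambda: fake_now[0])

client.available_versions("my_package")         # step 1: t=0, page cached
fake_now[0] = 120.0
index.releases["my_package"].append("1.1")      # step 2: 1.1 published at t=2 min
fake_now[0] = 180.0
print(client.available_versions("my_package"))  # steps 3-4: prints ['1.0']
fake_now[0] = 601.0
print(client.available_versions("my_package"))  # cache expired: prints ['1.0', '1.1']
```

The new release is on the index from t=120 onward, but the client cannot see it until the cached page ages past max_age.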

I think that this creates enough confusion about PyPI being "slow" or some type of user error that it's worth addressing in some way.

Some ideas for how this could be addressed:

  1. Disable the local cache entirely: this would probably be overkill; I don't think we should do that.
  2. Shorten the max-age for cache entries: This just shortens the window for when this could happen, but doesn't fix the underlying issue.
  3. Leave the cache, but add an extra API endpoint to determine whether a project has been updated in the last 10 minutes: this would be a lot of work for both pip and PyPI maintainers, plus the response from this endpoint would be essentially uncacheable (from both pip's and PyPI's perspective)
  4. Tell users to use --no-cache-dir: This works, but there isn't a great way to inform new users "this is what's happening" and "this is how to fix it" aside from doing it ad hoc; it's essentially where we are right now.
  5. Exclude the project pages from the cache: project pages are pretty light, and always fetching them shouldn't cause an excessive increase in bandwidth or response time. I'm not sure if this is easy to do within pip, but it seems like the best approach to me.
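Option 5 could be sketched as a per-URL rule that attaches "Cache-Control: max-age=0" to project page requests, so a stored copy is never fresh enough to reuse, while leaving file downloads to the normal caching rules. This is a hypothetical illustration, not pip's code: request_headers_for, the /simple/ path prefix (the PEP 503 simple repository layout PyPI uses), and the example.org URLs are assumptions.

```python
from urllib.parse import urlparse

# Hypothetical policy for option 5; the prefix assumes a PEP 503-style index.
PROJECT_PAGE_PREFIX = "/simple/"


def request_headers_for(url: str) -> dict:
    """Return the request headers a client would send for `url` under this policy."""
    path = urlparse(url).path
    if path.startswith(PROJECT_PAGE_PREFIX):
        # Project page: max-age=0 means no stored copy counts as fresh, so the
        # request always reaches the index (a cache holding an ETag may send it
        # as a conditional request and get back a small 304 Not Modified).
        return {"Accept": "text/html", "Cache-Control": "max-age=0"}
    # Anything else (e.g. sdist/wheel downloads): no request-side limit, so the
    # cache follows the server's own caching headers.
    return {}


print(request_headers_for("https://example.org/simple/my-package/"))
# {'Accept': 'text/html', 'Cache-Control': 'max-age=0'}
print(request_headers_for("https://example.org/packages/my_package-1.1.tar.gz"))
# {}
```

Routing project pages through the cache with max-age=0, rather than bypassing the cache entirely, keeps the option of cheap revalidation when a page hasn't changed, which fits the point above that project pages are light to fetch.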

Labels: C: cache, auto-locked