Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repositories: add support for PEP 658 #5509

Merged
merged 7 commits into from
Feb 20, 2024

Conversation

abn
Copy link
Member

@abn abn commented Apr 27, 2022

This change allows Poetry to make use of PEP 503 "simple" API repositories that implement PEP 658 for core metadata.

Requires: python-poetry/poetry-core#333
Relates-to: #4231 (comment)

.github/workflows/main.yml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@Secrus
Copy link
Member

Secrus commented May 20, 2023

Updated and rebased to current master

@Secrus Secrus marked this pull request as ready for review May 20, 2023 13:54
@Secrus Secrus requested review from a team May 20, 2023 13:55
src/poetry/repositories/http_repository.py Outdated Show resolved Hide resolved
src/poetry/repositories/http_repository.py Outdated Show resolved Hide resolved
src/poetry/repositories/http_repository.py Outdated Show resolved Hide resolved
# Prefer to read data from wheels: this is faster and more reliable
wheels = urls.get("bdist_wheel")
if wheels:
if wheels := urls.get("bdist_wheel"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I got it, we risk falling back to package inspection when:

  • A high priority package (wheel) does not have PEP 658 metadata
  • A low priority package (sdist) does have PEP 658 metadata

In this case we would get the sdist metadata from the repository but then we wouldn't use it.

I would prefer a more aggressive strategy that favours PEP 658 over package inspection. Something like (in this order):

  1. PEP 658 metadata of wheel
  2. PEP 658 metadata of sdist
  3. wheel inspection
  4. sdist inspection

While currently it feels more like:

  1. PEP 658 metadata of wheel
  2. wheel inspection
  3. PEP 658 of sdist
  4. sdist inspection

One way you could achieve this is by reverting changes to _get_info_from_urls (which would still act on packages only) and inside _links_to_data something like:

if metadata:
    # We got PEP 658 metadata from at least one package.
    # Either take one randomly or use some fancier logic (wheels over sdist)
    info = _get_info_from_metadata(metadata)
else:
    info = _get_info_from_urls(urls)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible network optimization in case we go for the "random" approach: as soon as we got PEP 658 metadata from one url, stop querying others as we don't need anything else

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep it as is for now. The current approach has the advantage that it does not change our logic (which wheel's metadata we chose) so we get the same results no matter if we can download metadata or if we have to download the wheel/sdist.

Further, I suppose that in 99 % of use cases it's not relevant because either all files of a release have PEP 658 metadata or none of them. (I may be wrong, but in that case we can adjust it later.)

src/poetry/repositories/http_repository.py Outdated Show resolved Hide resolved
@dimbleby
Copy link
Contributor

I see that pypi recently started supporting offering up metadata separately from wheels - and I wouldn't mind betting that it's the only major repository that does so. Kind of funny if pypi is the one repository for which poetry does not try using that metadata.

Of course support for pypi can be added later so this is not anything against this MR; it just amuses me.

(Using the metadata from pypi could be a fairly big win, I think, for those packages where the JSON API does not give the metadata that poetry wants. The JSON API has the problem that when it reports an empty list of dependencies, it could mean either "this package has no dependencies" or "I don't know the answer"; so poetry has to download and inspect a potentially large distribution file. Being able instead to download a small metadata file should make a meaningful different to resolution time in some cases)

@Secrus
Copy link
Member

Secrus commented May 20, 2023

I see that pypi recently started supporting offering up metadata separately from wheels - and I wouldn't mind betting that it's the only major repository that does so. Kind of funny if pypi is the one repository for which poetry does not try using that metadata.

Of course support for pypi can be added later so this is not anything against this MR; it just amuses me.

(Using the metadata from pypi could be a fairly big win, I think, for those packages where the JSON API does not give the metadata that poetry wants. The JSON API has the problem that when it reports an empty list of dependencies, it could mean either "this package has no dependencies" or "I don't know the answer"; so poetry has to download and inspect a potentially large distribution file. Being able instead to download a small metadata file should make a meaningful different to resolution time in some cases)

Yeah, it was already raised on Discord, I am working on adding that.

@radoering
Copy link
Member

I fixed the parsing of data-dist-info-metadata for special cases, added support for PEP 714 (renaming to data-core-metadata), added tests and applied some review feedback. (Separate commits to make it traceable for now. I suppose in the end we'll squash it anyway.)

Further, I added PEP 691 support as fallback for PyPI (last commit). We could make it a separate PR, but I think the change is still small enough especially when squashed because there's some refactoring of the previous commits included.

@dimbleby
Copy link
Contributor

dimbleby commented Nov 19, 2023

A couple of doubts.

  • if I read this right, poetry is downloading all of the metadatas even though it is definitely going to use at most one of them? Seems wasteful
  • I was thinking about this PEP last week and I'm not sure when it is going to be helpful for poetry...

What I mean is: I expect that on pypi the json API almost always agrees with one of the metadatas. So if poetry has decided that the json data doesn't contain any requires-dist - then likely neither will any of the metadatas, and it is going to have to download a whole distribution anyway. ie separately inspecting the metadata is an extra step that probably won't help.

The exceptional case where this can help is for packages that (i) do have dependencies (ii) uploaded their sdist first. (Because this hits the warehouse oddity in which the json api is populated from the first uploaded package). Then the json api will show no requires-dist, but metadata from one of the wheels will fill that gap without downloading the whole distribution.

However this case is rare: the popular CI tools and workflows upload wheels first. (dulwich is a counterexample)

All of which is I guess a long-winded way of asking whether you have any evidence about the practical effect of this? Is it a lot of extra round trips to pypi for no benefit, or does it save lots of wheel downloads? And since the answer is likely "a bit of both" - how does that balance work out?

@radoering
Copy link
Member

All of which is I guess a long-winded way of asking whether you have any evidence about the practical effect of this? Is it a lot of extra round trips to pypi for no benefit, or does it save lots of wheel downloads? And since the answer is likely "a bit of both" - how does that balance work out?

Those are some very good points. I haven't done any performance tests yet but I completely agree that it is necessary.

if I read this right, poetry is downloading all of the metadatas even though it is definitely going to use at most one of them? Seems wasteful

I overlooked that. Metadata fetching should definitely be lazy.

I expect that on pypi the json API almost always agrees with one of the metadatas. So if poetry has decided that the json data doesn't contain any requires-dist - then likely neither will any of the metadatas, and it is going to have to download a whole distribution anyway. ie separately inspecting the metadata is an extra step that probably won't help.

IIUC the huge difference is that we don't trust the (non-standard) json API if requires-dist is null, but no requires-dist in the metadatas means that there are no dependencies for sure so that we do not have to download the distributions. Or did I misunderstand the PEP? I suppose the current implementation will not download any distributions if the relevant metadatas say that there are no dependencies.

@radoering radoering marked this pull request as draft November 19, 2023 12:15
@dimbleby
Copy link
Contributor

I think you are right about trusting the metadata when it says "no dependencies" - and I agree that this should be a big win.

(We don't trust sdists when their PKG-INFO says "no dependencies" and I had misremembered / confused myself).

@Secrus Secrus added the status/waiting-on-core Requires changes to poetry-core first label Jan 30, 2024
@Secrus Secrus removed their assignment Jan 30, 2024
@radoering radoering removed the status/waiting-on-core Requires changes to poetry-core first label Feb 3, 2024
@radoering
Copy link
Member

blocked by #8940

@radoering radoering force-pushed the repository-pep-658 branch 2 times, most recently from c286e84 to a28cbab Compare February 5, 2024 16:45
@radoering
Copy link
Member

This should be ready now. I'll probably check again later this week but in case anyone wants to take a look...

@radoering radoering marked this pull request as ready for review February 5, 2024 16:57
@radoering radoering merged commit cff4d7d into python-poetry:master Feb 20, 2024
20 checks passed
@abn
Copy link
Member Author

abn commented Feb 23, 2024

@Secrus @radoering @dimbleby thanks for taking this over 🎉

@abn abn deleted the repository-pep-658 branch February 23, 2024 13:21
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants