-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
repositories: add support for PEP 658 #5509
Conversation
2194dba
to
7fec494
Compare
7fec494
to
5295a45
Compare
Updated and rebased to current master |
5295a45
to
aef9975
Compare
# Prefer to read data from wheels: this is faster and more reliable | ||
wheels = urls.get("bdist_wheel") | ||
if wheels: | ||
if wheels := urls.get("bdist_wheel"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I got it, we risk falling back to package inspection when:
- A high priority package (wheel) does not have PEP 658 metadata
- A low priority package (sdist) does have PEP 658 metadata
In this case we would get the sdist metadata from the repository but then we wouldn't use it.
I would prefer a more aggressive strategy that favours PEP 658 over package inspection. Something like (in this order):
- PEP 658 metadata of wheel
- PEP 658 metadata of sdist
- wheel inspection
- sdist inspection
While currently it feels more like:
- PEP 658 metadata of wheel
- wheel inspection
- PEP 658 of sdist
- sdist inspection
One way you could achieve this is by reverting changes to _get_info_from_urls
(which would still act on packages only) and inside _links_to_data
something like:
if metadata:
# We got PEP 658 metadata from at least one package.
# Either take one randomly or use some fancier logic (wheels over sdist)
info = _get_info_from_metadata(metadata)
else:
info = _get_info_from_urls(urls)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible network optimization in case we go for the "random" approach: as soon as we got PEP 658 metadata from one url, stop querying others as we don't need anything else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep it as is for now. The current approach has the advantage that it does not change our logic (which wheel's metadata we chose) so we get the same results no matter if we can download metadata or if we have to download the wheel/sdist.
Further, I suppose that in 99 % of use cases it's not relevant because either all files of a release have PEP 658 metadata or none of them. (I may be wrong, but in that case we can adjust it later.)
I see that pypi recently started supporting offering up metadata separately from wheels - and I wouldn't mind betting that it's the only major repository that does so. Kind of funny if pypi is the one repository for which poetry does not try using that metadata. Of course support for pypi can be added later so this is not anything against this MR; it just amuses me. (Using the metadata from pypi could be a fairly big win, I think, for those packages where the JSON API does not give the metadata that poetry wants. The JSON API has the problem that when it reports an empty list of dependencies, it could mean either "this package has no dependencies" or "I don't know the answer"; so poetry has to download and inspect a potentially large distribution file. Being able instead to download a small metadata file should make a meaningful different to resolution time in some cases) |
Yeah, it was already raised on Discord, I am working on adding that. |
aef9975
to
f1f64b1
Compare
5956617
to
682d9b2
Compare
I fixed the parsing of Further, I added PEP 691 support as fallback for PyPI (last commit). We could make it a separate PR, but I think the change is still small enough especially when squashed because there's some refactoring of the previous commits included. |
A couple of doubts.
What I mean is: I expect that on pypi the json API almost always agrees with one of the metadatas. So if poetry has decided that the json data doesn't contain any requires-dist - then likely neither will any of the metadatas, and it is going to have to download a whole distribution anyway. ie separately inspecting the metadata is an extra step that probably won't help. The exceptional case where this can help is for packages that (i) do have dependencies (ii) uploaded their sdist first. (Because this hits the warehouse oddity in which the json api is populated from the first uploaded package). Then the json api will show no requires-dist, but metadata from one of the wheels will fill that gap without downloading the whole distribution. However this case is rare: the popular CI tools and workflows upload wheels first. (dulwich is a counterexample) All of which is I guess a long-winded way of asking whether you have any evidence about the practical effect of this? Is it a lot of extra round trips to pypi for no benefit, or does it save lots of wheel downloads? And since the answer is likely "a bit of both" - how does that balance work out? |
Those are some very good points. I haven't done any performance tests yet but I completely agree that it is necessary.
I overlooked that. Metadata fetching should definitely be lazy.
IIUC the huge difference is that we don't trust the (non-standard) json API if requires-dist is null, but no requires-dist in the metadatas means that there are no dependencies for sure so that we do not have to download the distributions. Or did I misunderstand the PEP? I suppose the current implementation will not download any distributions if the relevant metadatas say that there are no dependencies. |
I think you are right about trusting the metadata when it says "no dependencies" - and I agree that this should be a big win. (We don't trust sdists when their PKG-INFO says "no dependencies" and I had misremembered / confused myself). |
03d9ed6
to
a38f62a
Compare
blocked by #8940 |
c286e84
to
a28cbab
Compare
This should be ready now. I'll probably check again later this week but in case anyone wants to take a look... |
This change allows Poetry to make use of PEP 503 "simple" API repositories that implement PEP 658 for core metadata. Co-authored-by: Bartosz Sokorski <b.sokorski@gmail.com>
a28cbab
to
b1bcbc2
Compare
@Secrus @radoering @dimbleby thanks for taking this over 🎉 |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
This change allows Poetry to make use of PEP 503 "simple" API repositories that implement PEP 658 for core metadata.
Requires: python-poetry/poetry-core#333
Relates-to: #4231 (comment)