-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add handling for non-sha hash in lock file and/or repository metadata when choosing a link #5326
add handling for non-sha hash in lock file and/or repository metadata when choosing a link #5326
Conversation
split logic from LegacyRepository._get_release_info for getting sha256 hash from file when link hash is not sha* into its own method and move it to parent class PyPiRepository for reuse
depending on repository type, will attempt to get sha256 hash
this is needed to support lock files created by <1.2 which may contain a non-sha256 hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is in the right direction, I wonder if we should first consider expanding the lockfile to include and/or allow non sha checksums if only md5 is available from non pypi sources. This can be done in interations as well.
): | ||
with temporary_directory() as temp_dir: | ||
filepath = Path(temp_dir) / link.filename | ||
self._download(link.url, str(filepath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might get a bit problematic as it could end up folks downloading all available builds for all dependencies, well n+1. In the interim I reckon best to download to cache and use the downloaded file as is without moving it around. Atleast that will save redownload when relocking or re-solving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing the codebase, it looks like I'm going to have to pass poetry.installation.chef.Chef
into the repository object or create it within the object to have access to the caching logic. To do so, I would need to know the Env
when registering the repos (at a minimum when registering the legacy repos). I'm having trouble locating where I could pull Env
from at that point. Any advice?
The other option I can think of would to be pass Chef
as an argument to Repository.get_release_info
/Repository.package
& Chooser.choose_for
since all of them would need to have the ability to download and calculate sha256 for a file. But, that seems excessive when it's only needed in cerin circumstances.
if ( | ||
link.hash_name | ||
and not link.hash_name.startswith("sha") | ||
and isinstance(repository, PyPiRepository) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider makign this apply broadly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly as in for any repo type? The get_sha_method_from_link
method would need to be added to Repository
in that case since this check only exists to ensure the existence of the required method. I'm not entirely sure what the logic of the method would need to be for all the other types.
@abn I did a bit of research on this - specifically targeted at the logic affecting my use case. PR #2958 (resolving 2 issues) implemented the logic to calculate sha256 that I moved into the Poetry could calculate sha256 on export but given that pip no longer supports these weaker checksums due to security concerns, I would lean towards not wanting poetry to rely on them in a lock file either. |
Hi, what's blocking this? Can I help in moving this forward somehow? As far as I can see, without this there is no way to install from private repos that return non-sha hashes with poetry 1.2.x. |
Sorry for the radio silence here -- this issue is not as straightforward as just whitelisting another hash format, as it has security and design implications. We have a large number of issues around this and duplicate PRs (see #5670 for example). It is agreed on by the team that we need to support this as the Simple Repo spec allows it, and real implementations (e.g. Artifactory) only speak md5. That being said, we need to do it in a way that rejects md5s from sources that should not supply it (e.g. PyPI) and that is smart about when hashes are computed locally and when they are taken from the server, as well as when they are verified (e.g. when reading from the cache). I think right now, we're likely to accept the PR that checks hashes when installing cached wheels, and we need to create an issue to holistically evaluate how and when we handle hashes. Then an implementation can go forward with guidance and feedback from the core team (or one of us will implement it). I'm going to try to make sure we round up these issues over the next few days so that we can direct all discussion to the meta-issue. We can also go through different PRs attacking this issue to see if one can form a useful base for the "fix"/redesign once we have a rough idea of the best approach. |
(also for those reading along at home, note that I have/had a PR at #4740 that stalled out for the same reasons -- I'm still rather interested in fixing this and will likely implement this once we figure out what the design should be) |
I understand that this is not trivial and you have to be careful about how you implement it. I assume it's still going to take a while, unfortunately that means we cannot install packages hosted either on Nexus or Artifactory with poetry, as they both return md5 hashes. For what it's worth, my current workaround is
Since pip supports installing from pyproject.tml this works, it only installs the main dependencies though, no dev dependencies. I install them with |
I don't know about Nexus, but I do know that recent artifactory returns sha256 hashes - per https://www.jfrog.com/jira/browse/RTFACT-18495 (I know this because that upgrade caused its own confusion: artifactory suddenly returning sha256 hashes meant that Of course you may or may not be able to upgrade your artifactory. |
Older Gemfury repositories also return md5 hashes (I suppose the emphasis is on the older as I created a new one to create a reproducible environment and it now returns sha256 hashes while the old one still returns md5.. but I can't find a setting to change it). I understand that md5 is not as secure as sha256 and whatnot but it is still used in a lot of places and it is far from being insecure. At this point it would be wise to consider the fact that a lot of people are blocked by this (all of our pipelines are hardcoded to some ancient poetry & poetry-core version..) and are unable to use new poetry features. Would it be possible to simply add a config value in the .toml file to the private repo's section which would signal to poetry what hashing algorithm to use? That way, PyPI is untouched and the user has the ability (and the responsibility) to control the behaviour. |
@neersighted @abn is there any progress here? |
I'm closing this PR because it seems to be stale and at least partially superseded by other PRs. If you think it still contains something worth reviving, feel free to resolve the merge conflicts and reopen it or just create a new PR. |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
This PR moves existing logic into a new method for reuse, pulls the logic from #4529 forward into 1.2, and adds a condition to handle legacy repositories (Sonatype Nexus) returning md5 hashes in its metadata instead of sha256 like poetry expects.
resolves #4523
resolves #4578
resolves #4085
supersedes #4740
Pull Request Check List