Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add handling for non-sha hash in lock file and/or repository metadata when choosing a link #5326

Closed
wants to merge 3 commits into from
Closed

Conversation

ITProKyle
Copy link
Contributor

This PR moves existing logic into a new method for reuse, pulls the logic from #4529 forward into 1.2, and adds a condition to handle legacy repositories (Sonatype Nexus) returning md5 hashes in its metadata instead of sha256 like poetry expects.

resolves #4523
resolves #4578
resolves #4085
supersedes #4740

Pull Request Check List

  • Added tests for changed code.
  • Updated documentation for changed code.

split logic from LegacyRepository._get_release_info for getting sha256 hash from file when link hash is not sha* into its own method and move it to parent class PyPiRepository for reuse
depending on repository type, will attempt to get sha256 hash
this is needed to support lock files created by <1.2 which may contain a non-sha256 hash
Copy link
Member

@abn abn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is in the right direction, I wonder if we should first consider expanding the lockfile to include and/or allow non sha checksums if only md5 is available from non pypi sources. This can be done in interations as well.

):
with temporary_directory() as temp_dir:
filepath = Path(temp_dir) / link.filename
self._download(link.url, str(filepath))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might get a bit problematic as it could end up folks downloading all available builds for all dependencies, well n+1. In the interim I reckon best to download to cache and use the downloaded file as is without moving it around. Atleast that will save redownload when relocking or re-solving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing the codebase, it looks like I'm going to have to pass poetry.installation.chef.Chef into the repository object or create it within the object to have access to the caching logic. To do so, I would need to know the Env when registering the repos (at a minimum when registering the legacy repos). I'm having trouble locating where I could pull Env from at that point. Any advice?

The other option I can think of would to be pass Chef as an argument to Repository.get_release_info/Repository.package & Chooser.choose_for since all of them would need to have the ability to download and calculate sha256 for a file. But, that seems excessive when it's only needed in cerin circumstances.

if (
link.hash_name
and not link.hash_name.startswith("sha")
and isinstance(repository, PyPiRepository)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider makign this apply broadly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly as in for any repo type? The get_sha_method_from_link method would need to be added to Repository in that case since this check only exists to ensure the existence of the required method. I'm not entirely sure what the logic of the method would need to be for all the other types.

@ITProKyle
Copy link
Contributor Author

I wonder if we should first consider expanding the lockfile to include and/or allow non sha checksums if only md5 is available from non pypi sources

@abn I did a bit of research on this - specifically targeted at the logic affecting my use case. PR #2958 (resolving 2 issues) implemented the logic to calculate sha256 that I moved into the get_sha_method_from_link method. Allowing non-sha checksums would break exporting to requirements.txt for use with pip (unless using --without-hashes) as pip no longer supports using md5, sha1, and sha224 checksums.

Poetry could calculate sha256 on export but given that pip no longer supports these weaker checksums due to security concerns, I would lean towards not wanting poetry to rely on them in a lock file either.

@ljnsn
Copy link

ljnsn commented May 25, 2022

Hi, what's blocking this? Can I help in moving this forward somehow? As far as I can see, without this there is no way to install from private repos that return non-sha hashes with poetry 1.2.x.

@neersighted
Copy link
Member

neersighted commented May 25, 2022

Sorry for the radio silence here -- this issue is not as straightforward as just whitelisting another hash format, as it has security and design implications. We have a large number of issues around this and duplicate PRs (see #5670 for example).

It is agreed on by the team that we need to support this as the Simple Repo spec allows it, and real implementations (e.g. Artifactory) only speak md5. That being said, we need to do it in a way that rejects md5s from sources that should not supply it (e.g. PyPI) and that is smart about when hashes are computed locally and when they are taken from the server, as well as when they are verified (e.g. when reading from the cache).

I think right now, we're likely to accept the PR that checks hashes when installing cached wheels, and we need to create an issue to holistically evaluate how and when we handle hashes. Then an implementation can go forward with guidance and feedback from the core team (or one of us will implement it).

I'm going to try to make sure we round up these issues over the next few days so that we can direct all discussion to the meta-issue. We can also go through different PRs attacking this issue to see if one can form a useful base for the "fix"/redesign once we have a rough idea of the best approach.

@neersighted
Copy link
Member

(also for those reading along at home, note that I have/had a PR at #4740 that stalled out for the same reasons -- I'm still rather interested in fixing this and will likely implement this once we figure out what the design should be)

@ljnsn
Copy link

ljnsn commented May 25, 2022

I understand that this is not trivial and you have to be careful about how you implement it. I assume it's still going to take a while, unfortunately that means we cannot install packages hosted either on Nexus or Artifactory with poetry, as they both return md5 hashes.

For what it's worth, my current workaround is

poetry run pip install --index-url https://private-repo .

Since pip supports installing from pyproject.tml this works, it only installs the main dependencies though, no dev dependencies. I install them with poetry install --only dev, but that of course only works if they're not hosted on the private repo.

@dimbleby
Copy link
Contributor

we cannot install packages hosted either on Nexus or Artifactory with poetry, as they both return md5 hashes.

I don't know about Nexus, but I do know that recent artifactory returns sha256 hashes - per https://www.jfrog.com/jira/browse/RTFACT-18495

(I know this because that upgrade caused its own confusion: artifactory suddenly returning sha256 hashes meant that poetry.lock files needed rebuilding, and local poetry caches needed clearing.)

Of course you may or may not be able to upgrade your artifactory.

@lacer93
Copy link

lacer93 commented May 30, 2022

Older Gemfury repositories also return md5 hashes (I suppose the emphasis is on the older as I created a new one to create a reproducible environment and it now returns sha256 hashes while the old one still returns md5.. but I can't find a setting to change it).

I understand that md5 is not as secure as sha256 and whatnot but it is still used in a lot of places and it is far from being insecure. At this point it would be wise to consider the fact that a lot of people are blocked by this (all of our pipelines are hardcoded to some ancient poetry & poetry-core version..) and are unable to use new poetry features.

Would it be possible to simply add a config value in the .toml file to the private repo's section which would signal to poetry what hashing algorithm to use? That way, PyPI is untouched and the user has the ability (and the responsibility) to control the behaviour.

@lacer93
Copy link

lacer93 commented Jul 15, 2022

@neersighted @abn is there any progress here?

@radoering
Copy link
Member

I'm closing this PR because it seems to be stale and at least partially superseded by other PRs. If you think it still contains something worth reviving, feel free to resolve the merge conflicts and reopen it or just create a new PR.

@radoering radoering closed this Jan 20, 2024
Copy link

github-actions bot commented Mar 3, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
7 participants