Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix poetry secondary source bug #4323

Merged
merged 8 commits into from
Jan 7, 2022

Conversation

isobelhooper
Copy link
Contributor

Fixes #4026 (hopefully).

PyprojectPreparer made sure to get all of the originally-defined Poetry sources (with their original names), and any sources configured based on python-index sources in Dependabot's config (with short random hex strings as names).

However, it then used these to generate poetry.lock without trying to combine them, so packages in private repos would end up with the short random hex string as their reference in poetry.lock, which would not work.

This PR merges any configured sources into the originally-defined Poetry sources in PyprojectPreparer.replace_sources, so that the final sources should have the original name they had in pyproject.toml.


Please let me know if there are any code style or other changes you'd like made to this. Ruby isn't my main language, but I'm happy to change whatever's needed.

…for lock files

...in the case where the [tool.poetry.source] is for a private repository.
…entials in URL

This requires some more work because the authed URL won't match the original URL,
so we have to match on the original URLs and merge in changes.
@jurre
Copy link
Member

jurre commented Oct 15, 2021

Thanks for digging into this! I'm going to let CI run, I'll need to dig into the changes a bit and spotted some style/ruby specific things, but overall looks good from a quick scan

poetry_object["source"] = sources if sources.any?
sources_hash = pyproject_sources.map { |source| [source["url"], source] }.to_h

for source in config_variable_sources(credentials) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat nitty, but using for is discouraged in ruby, each is faster and more idiomatic. Apologies for the nitpick 😬

Suggested change
for source in config_variable_sources(credentials) do
config_variable_sources(credentials).each do |source|

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! I'll change this as part of fixing what rubocop flags in the next commit.

sources_hash = pyproject_sources.map { |source| [source["url"], source] }.to_h

for source in config_variable_sources(credentials) do
if sources_hash.has_key?(source["original_url"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if sources_hash.has_key?(source["original_url"])
if sources_hash.key?(source["original_url"])

I believe has_key is deprecated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will also fix this as part of the rubocop fixes. 👍

@jurre
Copy link
Member

jurre commented Oct 19, 2021

This looks great, one thing I want to verify is that this doesn't cause the username/password to be committed as plain text when running this end to end

@isobelhooper
Copy link
Contributor Author

This looks great, one thing I want to verify is that this doesn't cause the username/password to be committed as plain text when running this end to end

Is this something that I should address with more tests, or is it something that you'd have to run yourself? Am happy to make more changes as needed.

@jurre
Copy link
Member

jurre commented Nov 1, 2021

Is this something that I should address with more tests, or is it something that you'd have to run yourself?

@isobelhooper apologies for the slow response on this! It would be great to add a few tests to python/lib/dependabot/python/file_updater/poetry_file_updater.rb, as we call the changed code from there. We want to make sure that the updated files do not contain hard-coded secrets after that code has been called. If you happen to have a project I could manually test this against, that'd also be much appreciated! I can set one up if not though

@isobelhooper
Copy link
Contributor Author

isobelhooper commented Nov 12, 2021

Is this something that I should address with more tests, or is it something that you'd have to run yourself?

@isobelhooper apologies for the slow response on this! It would be great to add a few tests to python/lib/dependabot/python/file_updater/poetry_file_updater.rb, as we call the changed code from there. We want to make sure that the updated files do not contain hard-coded secrets after that code has been called. If you happen to have a project I could manually test this against, that'd also be much appreciated! I can set one up if not though

I'm afraid I don't have a project I could test this against - setting one up would be much appreciated!

I had a try at writing tests for poetry_file_updater.rb using Webmock for the PyPI responses, but ran into the problem that the updater calls out to poetry update and Webmock can't mock out the requests that it makes. A test project would be good for this.

EDIT: However, you reminded me that I could test locally against one of our private repos and then change code to use a different one, which I'll do so I can keep working on this.

This was tested with a local instance of pypiserver with an empty packages/
directory, which meant it was just proxying calls to pypi.org.

The credentials used in testing were set up using the pypiserver instructions at
https://pypi.org/project/pypiserver/#apache-like-authentication-htpasswd.

This might be feasible for Dependabot's own testing, as you could run this
as part of the setup, but there are probably better alternatives.
@isobelhooper
Copy link
Contributor Author

isobelhooper commented Dec 13, 2021

Apologies for the slow response on this - I've pushed another commit with a test in python/lib/dependabot/python/file_updater/poetry_file_updater.rb that will, unfortunately, currently not work.

At the moment, it relies on having a Python repository running at http://localhost:8080 that is set up to allow username / password test / test, and that hosts the luigi package.

I did this using pypiserver and following the instructions in https://pypi.org/project/pypiserver/ to set up a server running on localhost:8080 that serves no local packages and instead redirects all queries to pypi.org (which would then serve the luigi package). My test setup just ran pypiserver as a separate process, but there's a Docker image that could be used to do the same.

The new test should pass once there's a Python repository like this (with auth credentials) that it can access - though I wouldn't expect it to be on localhost, and I'd hope the credentials could be passed in as secrets rather than hardcoded.

Would it be possible for the Dependabot CI to either set up a temporary Python repository like this just for testing, or use an internal private Python repository?

@jurre
Copy link
Member

jurre commented Dec 13, 2021

Would it be possible for the Dependabot CI to either set up a temporary Python repository like this just for testing, or use an internal private Python repository?

I'm not sure, both approaches seem fairly brittle and involved 🤔 is there a way that we could mock out those calls instead?

@isobelhooper
Copy link
Contributor Author

Would it be possible for the Dependabot CI to either set up a temporary Python repository like this just for testing, or use an internal private Python repository?

I'm not sure, both approaches seem fairly brittle and involved 🤔 is there a way that we could mock out those calls instead?

Unfortunately those calls are made as part of the poetry update command that's called by PoetryFileUpdater.updated_lockfile_content_for - we could mock out how Poetry updates lock files, but that also seems brittle and involved.

I'm thinking that it might be possible to move the test to a little earlier in the file updating, and avoid having to make an external call at all.

Since the only thing that updated_lockfile_content_for changes about the lockfile once Poetry has generated it is the hash, my thought is that if we make sure that the project file from PoetryFileUpdater.prepared_pyproject (which is then used to generate the lockfile) doesn't contain any credentials, then any leaking of credentials into the lock file would be a Poetry bug.

I'll have a go at making a test for PoetryFileUpdater.prepared_pyproject in the case where we have auth credentials.

isobelhooper and others added 2 commits December 28, 2021 21:40
...rather than testing against the updated files as in the previous commit.
With this approach, we don't need to set up an actual repository with credentials,
since all we are doing is checking that the credentials do not leak into the
pyproject file.

Since the lock file is created by Poetry itself from the pyproject file,
if our pyproject file is clean of credentials when we generate the lock file,
the only way credentials could then leak in would be a bug in Poetry.
@isobelhooper isobelhooper requested a review from jurre January 7, 2022 09:56
@jurre
Copy link
Member

jurre commented Jan 7, 2022

Thanks for all your work on this!

@jurre jurre merged commit 4a33928 into dependabot:main Jan 7, 2022
@isobelhooper isobelhooper deleted the fix-poetry-secondary-source-bug branch January 7, 2022 12:06
@brrygrdn brrygrdn mentioned this pull request Jan 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Private repository name gets overwritten in poetry.lock update (even when the update is for a PyPI package)
2 participants