pip should support custom authentication handlers for private pypi #4475

Open
zmt opened this issue May 12, 2017 · 29 comments
Labels
C: download About fetching data from PyPI and other sources resolution: deferred till PR Further discussion will happen when a PR is made type: feature request Request for a new feature

Comments

@zmt

zmt commented May 12, 2017

  • Pip version: 9.0.1
  • Python version: 2.7.13
  • Operating system: MacOS X Sierra 10.12.4 *

* any OS, really

Description:

This is a feature request.

It would be super-awesome++ if pip supported custom authentication handler configuration so private pypi repositories are not restricted to http basic auth only. Basically, make MultiDomainBasicAuth the default and no longer the ONLY option in a PipSession as it is today: https://github.com/pypa/pip/blob/9.0.1/pip/download.py#L331-L332

This limitation prevents easy integration with stronger authentication (e.g. 2-way TLS, 2FA, etc.) and SSO schemes at enterprises with private pypi repositories. The lack of support makes basic auth credential distribution and leaking unnecessarily difficult problems to address and combat.

@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Jun 26, 2017
@pradyunsg pradyunsg added type: feature request Request for a new feature and removed type: enhancement Improvements to functionality labels Oct 24, 2017
@zmt
Author

zmt commented Jun 28, 2019

I believe that will also work for the original use cases? Keyring has extensible backends, so it would mean installing another package that includes keyring before installing the package. @zmt - thoughts?

I haven't thought about this really since 2017. The addition of keyring support is great, but doesn't appear to help with SSO or using ssh certificates from an ssh-agent, which were 2 authentication methods I initially had in mind back then.

@schlamar
Contributor

schlamar commented Mar 3, 2020

Wouldn't the best solution be for pip to simply allow users to provide a custom requests.auth.AuthBase? There are already a few useful auth implementations for requests, such as OpenID and Kerberos; see https://requests.readthedocs.io/en/master/user/authentication/
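
For readers unfamiliar with that interface, a minimal sketch of such a handler follows. BearerTokenAuth and the token value are made up for illustration; only requests.auth.AuthBase and its call-with-the-request contract come from Requests. pip does not accept such an object today, which is the point of this issue.

```python
try:
    from requests.auth import AuthBase
except ImportError:
    # Fallback so the sketch still imports without Requests installed;
    # Requests only needs the handler to be callable.
    AuthBase = object


class BearerTokenAuth(AuthBase):
    """Hypothetical handler that adds a bearer token to each request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        # Requests calls the handler with the prepared request and
        # expects the (possibly modified) request back.
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request
```

With plain Requests, a handler like this is attached through the auth argument or the session's auth attribute.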

@fedorbirjukov

Is anyone already working on a solution?

@uranusjr
Member

@fedorbirjukov If anyone is actually seeking a solution, they are not doing it publicly :) Go ahead and work on it!

I don’t think it’s a good idea for pip to provide direct access to requests.auth.AuthBase; pip's use of Requests should be treated as an implementation detail. An intermediate abstraction would be needed.
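
One way to picture such an abstraction, purely as a sketch: plugins implement a small contract that mentions no HTTP-library types, and the tool adapts it internally to whatever library it happens to use. AuthProvider, StaticToken and as_request_hook are hypothetical names, not anything pip provides.

```python
from typing import Mapping, Protocol


class AuthProvider(Protocol):
    """Hypothetical plugin contract: no HTTP-library types appear in it."""

    def headers_for(self, url: str) -> Mapping[str, str]: ...


class StaticToken:
    """Example provider that returns the same bearer token for every URL."""

    def __init__(self, token: str) -> None:
        self.token = token

    def headers_for(self, url: str) -> Mapping[str, str]:
        return {"Authorization": f"Bearer {self.token}"}


def as_request_hook(provider: AuthProvider):
    """Adapt a provider to the callable shape Requests accepts for auth."""

    def hook(request):
        # Only this adapter knows about the request object's attributes.
        request.headers.update(provider.headers_for(request.url))
        return request

    return hook
```

If the HTTP library changed, only the adapter would need rewriting; providers written against the contract would keep working.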

@fedorbirjukov

fedorbirjukov commented Apr 12, 2020

I created PR #8029 based on PR #3731. Fingers crossed.
UPDATE: closed it after having a closer look.

@amancevice

I also opened a PR #8030 that is related to this.

My change adds a --extra-headers option to pip commands that enhances the PipSession object with arbitrary headers so you can do things like token-based authentication.

E.g.:

pip install \
  --extra-headers='{"Authorization": "..."}' \
  --index-url https://secure.pypi.example.com/simple \
  --trusted-host secure.pypi.example.com \
  fizz==1.2.3
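
As a rough illustration of what handling such an option involves (this is not the code from PR #8030), the sketch below parses a JSON object of headers and merges it into a session's default headers. The parser, the session_headers dict and the token value are assumptions made for the example.

```python
import argparse
import json


def parse_extra_headers(value):
    """Parse a JSON object mapping header names to string values."""
    headers = json.loads(value)
    if not isinstance(headers, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in headers.items()
    ):
        raise argparse.ArgumentTypeError("expected a JSON object of strings")
    return headers


parser = argparse.ArgumentParser()
parser.add_argument("--extra-headers", type=parse_extra_headers, default={})

args = parser.parse_args(['--extra-headers={"Authorization": "Bearer example"}'])

# Stand-in for the default headers of an HTTP session.
session_headers = {"User-Agent": "pip-sketch"}
session_headers.update(args.extra_headers)
```

A header passed on the command line is visible in shell history and process listings, so a real design would likely also want a config-file or environment-variable route.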

@uranusjr
Member

uranusjr commented Apr 29, 2020

I’ve cleaned up the previous comments a bit to focus this thread on the remaining task at hand: implementing a way to plug in custom authentication backends, to support methods such as Kerberos (#6708) and Windows Integrated Authentication (#8163).

The solution will likely be some kind of plug-in system, so a user can install a backend alongside pip and use a flag to tell pip to use it. So the next steps, from what I can tell, would be to a) come up with a design, and b) identify places that need to be pluggable. I’m marking this as deferred till PR since some actual code would likely be the easiest way to kick off the discussion.

@uranusjr uranusjr added the resolution: deferred till PR Further discussion will happen when a PR is made label Apr 29, 2020
@ghost

ghost commented May 20, 2020

I honestly think pip should look to git-remote-helper as a model for a possible solution here. Example usage could simply be something like this:

$ pip install my-private-package --extra-index-url s3://my_private_pypi_bucket/

When the "scheme" of the repository URL (s3 in this case) is unknown to pip, it tries to start a subprocess named something like pip-remote-s3, whose executable would be located on the PATH due to the installation of some 3rd party helper. It then sends "commands" to the subprocess via stdin, much like git-remote-helper.

You could allow others to implement whatever custom auth mechanisms they like via one of these helpers, and users need to simply install said helper onto the PATH, then use the helper's corresponding scheme in the index URL. To be honest this isn't even custom authentication support per se, but more custom protocol support which would allow whatever authentication mechanism you'd like. pip install via SFTP? No problem!

I don't know exactly what the protocol between pip and the helper would look like, or what layer of abstraction it should lie on. Should the helper simply send PEP 503-style responses to stdout? Should we allow the helper to ask input from the user directly during pip commands? Should CLI options be passed from the pip command (something like --<scheme>-helper-options), or should we limit helper configuration to its own devices, config files and the like? Just some thoughts, would like to discuss.
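
To make the lookup step concrete, here is a minimal sketch under stated assumptions: the helper executable is named pip-remote-<scheme> as described above, it receives the index URL as its only argument, and it answers one line-based command on stdin. The "list <project>" command and all function names are invented for illustration; no such protocol exists in pip.

```python
import shutil
import subprocess
from urllib.parse import urlsplit

# Schemes assumed to be handled natively, so no helper is consulted.
BUILTIN_SCHEMES = {"http", "https", "file"}


def helper_name(index_url):
    """Return the helper executable name for a URL, or None if not needed."""
    scheme = urlsplit(index_url).scheme.lower()
    if not scheme or scheme in BUILTIN_SCHEMES:
        return None
    return f"pip-remote-{scheme}"


def ask_helper(argv, command):
    """Send one command line on stdin and return the helper's stdout lines."""
    proc = subprocess.run(
        argv, input=command + "\n", capture_output=True, text=True, check=True
    )
    return proc.stdout.splitlines()


def list_project(index_url, project):
    """Ask the helper for a project's files (hypothetical 'list' command)."""
    name = helper_name(index_url)
    path = shutil.which(name) if name else None
    if path is None:
        raise LookupError(f"no helper found on PATH for {index_url!r}")
    return ask_helper([path, index_url], f"list {project}")
```

For the example URL above, helper_name("s3://my_private_pypi_bucket/") yields "pip-remote-s3", which is then resolved on the PATH.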

If we choose to go down this path I'd be happy to have a stab at a PR for it. I'm not familiar with pip's internals but I'd like to get involved.

@fedorbirjukov

@tharradine Good point. I've never used git-remote-helper, at least consciously. But its model seems to allow integrating completely different technologies.

I used git on Windows though. And Git has out-of-the-box Windows support, called schannel (Secure Channel). And that's what I'd like pip to have, too. But pip devs are reluctant to go down that road.

@di
Member

di commented Jun 9, 2020

The twine project has a similar feature request: pypa/twine#362

@uranusjr
Member

uranusjr commented Jun 9, 2020

I wonder if this is a good candidate for a fundable packaging project. Both pip and twine use requests internally, so it might be a good idea to build an entrypoints-based plugin system that can be used by both. I expect corporations would be the main users as well, so it makes sense to ask them for resources.

@schlamar
Contributor

As already mentioned above (maybe too vaguely), requests already supports custom authentication handlers, so you don't need a complicated process communication protocol: https://requests.readthedocs.io/en/master/user/authentication/

So in theory the user just has to configure a factory that creates such an authentication handler (for example, an auth.py file in the pip config folder returning a requests_ntlm.HttpNtlmAuth). Pip creates an instance and passes it to requests.

That would be a really simple solution, and it has the benefit that existing requests auth handlers can be used without modification.

@uranusjr
Member

We can theorise all day, but ultimately someone still needs to put in time and effort to write the code. Which is where funding comes into play.

@schlamar
Contributor

I would expect that organizing funding for my proposal would take more time than implementing the solution...

@ghost

ghost commented Jun 10, 2020

We can theorise all day

That's kind of the point of these issues, is it not? Funding is not a prerequisite to discussing design ideas, and it is not even a prerequisite to an implementation - I've offered my time in a previous comment.

@schlamar
Contributor

If someone is willing to help with the configuration part in pip I can make a PoC.

I would propose something like PIP_AUTH_FACTORY/--auth-factory, which should point to a Python file. This Python file has an auth function (or other callable) returning a requests.auth.AuthBase.

For example:

from requests_ntlm import HttpNtlmAuth

def auth():
    return HttpNtlmAuth('domain\\username', 'password') 
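
The loading side of this proposal could be sketched as follows. This is an assumption about how a tool might consume such a file, not existing pip behaviour: PIP_AUTH_FACTORY is the proposed (not implemented) setting, and load_auth_factory and auth_from_environment are hypothetical names.

```python
import importlib.util
import os


def load_auth_factory(path):
    """Import the Python file at `path` and return its `auth` callable."""
    spec = importlib.util.spec_from_file_location("pip_auth_factory", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load auth factory from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the user's file
    factory = getattr(module, "auth", None)
    if not callable(factory):
        raise TypeError(f"{path!r} does not define a callable named 'auth'")
    return factory


def auth_from_environment(environ=os.environ):
    """Build the auth object named by PIP_AUTH_FACTORY, or None if unset."""
    path = environ.get("PIP_AUTH_FACTORY")
    return load_auth_factory(path)() if path else None
```

Loading the file executes arbitrary user code inside the installer's process, which is one of the trade-offs a real design would need to weigh.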

@ghost

ghost commented Jun 10, 2020

@schlamar I agree that a requests auth handler is a simple solution to the use case of authenticating to a PEP 503 repository over HTTPS. For many users I'm sure that is all they need.

Unfortunately I'm a bit more ambitious: I would like a plugin system that neither requires any specific transport or application protocol nor requires the package repository to adhere to PEP 503.

Expanding on my S3 example above - I could have a simple repository being hosted simply on an S3 bucket - no custom HTTP endpoints whatsoever, no HTML files, all that's required is some pip-remote-s3 client-side script, which knows how to discover the dists. The subprocess communication protocol need not be "complicated" - in fact it can be even simpler than PEP 503's "Simple Repository API".
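
As a sketch of how little the discovery step might need, the function below turns a flat listing of object keys (the kind of list a bucket listing returns) into a per-project file index. The `<project>/<filename>` key layout and the function names are assumptions for illustration; the name normalization follows the rule given in PEP 503.

```python
import re
from collections import defaultdict

DIST_SUFFIXES = (".whl", ".tar.gz", ".zip")


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def index_from_keys(keys):
    """Group distribution files by project, assuming `<project>/<file>` keys."""
    index = defaultdict(list)
    for key in keys:
        project, sep, filename = key.partition("/")
        # Skip keys outside a project folder and files that are not dists.
        if sep and filename.endswith(DIST_SUFFIXES):
            index[normalize(project)].append(filename)
    return {project: sorted(files) for project, files in index.items()}
```

Fetching the listing itself, and any authentication it requires, would be the helper's job and is left out here.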

@schlamar
Contributor

@tharradine I see. However, I think this should be discussed in a separate issue (support for custom protocols instead of custom authentication handlers).

@ghost

ghost commented Jun 10, 2020

@schlamar That's fair enough, I suppose the two concepts are not mutually exclusive and both solutions could well be accepted.

@pfmoore
Member

pfmoore commented Jun 10, 2020

Things I'd want to see in any concrete proposal to handle this:

  1. A means whereby it's user-expandable, so that tools like pip don't need to add new code every time someone comes up with a new protocol/handler/whatever.
  2. A way of addressing the bootstrapping issue (user can't install the handler because they need pip to do so, and pip can't install without the handler).
  3. A reusable solution that will work across PyPA tools, so we can avoid having to implement the same feature (possibly with annoying subtle differences) in pip and twine and ...
  4. A clarification of how this fits with the fact that pip has no supported programming API, so any sort of plugin cannot rely on anything about pip's internals remaining constant. (As a practical example, what if we decided to switch from requests to httpx for our network protocol? It's not impossible that we would do this...)
  5. Good documentation and tests for all of the above.

Reasons I think these are important:

  1. These same points come up every time we discuss issues like this. For example, the bootstrapping issue came up with the keyring implementation, and wasn't completely addressed there, so that feature is less useful to some people than it might otherwise be. Let's not repeat that.
  2. Design issues like this are much harder than "just writing the code", and result in maintenance issues longer term if we just accept a PR without considering them.
  3. The interactions between new features for pip and existing features have the potential to become very complex very quickly, and generally when a PR is developed with a focus on just addressing the initial use-case, these interactions are not noticed until after the PR has landed (and often, not until people have started relying on details of the interactions which weren't ever intended). Again, that can be a maintenance issue, making refactoring of pip's code base way harder than we can deal with.
  4. Test infrastructure for this sort of environment generally doesn't exist in open source CI offerings, so it's really hard to ensure adequate testing.

It's really hard to thrash out this sort of "wider issue" in the context of an open source issue tracker/pull request workflow. That's where a funded project, with a clear scope and a remit to look at the broad implications, is a potential way forward for proposals like this. And where the use case is specifically around "corporate" infrastructure like private repositories, some sort of funding can help bridge the gap between volunteer resources who have no "itch to scratch" in this area, and businesses that depend on such support but don't otherwise have a means to influence what features get accepted.

Remember, the pip developer team consists of a very small number of wholly volunteer contributors. We're working on trying to make things more sustainable, but in the meantime we have to be careful how we manage feature additions. Funded development is one way of doing this that we're exploring.

(And yes, I understand that the above makes something that "seems simple" into quite a big project. I don't apologise for that - changes to pip can have a huge impact, and we owe it to all of our users to do our best to ensure they are well managed).

@pradyunsg
Member

I imagine most of the folks interested in this are operating in a corporate setting, with infrastructure set up for running an internal PyPI.

That's a good audience to point to the fact that the PSF's Packaging WG has this listed as a fundable project: https://github.com/psf/fundable-packaging-improvements/blob/master/FUNDABLES.md#architecture-to-support-alternative-authentication-methods-in-packaging-tools

Please contact the Packaging WG by emailing packaging-wg@python.org to ask us to estimate how much one of these improvements would cost; we'll get back to you within a few business days.

@jpedrick

jpedrick commented Jun 12, 2023

I made an attempt at resolving this with minimal changes to pip itself: 0205e2e

@pfmoore I'd love your feedback as to whether you think this would resolve the requirements you listed here.

My hope for this is that users would be able to supply completely custom authentication headers for AWS S3 or, say, Kerberos authentication over HTTP. All the implementation details would be up to the auth override module developer.

The basic assumption in my initial implementation about pip internals is that there will be a module with an "AuthBase" class to implement. This isn't strictly necessary: it would also be possible to define a class with a __call__ "hook", supplied to MultiDomainBasicAuth, which gets a first look at the request URL and returns None if it is not interested in the URL.
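
The "first look" idea can be sketched independently of pip's internals. In the sketch below, HostTokenHook, basic_auth_headers and headers_for are hypothetical names, and the code does not reflect the linked commit or the real MultiDomainBasicAuth; it only shows hooks being consulted in order, with a fallback when every hook returns None.

```python
import base64
from urllib.parse import urlsplit


class HostTokenHook:
    """Hypothetical hook that only handles URLs on one host."""

    def __init__(self, host, token):
        self.host = host
        self.token = token

    def __call__(self, url):
        if urlsplit(url).hostname != self.host:
            return None  # not interested; let the next handler decide
        return {"Authorization": f"Bearer {self.token}"}


def basic_auth_headers(username, password):
    """Build an HTTP basic auth header, standing in for the default path."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def headers_for(url, hooks, fallback):
    """Give each hook a first look at the URL, then fall back."""
    for hook in hooks:
        headers = hook(url)
        if headers is not None:
            return headers
    return fallback(url)
```

Returning None rather than raising keeps unrelated indexes on the existing basic auth path.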
