Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Private registry authentication #2719

Closed
wants to merge 3 commits into from
Closed

Conversation

jdemilledt
Copy link

@jdemilledt jdemilledt commented Jul 5, 2019

@Centril Centril added T-cargo Relevant to the Cargo team, which will review and decide on the RFC. A-registry Proposals relating to the registries of cargo. labels Jul 5, 2019
@RustyYato
Copy link

RustyYato commented Jul 5, 2019

Please add a link to your doc edit: now done

@jdemilledt
Copy link
Author

The link has been added.

@sfackler
Copy link
Member

sfackler commented Jul 6, 2019

The registry decides whether auth is required to the individual endpoints, so it seems like this configuration should live in the config.json file: https://github.com/rust-lang/crates.io-index/blob/master/config.json

@jdemilledt
Copy link
Author

@sfackler I did consider that at first, but the registry may require authentication to pull the index. Perhaps a special index is returned that only says use auth and where to get the token?

@sfackler
Copy link
Member

sfackler commented Jul 6, 2019

git authentication is handled totally separately from HTTP authentication with the registry API.

@jdemilledt
Copy link
Author

@sfackler That's a good point to bring up. I think I'm leaning toward the empty index telling Cargo where to get a token then. That way, Cargo can automatically get the token and store it, using it for subsequent requests to the index and registry.

@jdemilledt
Copy link
Author

Could someone point me to the location in the codebase where indices are pulled?

@jdemilledt
Copy link
Author

After some thinking, Cargo shouldn't muck with Git's configuration. The registry should provide the user with instructions on how to fill out the Git authentication requests.

@jdemilledt
Copy link
Author

I've updated the RFC. Please let me know what you all think.

@gilescope
Copy link

Is this going to support ntlm auth? (There's a lot of enterprise windows shops out there.)

@jdemilledt
Copy link
Author

@gilescope Not at the moment. Authentication types are independent from this RFC.

@paddycarey
Copy link

One thing that jumps out immediately from this proposal is that it adds a requirement that private registries must support the web API to enable downloads of private crates from protected URLs.

Currently it's possible to serve a private index without implementing the web API at all (we do this at Cloudsmith, we have our own CLI for uploading and otherwise interacting with published crates) but piggybacking on cargo login requires standing up a whole separate API just to retrieve credentials. It also adds hurdles for those implementing private registries by hand with only a git repo.

It'd be great if the auth could be read from the index config itself, avoiding the need for a double login (once for git, the second for the web api).

Currently at Cloudsmith we generate an index per user, and the base dl URL returned in the config contains a special token that is unique to that user. This allows us to authenticate downloads using an in-URL token which mostly works fine, though is a little limiting.

For our private repos, the index config when checked out currently looks like so:

{
    "dl": "https://dl.cloudsmith.io/MY_TOKEN/MY_ORG/MY_REPO/cargo/{crate}-{version}.crate"
}

What I think would work well in our case would be allowing for a server-set config value that simply gets attached to the Authorization header for HTTP requests. It'd be flexible enough for what we need and extensible enough to support any type of HTTP auth we'd want to use (basic, oauth, API key, etc). Something like:

{
    "dl": "https://dl.cloudsmith.io/MY_ORG/MY_REPO/cargo/{crate}-{version}.crate",
    "auth_header": "Basic XXXXXXXXXXXXXX"
}

(where auth_header could also be something like Token XXXXXXX or Bearer XXXXXXXX, whatever)

I think cargo login is still a useful mechanism when the registry can't return the auth for whatever reason (perhaps it's a public repo on github, but requires auth for actual downloads), but for our use case it feels like unnecessary overhead.

Perhaps cargo could even use this auth_header value for credentials and avoid the need to login at all for this registry if the value is present, might be a nice UX win (though i realise this may be a separate discussion).

FWIW we do plan to implement the web API, but it's not currently a requirement, and this RFC would make it so in many cases where it isn't currently.

@jdemilledt
Copy link
Author

@paddycarey Correct me if I misunderstood what you said.

The whole point of this RFC is to enable Cargo to use private registries without any strange hacks, enabling a fully featured native experience. Private registries the way Cloudsmith does them will still be possible, but there will be a way to do it in a way that integrates well with Cargo if this RFC is approved.

@jdemilledt
Copy link
Author

Also, one can edit their config file to set the token if they are not able to use a web API.

@paddycarey
Copy link

The whole point of this RFC is to enable Cargo to use private registries without any strange hacks, enabling a fully featured native experience.

Absolutely, I'm 100% on board with this goal.

I guess the difficulty is that because the index is served over Git, users will inevitably have to step outside the confines of Cargo to configure access and credentials there anyway, and immediately the experience is less "native" than would be ideal.

Private registries the way Cloudsmith does them will still be possible, but there will be a way to do it in a way that integrates well with Cargo if this RFC is approved.

Noted. I think if this RFC were implemented as is we probably wouldn't change the way we generate URL-based tokens as it only adds an extra configuration step/hurdle for our users. For me this feels like it would be a regression.

@paddycarey
Copy link

I think my main issue with the RFC as it stands now is that if implemented it results in users being required to provide their credentials twice, in two different ways just to download packages from a private registry. Once for git (to clone the index) and once again for the API (to download the packages).

Package consumers don't typically have to do this today (authors do, but I consider that a very different use case).

I understand however it would at least be consistent with how registry authentication currently works, even if I personally think it would be a less than ideal UX.

@jdemilledt
Copy link
Author

It would definitely be less than ideal UX, but the process for cloning the Git repos is much more complicated than I would like to mess with.

@jdemilledt
Copy link
Author

@paddycarey To elaborate on my last comment, the cloning of Git repos is handed off to the git2 library at that point, and imo more confusion would be introduced by trying to inject credentials there.

@paddycarey
Copy link

Agreed, I'm not suggesting that Cargo should control authentication with git, that'd make things quite complex and confusing.

My suggestion above was mostly just an idea for how the index's config.json might contain an auth token which Cargo then uses to authenticate subsequent download requests, thus saving the additional step of having to do a cargo login after already configuring credentials and cloning the index.

I guess this is more or less a formalised mechanism for what we already do, just that the token is included explicitly in the index's config.json and then sent as a header, rather than embedded inline into the download URL.

Perhaps this is a seperate discussion though. This issue is about ensuring Cargo sends the token for all requests when it has one, and I'm mostly thinking about how that token is provided to Cargo in the first place.

"dl": "https://my-intranet.local/api/v1/crates",
"api": "https://my-intranet.local/",
"auth_required": true,
"token_url": "https://my-intranet.local/me"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cargo already assumes a /me URL for this purpose, what's the benefit of allowing it to be redefined here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows for customization in case the API for this registry server is different.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add, this defaults to /me if not specified.

@jdemilledt
Copy link
Author

@paddycarey Getting the token from the Git index assumes a unique index is made for each user, which may not always be the case. The current plan isn't ideal, but imo it's the least amount of confusing behavior.

[prior-art]: #prior-art

Many other languages have package managers with support for private
repositories. Not all will be listed here, but a select few as examples.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use more info here:

  • how do etch the other package managers do authentication?
  • how are the specifics different?
  • why did they do it differently?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of those examples, I know that Pip, Maven, and Composer use HTTP Basic, and NPM uses some not very well documented token mechanism.

@nrc
Copy link
Member

nrc commented Sep 16, 2019

@paddycarey it sounds like this RFC wouldn't be directly useful to you, is that right? Do you know of a different approach that would be useful? In particular, any small steps towards a general solution where the small step is also useful.

@jdemilledt Do you have a specific use case you are trying to address with this RFC? Or are you trying to get us incrementally closer to a vision for private registries in the long term?

@jdemilledt
Copy link
Author

@nrc I do have a specific use case for this RFC. My company has internal projects we would like to be able to publish to our own registry instead of needing to do weird Git things.

@sgrif
Copy link
Contributor

sgrif commented Sep 26, 2019

This is my opinion as an individual. The details of this RFC haven't been discussed among the crates.io team, and nothing here should be taken as an indication of any consensus among that team.

This definitely feels like it's too constrained to me. Sending an API key for all requests seems like the wrong knob to add. I think it is prudent to ensure we are only sending them when strictly necessary. Even if this were per endpoint, it still doesn't specify how authentication should happen. As an example, we're looking into adding 2fa for publishes to crates.io.

In some cases, it's not even possible to know if/what kind of authentication would be needed for a given endpoint ahead of time. Going back to 2fa as an example, a user may or may not have 2fa enabled. If it is enabled, we may only require it once every few minutes at most. So ultimately Cargo is just going to have to try to send the request, and when it gets back a 401, it will retry the request with the specified authentication method. It seems like that could be used for the use case targeted by this RFC as well.

@jdemilledt
Copy link
Author

@sgrif That's a good point. With 401 though, the server needs to return a valid authn challenge, which I don't know if the current crates.io impl does.

@paddycarey
Copy link

@nrc

it sounds like this RFC wouldn't be directly useful to you, is that right?

right, it doesn't add much for our use case (but we'll support whatever is agreed upon, in any case)

Do you know of a different approach that would be useful?

I think the approach outlined by @sgrif above should work fine for us, i.e.:

So ultimately Cargo is just going to have to try to send the request, and when it gets back a 401, it will retry the request with the specified authentication method. It seems like that could be used for the use case targeted by this RFC as well.

Depending on the details this should work just fine for our users.

We have a solution that works for us right now (auth details embedded in the download URLs), which while not ideal is fine for us for now. So we're not in any rush for a quick solution, we'd rather wait for the right one.

@kestred
Copy link

kestred commented Nov 22, 2019

I'm in the process of implementing a "private registry" server for my org and would prefer something like this RFC to be implemented. It's acceptable (even desirable) as is, but could be even more minimal in ways that are not incompatible with other suggested authentication changes.

@nrc, @sgrif, @paddycarey:

In the implementation, I've landed on using a similar url-includes-a-token technique as is used by Cloudsmith, but would prefer a method which allows download and search to use the same authentication mechanism as the rest of the Web API (regardless of potential future changes/improvements/extensions to authentication).

I believe having Cargo use the same method as Web API authentication (by default) will reduce how much work future implementors of cargo registries need to do, even if more advanced authentication configuration options are later introduced.

A More Minimal "RFC"?

  1. Send the current authorization header in download and search requests, subject to the presence of some configuration key. Specifically, define these to use the same authentication methods (by default) as whatever the Web API uses, when auth is enabled.

  2. For now, don't allow /me to be configured (it can be changed in a different future RFC that address wider authentication or API configurability).

  3. Don't specifically use the /me URL when auth is enabled, instead with a message to the effect:

    The registry foo requires authentication when downloading crate blah.
    You may need to cargo login or otherwise authenticate.

    (or alternatively, just use /me with a fallback if the content-type is unexpected or the request fails)

    This behavior could also be changed in a future RFC, to for example make a request to /me,
    or fetch registry configuration from git as has been suggested by @paddycarey.
    Regardless, having it emit a generic message for now seems safe and friendly.

Bikeshedding

Alternatives to auth_required that may imply any of less, more, or different semantics:

  • [good enough] "auth_required": true - as is; the other suggestions provided in the hope that they may address specific pieces of feedback provided above
  • [good] "private": true - avoids describing "auth", instead suggesting crates cannot be accessed directly; could have future benefits (like allowing private crates to be published to private registries, although that is out of the scope of my comments and not necessarily something I'm recommending)
  • [better?] "authorize_fetch": true (or authorize_get) - is more specific in authorizing download and search requests, hopefully partially addressing @sgrif's concerns.
  • [less good] "authorize_dl" and authorize_search" - even more specific, though I think entirely unnecessarily-so and may induce more complication in future RFCs that desire to change how cargo can authenticate.

@tustvold
Copy link

tustvold commented Feb 8, 2020

An alternative, but potentially more controversial idea might be to allow overriding the dl URL within the .cargo/config file. This would allow specifying basic auth credentials there if necessary.

For example

my-registry = { index = "https://github.com/<COMPANY>/crates/git/index", crates = "https://<USER>:<PASSWORD>@my-amazing-s3-proxy.com/crates" }

As an added bonus this would also open the door to running cargo repository mirrors/caches that only replicate the binary assets, which can be done with a multitude of existing tools, without having to also mirror the git-based index, which cannot.

This wouldn't, however, address authentication for the web API endpoints, but I think that is possibly a separate concern given its optional nature, and the fact it is a separate API that is highly coupled with what crates.io does.

@secana
Copy link

secana commented Apr 28, 2021

If you want to try out authorized download request with Cargo check out this experimental PR Add authorization. Currently it mirrors how a package push works and just adds the HTTP Authorization header and a token. Where the token comes from, is up to the user.

I've implemented the experimental HTTP registry specification and the experimental authorization in Kellnr (registry by Bitfalter that implements the full private API spec) and it works so far, but only for the HTTP registry, not for the git registry.

I'm not sure, if we have to think about authorization with the git index at all. For me, the way for private registries is the HTTP API and crates.io does not need authorization on the index. IMHO a private registry can have the requirement of a specific cargo version to work properly, e.g. one with HTTP registry and authorization support. But I'm not sure if there is anybody with a need for an authorized git index access, if the HTTP API supports it and can fully replace the git index approach.

However, I'm happy to help implement any solution that is agreed on!

@Eh2406
Copy link
Contributor

Eh2406 commented Oct 5, 2021

The conversation has moved to #3139 . So I am going to close this thread. Thank you all for your thoughts and contributions!

@Eh2406 Eh2406 closed this Oct 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-registry Proposals relating to the registries of cargo. T-cargo Relevant to the Cargo team, which will review and decide on the RFC.
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.