Skip to content

authentication skeleton #314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Oct 20, 2021
Merged

authentication skeleton #314

merged 26 commits into from
Oct 20, 2021

Conversation

davepacheco
Copy link
Collaborator

@davepacheco davepacheco commented Oct 15, 2021

In preparation for prototyping authorization, I found I needed some basic kind of authentication in Nexus. This change includes a small (albeit overengineered) authentication subsystem for this. It doesn't support any useful authentication yet, but it has enough to test itself and to get started with authorization.

This change is kind of big. I'm happy to do an interactive walk-through if that's helpful.

Here's my long-term thinking on this:

  • Eventually, every operation in Nexus will have an associated authentication/authorization context, represented with some Rust object that describes (one way or another) who (or what) is authenticated and what privileges they have. By "operation", I mean basically anything at all, not just external API requests. This would also include handlers for internal API requests and background activities like saga recovery.
  • These different contexts might have different ways of getting an authn/authz context. External API requests would use HTTP signatures or OAuth. Internal API requests might use macaroons? Background operations might get some magic creds or maybe have to attest to the RoT or something to get real creds. But in the end, they all wind up with a similar-looking authn/authz context object.
  • For external authn specifically, we'll probably support a couple of different kinds of authentication (we've talked about HTTP signatures and OAuth).
  • We should make it hard to write code that "forgets" to authenticate -- i.e., you shouldn't be able to write something that just queries the database or makes requests out to Sled Agent without having done some kind of authn/authz. So I think eventually we'd have all these operations require this authn/authz context object.

This change only gets us started. It includes:

  • Generic types for an authentication context (authn::Context), authentication errors, and related types
  • An "Authenticator" for the external API (authn::external::Authenticator) that's pluggable using a pretty simple trait (authn::external::HttpAuthnScheme). (I'm using the word "scheme" as a sort of generalization of HTTP authentication schemes. A "scheme" is any way of producing an authentication context, usually by looking at an incoming request and verifying some credentials.)
  • An implementation of that trait with a very simple authentication scheme called "spoof". This scheme reads the "oxide-authn-spoof" HTTP header and blindly trusts that the client is who they say they are. This mechanism is enough to test the authentication system itself and should be enough to get started with authorization as well. Obviously this should never be configured in a real system. You'd have to go out of your way to do that today. I'd put in more safeguards, but there's no place to put them because even after this change there's no real authn or authz happening in Nexus.
  • A new top-level configuration property called "authn_schemes_external". This is an array of authn schemes to apply to external API requests. We try them in order and stop when one reports success or an explicit failure. The stock configuration files use an empty array, meaning they'll report every request as unauthenticated. You could put "spoof" in the config if you wanted to try that out.
  • Some metadata for debuggability, so that whether authentication succeeds or fails, you can tell which schemes were attempted. This is not exposed to the end user for the usual security reasons, but it does show up in the log.
  • Pretty comprehensive tests for all of this

This change does not do a whole bunch of other things we'll eventually need to do:

  • It doesn't actually add authentication anywhere in Nexus! Nexus has an Authenticator, and any API request can easily authenticate (there's an example in an integration test), but none of them does. Without authorization, there's not much point. Down the road, we can implement real authn schemes for the external API, internal API, and other operations.
  • It doesn't modify any low-level operations to require the authn/authz context. We'll eventually want to do this so that you can't do anything useful in Nexus without having authenticated and authorized.
  • It doesn't provide the WWW-Authenticate header. I expect we'll eventually want to provide this on every response, since in principle responses can change based on who's authenticated. This will probably need to be constructed by the Authenticator based on what's configured. I want to punt on this for now.
  • It doesn't include anything about authn in the OpenAPI schema. I want to punt on this for now.
  • It doesn't do anything for authorization. That's what I plan to start next.

@david-crespo
Copy link
Contributor

This is great. Really urgently needed. Still going through it in detail.

},
}
} else {
SchemeResult::NotRequested
Copy link
Contributor

@david-crespo david-crespo Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand NotRequested. Here it's being returned when there is no header at all. In a real world case that might be something like: you can't fail HTTP auth if there's no auth header. Is that right? But whether that's a failure might be contextual. If an endpoint expect auth to happen by that method, then that header being missing should be a failure, right? Maybe that's not a thing we need.

The name makes me also think about calling code trying to skip a scheme, for example if a given endpoint is only allowed to auth by certain schemes. In that case it would be the endpoint that would "not request" auth by this scheme. Is that a possibility with the current setup? From what I can tell that would require different endpoints having their own Authenticator, since that's where the list of allowed schemes is collected.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand NotRequested. Here it's being returned when there is no header at all. In a real world case that might be something like: you can't fail HTTP auth if there's no auth header. Is that right? But whether that's a failure might be contextual. If an endpoint expect auth to happen by that method, then that header being missing should be a failure, right? Maybe that's not a thing we need.

My assumption is that an endpoint doesn't really care how you're authenticated -- it cares about what privileges you have (authorization). Basically, the way I expect this will work is: endpoints will use the authn system to get back "Authenticated" or "Unauthenticated", then they will do privilege checks like "can this Actor read information about this Project?" And for unauthenticated contexts, the answer will usually be "no". But there are some operations that might still succeed. This might not be that important for us but I could see allowing unauthenticated users to list the supported versions of the API, for example, or maybe fetch metadata about publicly-available images.

The name makes me also think about calling code trying to skip a scheme, for example if a given endpoint is only allowed to auth by certain schemes. In that case it would be the endpoint that would "not request" auth by this scheme. Is that a possibility with the current setup? From what I can tell that would require different endpoints having their own Authenticator, since that's where the list of allowed schemes is collected.

That's right. Like I said I was assuming endpoints don't care how you're authenticated. NotRequested is never exposed outside the little authn subsystem -- it's only if all schemes return that, then we return Unauthenticated.

Is there a use case you're thinking of where the endpoint would care how you authenticated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing that comes to mind would be an endpoint that needs a particular auth scheme from the client so it can make another request using that same auth. But in that case

a) the allowed schemes would have to consist of only that scheme
b) it might make more sense to put the auth info in the request payload if the request needs to use it
c) rather than pulling the needed auth info from the request, once the user is authenticated, the endpoint could pull the needed info from somewhere else

But most importantly no, I can’t think of a concrete use case for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought I was meandering toward was this: if the logic that produces NotRequested is part of the scheme, then if you want to use the same general “scheme” in two places with different NotRequested conditions, then you’d have to make two Scheme impls (maybe with some shared code). But that’s perfectly fine.

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davepacheco , it's great to see you start tackling this! It all looks very reasonable to me. I expect we can iterate as needed without much problem given how generic this is.

/// Client successfully authenticated
Authenticated(Details),
/// Client did not attempt to authenticate
Unauthenticated,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not useful to provide the Actor for an unauthenticated request as well? I'm thinking of things like logging which actor fails the most. This could be an attacker going after a specific actor for some reason.

Or am I just off base, because failed requests wouldn't know the actor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if the requested is unauthenticated, then the user didn't even try, so there is no actor.

If the user attempts to authenticate and fails, that produces an authn::Error rather than an authn::Context. That could include the actor for the variants of authn::Reason that have an actor (which won't be all of them) and we could log that.

Of course, there are other ways to model this: you could get back an Unauthenticated context for this rather than an Error, but I thought that would make it way too easy to accidentally proceed with a request that failed authn.

// TODO-security Does this leak too much information, to say that
// the header itself was malformed? It doesn't feel like it, and as
// a user it's _really_ helpful to know if you've just, like,
// encoded it wrong.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take this with a grain of salt, as I could be completely wrong here with regards to security analysis.

If the format is already well-defined because callers must know how to make proper requests, then there is no problem as the information is already public. In this case the only time the format will be off is a programmer error, in which case we want to return this so the user can correct their issues, or an automated attack script that is trying to find the right format. If our system is actually secure the latter should not be a problem, since getting the format correct doesn't trigger authentication.

Now, with that being said, there's possibly some side channel (timing?) exposed from this, but I'm skeptical it's a real problem. By this point you are already returning an error, so any additional information would have to match to something discernable in the side channel output. As an example, if a different format error for some reason took longer to reply with then another this could be an attack vector.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my take as well (that this information is already public -- we're not leaking any new information about the implementation).

I figured I'd leave this in because at some point I expect we'll do a deeper security audit and we'll want to look more carefully at this.

Copy link
Contributor

@teisenbe teisenbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great start to me. FWIW I think the security TODOs have appropriate current states and could be changed later as we learn of further requirements

/// The credentials were syntactically valid, but semantically invalid
/// (e.g., a cryptographic signature did not match)
#[error("bad credentials: {source:#}")]
BadCredentials {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include what actor this failure was for? This seems useful for internal logs, though I guess we'll want to think through what a system for tracking authentication failures would look like for us

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good idea. @andrewjstone alluded to this, too.

I'm not sure if we'll always have this, but we do today for "spoof" and we can always make it an Option later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 98e72c0.

@@ -0,0 +1,156 @@
//! Implements a custom, test-only authn scheme that blindly trusts the client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "scheme that trusts the client without verification" to avoid the idiom

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 98e72c0.


let server_state = {
let authn = omicron_nexus::authn::external::Authenticator::new(
authn_schemes_configured,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the authenticator with its set of authn schemes is configured at the Dropshot server level. This means that all the endpoints in a given server support the same authn schemes, and conversely if we wanted two sets of endpoints to support different schemes we would have to put them in different servers.

I wonder whether it would be worth making it easier to specify sets of allowed schemes per endpoint. For example, if there are a couple of endpoints that are really only intended for the console, we might only want to support cookie auth (or whatever) on those.

Or maybe that's better handled by putting those endpoints in their own server, as the current implementation requires. One advantage is that it makes it pretty hard to proliferate lists of auth schemes. Ideally there are as few lists in operation as possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the new authenticator appears in the server-wide context, that's true. But there's no reason that all endpoints have to use that authenticator. (Today, none of them does.) We can always add a second authenticator in the server-wide context and use that from the console-only endpoints. I'm not sure if that's what we want, but I think that's a "later" question because we don't have any console-only routes yet, right?

Copy link
Contributor

@david-crespo david-crespo Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I see. The use of the authenticator is an explicit call in each handler. And yes, there are no console specific routes yet.

) -> SchemeResult {
let headers = request.headers();
authn_spoof(headers.get(HTTP_HEADER_OXIDE_AUTHN_SPOOF))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a cookie auth scheme would look almost the same

let headers = request.headers();

// split on ';' and call `Cookie::parse` from the cookie crate on each chunk
let cookies = parse_cookies(headers.get("Cookie"));

authn_cookie(cookies.get(COOKIE_NAME_OXIDE_AUTHN_COOKIE))

the biggest difference being that in authn_cookie you have to look up the cookie value in a sessions table that associates session with actors.

@davepacheco
Copy link
Collaborator Author

Thanks @david-crespo @teisenbe @andrewjstone for taking a look! I think I've addressed everything.

Copy link
Contributor

@david-crespo david-crespo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see this merged. I want to start using it to experiment with console auth.

@davepacheco davepacheco merged commit 7684d83 into main Oct 20, 2021
@davepacheco davepacheco deleted the authn-start branch October 20, 2021 17:00
@davepacheco davepacheco mentioned this pull request Nov 23, 2021
71 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants