authz: update disk lookup and protect disk get/attach/detach #662

davepacheco · 2022-02-01T22:15:37Z

In this change, I set out to add authz protection for GET disk using the same pattern described in #592 and extended in several other PRs. To recap, you'd expect:

disk_fetch(authz::Project, Name): the main way to fetch a whole disk record
disk_lookup_path(organization_name, project_name, disk_name): the main way to find the disk id for a path in the API

This change grew a lot larger because there are tendrils in various places. So now there's also:

disk_lookup_by_id(Uuid): this does a "lookup" (the way disk_lookup_by_path does, returning an authz::Disk and not the full database record) using the disk's id
disk_refetch(authz::Disk): this is desirable for complicated reasons. Basically: if you have an authz::Disk and you want the database record, this function gives it to you (after doing an authz check).

More on disk_refetch:

`disk_refetch` looks at first glance like a wart. Maybe it is. Why do we need it?

With the conventions we have today, the only way to get a full database record would be using disk_fetch(authz::Project, disk_name). That doesn't help you if you have the disk's id -- you don't have its parent Project (and no easy way to get it) and you don't have the disk's name.

Why not make this disk_fetch_by_id? That's essentially what this is: disk_refetch takes an authz::Disk, which, from a user's perspective, is just a disk_id. The difference is that authz::Disk also includes information about how you looked up the Disk in the first place. If we accepted only a disk id (Uuid) here, and the disk didn't exist, we'd generate a 404 that said "disk with id 123 doesn't exist". But the user may not have come into this function looking for disk id 123. For example, the disk attach code path does an initial mapping from path to id using disk_lookup_path. Then it does a call to the Sled Agent and then a database operation. Then it uses disk_refetch to get the latest version of the record. If the disk was gone at that point, the user would get a 404 "disk with id 123 doesn't exist", which makes no sense because they didn't ask for the disk with id 123 -- they asked for disk "d1" in project "p1" in organization "o1".

That might sound contrived or pedantic (and maybe it is?). But the point here is to preserve information about how a resource was looked up. I think that'll be valuable both for end users and our own debugging. It also forces us to be careful with our own lookups: each operation should do one lookup, probably early on, and hang onto the authz::Disk (or whatever) for the duration of the operation. That also ensures that we don't accidentally switch objects because we do another lookup later.

Other changes here:

To support disk_lookup_by_id, we also need project_lookup_by_id and organization_lookup_by_id.
Renamed *_lookup_path to *_lookup_by_path (that's for you, @david-crespo -- I think it's a good call)
Added a new "internal-api" internal user, which is used for handling some requests to the internal API. (Before this, there was no OpContext for requests to the internal API. Now, those APIs can have OpContexts, and they'll have this user. This is still basically going to be a super-user, but it's a step in the right direction.)
Added type aliases for authz::ProjectChild for authz::Disk and authz::Instance, mostly for clarity for readers. authz::ProjectChild is no longer exposed from authz.
Added protection to the disk attach/detach endpoints because it was easier than not doing that, with all the changes I have here
Fixes a bug where a POST to detach a disk from an Instance returned the previous state ("attached", usually) instead of the new one ("detaching"). I updated the disks integration test to check for this.

davepacheco · 2022-02-02T19:37:02Z

This is ready for review but depends on #661.

zephraph · 2022-02-03T06:32:15Z

nexus/src/db/datastore.rs

+    // TODO-security For Containers in the hierarchy (like Organizations and
+    // Projects), we don't do an authz check in the "lookup_by_path" functions
+    // because we don't know if the caller has access to do the lookup.  For
+    // leaf resources (like Instances and Disks), though, we do.  We could do
+    // the authz check here, and in disk_lookup_by_id() too.  Should we?


I worry that having a separation between what's auth'd and what's not auth'd being differentiated by logical hierarchy might potentially lead to bugs. Is there some way we could be more explicit in describing that a particular reference has been auth'd? I mean, we could continue with naming schemes like _noauthz or _no_auth which is helpful at the callsite... I wonder more if there's something we can do with the return type of these lookups though. One could mistakenly assume that authz::Disk meant authorized disk at first glance. Would it be possible to have a type, be it a newtype pattern or otherwise, that wrapped the authz::* structs to indicate that they had indeed been authorized. The authorize function could be the only thing to return that type and then we could lean on some level of type safety to save us from ourselves.

Yeah, it's a compelling idea. One line of thinking I've had: authorize could return some newtype around () that essentially just says "someone has successfully called authorize() before". Obviously that's only a little useful -- they might have authorized a different object for a different thing. So you could store something about the object that was authorized in that return value. But that's only useful if somebody checks it, right? How would they check it? They'd ask if somebody had authorized some action on some resource -- it would look just like the authorize() call itself. They may as well just call authorize().

I think you're suggesting that we'd encode what was authorized in the type. That'd be neat. I think that would need to include both the resource and the action. Something like AuthorizationResult<Resource=authz::Disk,Action=Read>? It's tricky, though -- a function that accepts that for Action=Read should also accept it for Action=Write. Do you have more thoughts on how this might work?

I think the root problem we're trying to solve is that it's easy for developers to make mistakes with authz, particularly forgetting to do an authz check. I think there are other opportunities to improve this too. I'd love to get to a place where there's a sort of datastore kernel (for lack of a better word) where internally various functions may do authz checks or not, but the public interface would always do the appropriate authz check. We'd need something similar protecting access to Sled Agent, Crucible, etc.

I think it's very likely we'd want more more than an empty new type. My intuition is that we'd want it to be the authz resource reference as well as the actions as you mention. As for what it would look like, I think it'd take some more detailed exploration. If there's a small set of actions with increasing permissions then something using a bit flag and some logic like anything above a certain value can take an action. I don't exactly know, but having a single field that can represent granted permissions (be it bit flags or otherwise) seems like it'd be something we'd want to include.

I think my biggest preference for how this would work is that we have a macro that's used at the datastore level that dictates what permissions it takes. That macro would take care of all the auth/validation stuff to ensure the action could be performed. We'd then pair that with a linter to fail anything without explicit macro usage. Something like that anyway.

I'm not 100% sure, but I definitely think that it's worth exploring. I wouldn't stop the presses to get to that now, it seems more like a pre-launch/post-demo priority.

zephraph

I think this is a solid step in the right direction. I still want to talk more about how to make our auth system safer (especially if we can leverage types to do so), but that's orthogonal to the work here.

Also, I know it's understood but we should remember to make time for performance testing our db performance to ensure the approach of many smaller queries is better than the utilization of joins.

davepacheco · 2022-02-03T19:52:37Z

Also, I know it's understood but we should remember to make time for performance testing our db performance to ensure the approach of many smaller queries is better than the utilization of joins.

Yes, I'd say even stronger: I don't expect many small queries to be better than JOINs here and I do hope we get a chance to improve that.

davepacheco · 2022-02-03T20:59:23Z

Thanks for taking a look!

davepacheco added 11 commits January 27, 2022 16:45

authz: make it easier to test authn/authz protection for new endpoints

4086d58

rip out now-redundant tests

2c28623

clean up and document new test

3eec4e3

self-review: clean up

af6b8d7

Merge branch 'main' into authz-tests

13e0c74

authz: protect disk list endpoint

80b0013

protect create too and update tests

01e7681

fix oxapi_demo

ce5057c

Merge branch 'main' into authz-work

9e02487

lookup_path -> lookup_by_path

b2c7315

first (rough) cut: authz for disk lookup in the datastore

c6d1d2b

davepacheco changed the title ~~authz: protect disk GET (plus a lot of refactoring)~~ authz: update disk lookup and protect disk get/attach/detach Feb 1, 2022

davepacheco mentioned this pull request Feb 1, 2022

tracking issue for authz work #419

Closed

71 tasks

davepacheco added 7 commits February 1, 2022 16:10

fix some tests

b48b175

fix more tests

1e1b5d5

more tests

f580ba4

fix oxapi_demo for disk_get

f0d597a

remove XXX (checked behavior)

4a26d6c

another oxapi_demo fix

3066ca1

clarify use of test user/contexts -- not just unit tests

cc08493

davepacheco requested a review from david-crespo February 2, 2022 19:36

zephraph reviewed Feb 3, 2022

View reviewed changes

zephraph approved these changes Feb 3, 2022

View reviewed changes

Base automatically changed from authz-work to main February 3, 2022 18:00

Merge branch 'main' into authz-instances-disks

a0c31e3

davepacheco marked this pull request as ready for review February 3, 2022 20:59

fix mismerge

4718726

davepacheco enabled auto-merge (squash) February 3, 2022 21:06

davepacheco merged commit ed28c2a into main Feb 3, 2022

davepacheco deleted the authz-instances-disks branch February 3, 2022 22:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

authz: update disk lookup and protect disk get/attach/detach #662

authz: update disk lookup and protect disk get/attach/detach #662

Uh oh!

davepacheco commented Feb 1, 2022 •

edited

Loading

Uh oh!

davepacheco commented Feb 2, 2022

Uh oh!

zephraph Feb 3, 2022

Uh oh!

davepacheco Feb 3, 2022

Uh oh!

zephraph Feb 3, 2022

Uh oh!

zephraph left a comment

Uh oh!

davepacheco commented Feb 3, 2022

Uh oh!

davepacheco commented Feb 3, 2022

Uh oh!

Uh oh!

authz: update disk lookup and protect disk get/attach/detach #662

authz: update disk lookup and protect disk get/attach/detach #662

Uh oh!

Conversation

davepacheco commented Feb 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davepacheco commented Feb 2, 2022

Uh oh!

zephraph Feb 3, 2022

Choose a reason for hiding this comment

Uh oh!

davepacheco Feb 3, 2022

Choose a reason for hiding this comment

Uh oh!

zephraph Feb 3, 2022

Choose a reason for hiding this comment

Uh oh!

zephraph left a comment

Choose a reason for hiding this comment

Uh oh!

davepacheco commented Feb 3, 2022

Uh oh!

davepacheco commented Feb 3, 2022

Uh oh!

Uh oh!

davepacheco commented Feb 1, 2022 •

edited

Loading