Auth: Handle dangling permissions #12992

markylaing · 2024-02-28T23:19:38Z

The permissions table has entity_id, entity_type, and entitlement columns. When a permission is created, we validate that the entity exists and get its ID. However, if the entity is deleted, the permission is left dangling.

Dangling permissions could never have been displayed or used because we can't construct a URL for them for use in the API responses. However, not handling them led to some strange behaviour in the OpenFGA datastore (see #12976 (comment)) and of course we don't want the permissions table to grow indefinitely.

This PR adds logic to lazily delete dangling permissions as they are encountered.
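A minimal sketch of the lazy approach, in Go. Each permission is resolved to the URL of its entity. Permissions whose entity no longer exists are left out of the result and passed to a delete callback, so they are cleaned up as they are encountered. The `Permission` type, the `errEntityNotFound` sentinel, `validPermissionURLs` and both callback signatures are hypothetical and are not the PR's actual code.

```go
package main

import "errors"

// Permission mirrors the permissions table columns described above, plus a
// row ID. The Go type itself is hypothetical.
type Permission struct {
	ID          int
	EntityType  string
	EntityID    int
	Entitlement string
}

// errEntityNotFound is a hypothetical sentinel returned by the lookup when no
// entity exists with the given type and ID.
var errEntityNotFound = errors.New("Entity not found")

// validPermissionURLs resolves each permission to the URL of its entity.
// Dangling permissions are left out of the result, and their IDs are passed
// to deletePermissions so that they are deleted lazily.
func validPermissionURLs(
	perms []Permission,
	lookupURL func(entityType string, entityID int) (string, error),
	deletePermissions func(ids []int) error,
) (map[int]string, error) {
	urls := make(map[int]string, len(perms))
	var dangling []int

	for _, p := range perms {
		u, err := lookupURL(p.EntityType, p.EntityID)
		if errors.Is(err, errEntityNotFound) {
			// The entity was deleted: never return or use this permission.
			dangling = append(dangling, p.ID)
			continue
		} else if err != nil {
			return nil, err
		}

		urls[p.ID] = u
	}

	if len(dangling) > 0 {
		// A real implementation might only log this failure instead.
		err := deletePermissions(dangling)
		if err != nil {
			return nil, err
		}
	}

	return urls, nil
}
```

Because validation happens on every read, a permission that becomes dangling is never returned, even if it has not been deleted yet.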

lxd/db/cluster/permissions.go

tomponline

My main question with this PR is: can we remove the permissions when the entity is deleted? It's great that we are detecting any dangling ones and handling them to avoid access control issues, but leaving orphaned entries in the DB by design seems rather unfortunate, and is something we would have considered a bug previously.

tomponline · 2024-02-29T09:27:21Z

Also, are there any risks of a dangling permission being re-used for a new entity of the same type and name & project?

markylaing · 2024-02-29T10:44:00Z

> Also, are there any risks of a dangling permission being re-used for a new entity of the same type and name & project?

No. When the new entity is added, its ID will be different. The embedded driver won't be able to get a URL for a permission that references the ID of the deleted entity, because it can't query for that entity's name, project, etc. by ID. Also, when adding a permission to a group, the URL is parsed to get an entity ID; since the old entity has been deleted, the new permission can only reference the new entity.

> My main question with this PR is: can we remove the permissions when the entity is deleted? It's great that we are detecting any dangling ones and handling them to avoid access control issues, but leaving orphaned entries in the DB by design seems rather unfortunate, and is something we would have considered a bug previously.

There are a few approaches here and none are perfect:

1. Use database triggers to delete permissions when any entity is deleted, or when a permission is no longer associated with any groups. This would require an awful lot of triggers, but cleanup would be handled by the database automatically, leading to less mental overhead (e.g. not having to remember to delete permissions all over the code base). However, we should still validate all of the permissions before using them for authorization or in an API response, because it's possible that we missed a trigger.
2. Lazily clean up permissions as they are requested and ignore dangling permissions during authorization. This requires validating each permission during authorization, but I think we should be doing this anyway in case we missed one (also, we currently need to get the URL of the entity to generate the OpenFGA tuple, so validation is more or less a consequence of that).
3. Use a background task to delete dangling permissions. This approach is not enough in isolation, since a permission can become orphaned between refresh intervals.
4. Actively delete permissions manually when we delete an entity. This approach was necessary when we implemented the remote OpenFGA authorizer, as we need to create/rename/delete tuples in the remote store. The calls are still present in the code base and I am planning to remove them (Remove calls to Authorizer.Add/Remove/Rename<entity> #12975). My feeling is that this approach negates the benefit of building the embedded driver, as there is now another mental overhead when deleting an entity (entity-specific cleanup: fine; delete from DB: fine; remember to emit a lifecycle event: error prone; remember to delete any associated permissions: error prone). This has already led to some issues (see lxc/incus@a7431c6; in this case, not all storage volume types were being added/removed).

I realised this morning that there is another case where permissions can be dangling. If a permission is added to a group, the permission is created and an association is made between the group and the permission. If the group is subsequently deleted, the association between the group and permission is also deleted but the permission itself is not deleted.

Overall, my feeling is that the best approach will be a combination of 2 and 3. Approach 2 ensures that dangling permissions are never used for authorization and are never shown via the API. Approach 3 can additionally delete permissions that don't belong to any groups, and will ensure cleanup happens regularly enough that the permissions table doesn't grow too much. I think of this a bit like a garbage collector that optimistically cleans up when something goes out of scope, but also does a stop-the-world collection periodically.

tomponline · 2024-02-29T16:10:09Z

Moved to draft as approach has changed after discussion

markylaing · 2024-03-01T13:18:20Z

@tomponline I've implemented the changes we discussed:

- The permissions table is removed and auth_groups_permissions contains permission data directly. Permissions associated with a group will now automatically be deleted, when the group is deleted, via the foreign key relationship.
- On daemon startup, if we are the leader, we apply triggers to the database that automatically clean up dangling permissions and warnings.
- When getting permission URLs, we still check whether any are dangling and ignore them.
- A cluster task has been added to check for dangling permissions and delete them. It logs a warning if any are found (because these should have been deleted by the triggers). If it fails to delete the dangling permissions, it creates a warning for each permission so that they can be deleted manually.

markylaing · 2024-03-01T13:19:05Z

I've just now remembered that we can get rid of a lot of warning deletion logic so I'll keep as draft until I've done that.

tomponline · 2024-03-01T13:42:32Z

> A cluster task has been added to check for dangling permissions and delete them. It logs a warning if any are found (because these should have been deleted by the triggers). If it fails to delete the dangling permissions, it creates a warning for each permission so that they can be deleted manually.

Do we need this really?

markylaing · 2024-03-01T17:26:55Z

> > A cluster task has been added to check for dangling permissions and delete them. It logs a warning if any are found (because these should have been deleted by the triggers). If it fails to delete the dangling permissions, it creates a warning for each permission so that they can be deleted manually.

> Do we need this really?

We can go back to cleaning up opportunistically if you like. My thinking was to remove any logic from the API endpoints that isn't specifically required. I suppose I could write a cleanup function and defer a call to it in the API handlers?

tomponline · 2024-03-01T18:03:08Z

> We can go back to cleaning up opportunistically if you like.

That's what the triggers were to prevent right?

markylaing · 2024-03-01T18:06:41Z

> > We can go back to cleaning up opportunistically if you like.

> That's what the triggers were to prevent right?

Yes. Basically I added the task so that LXD complains fairly loudly if the triggers aren't working as expected.

tomponline · 2024-03-01T18:26:56Z

> Yes. Basically I added the task so that LXD complains fairly loudly if the triggers aren't working as expected.

I don't think we need a task if the triggers are there. We could log a warning if the access checker detects them when it's ignoring them, perhaps?

markylaing · 2024-03-01T18:33:42Z

> > Yes. Basically I added the task so that LXD complains fairly loudly if the triggers aren't working as expected.

> I don't think we need a task if the triggers are there. We could log a warning if the access checker detects them when it's ignoring them, perhaps?

Sure happy to do that :)

markylaing · 2024-03-01T18:34:01Z

@tomponline this PR is causing some of the clustering tests to fail. Sometimes, when setting up another cluster member, it fails to start up. I think it's to do with the logic for when to apply the triggers in this commit: 6b2a611

On startup I'm calling gateway.LeaderAddress and checking whether the value is the same as the configured cluster address. If so, I'm applying the triggers. Perhaps we can take a look together in our 1-2-1 on Monday. Have a good weekend :)

Signed-off-by: Mark Laing <mark.laing@canonical.com>

Updates the method to return a slice of valid permissions, a slice of dangling permissions, and a map of entity type to map of entity ID to URL. Additionally, improves performance by ensuring queries are not executed more than once for permissions that reference the same entity. Signed-off-by: Mark Laing <mark.laing@canonical.com>
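The contract described in this commit message can be sketched like this: permissions are split into valid and dangling slices, URLs are returned as a map of entity type to entity ID to URL, and each distinct entity is looked up at most once no matter how many permissions reference it. The names `resolvePermissions`, `permission`, `entityRef` and the `lookupURL` callback are hypothetical.

```go
package main

// entityRef identifies an entity referenced by a permission.
type entityRef struct {
	entityType string
	entityID   int
}

// permission is a simplified permission row (hypothetical type).
type permission struct {
	id          int
	entityType  string
	entityID    int
	entitlement string
}

// resolvePermissions splits perms into valid and dangling permissions and
// returns entity type -> entity ID -> URL. lookupURL reports found=false when
// the entity no longer exists, and is called at most once per entity.
func resolvePermissions(perms []permission, lookupURL func(entityType string, entityID int) (string, bool, error)) ([]permission, []permission, map[string]map[int]string, error) {
	urls := make(map[string]map[int]string)
	missing := make(map[entityRef]bool)
	var valid, dangling []permission

	for _, p := range perms {
		// Reuse earlier results for entities that were already looked up.
		if _, ok := urls[p.entityType][p.entityID]; ok {
			valid = append(valid, p)
			continue
		}

		ref := entityRef{p.entityType, p.entityID}
		if missing[ref] {
			dangling = append(dangling, p)
			continue
		}

		u, found, err := lookupURL(p.entityType, p.entityID)
		if err != nil {
			return nil, nil, nil, err
		}

		if !found {
			missing[ref] = true
			dangling = append(dangling, p)
			continue
		}

		if urls[p.entityType] == nil {
			urls[p.entityType] = make(map[int]string)
		}

		urls[p.entityType][p.entityID] = u
		valid = append(valid, p)
	}

	return valid, dangling, urls, nil
}
```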

This function was previously quite complicated as we had to check if a permission already existed before creating it and return a slice of permission IDs to the caller. Now permissions are directly related to groups via foreign key we can convert the URLs, delete any existing permissions for the group, and create the new ones. Signed-off-by: Mark Laing <mark.laing@canonical.com>

lxd/db/cluster/open.go

tomponline

There doesn't appear to be any error wrapping in that last commit despite what the commit message says

markylaing · 2024-03-05T09:04:41Z

> There doesn't appear to be any error wrapping in that last commit despite what the commit message says

I've reworded the commits to separate it out properly. This was an oversight while rebasing.

lxd/util/version.go

markylaing self-assigned this Feb 28, 2024

markylaing requested a review from tomponline as a code owner February 28, 2024 23:19

markylaing mentioned this pull request Feb 28, 2024

Auth: Embedded OpenFGA authorization driver #12976

Merged

markylaing force-pushed the dangling-permissions branch 2 times, most recently from 4702572 to 6812751 February 28, 2024 23:45

tomponline reviewed Feb 29, 2024

lxd/db/cluster/permissions.go (outdated, resolved)

tomponline reviewed Feb 29, 2024

lxd/db/cluster/permissions.go (outdated, resolved)

markylaing force-pushed the dangling-permissions branch from 6812751 to ea72ac3 February 29, 2024 09:08

markylaing requested a review from tomponline February 29, 2024 09:11

tomponline reviewed Feb 29, 2024

tomponline marked this pull request as draft February 29, 2024 16:09

markylaing force-pushed the dangling-permissions branch 2 times, most recently from f139229 to e97686d February 29, 2024 23:59

markylaing mentioned this pull request Mar 1, 2024

DB: Fix query for storage volume snaphot #13006

Merged

markylaing force-pushed the dangling-permissions branch from e97686d to 6287941 March 1, 2024 13:10

markylaing force-pushed the dangling-permissions branch 2 times, most recently from a15cb04 to 4e5c01b March 4, 2024 13:13

markylaing added 6 commits March 4, 2024 16:29

lxd/db/cluster: Remove AuthGroupsByPermissionIDs method.

a235136

Signed-off-by: Mark Laing <mark.laing@canonical.com>

lxd/db/cluster: Refactor auth group DB methods for schema change.

335991d

Additionally, log a warning if dangling permissions are encountered. Signed-off-by: Mark Laing <mark.laing@canonical.com>

lxd: Refactor auth group handlers to use new db methods.

0a3b4b9

Signed-off-by: Mark Laing <mark.laing@canonical.com>

lxd: Refactor permissions handler to use new db methods.

8bb1798

Signed-off-by: Mark Laing <mark.laing@canonical.com>

markylaing force-pushed the dangling-permissions branch from b8acb6b to e3fdb6b March 4, 2024 17:01

markylaing requested review from tomponline and roosterfish March 4, 2024 17:02

tomponline reviewed Mar 4, 2024

lxd/db/cluster/open.go (resolved)

tomponline reviewed Mar 4, 2024

lxd/db/cluster/open.go (resolved)

tomponline reviewed Mar 4, 2024

markylaing force-pushed the dangling-permissions branch from e3fdb6b to 9792ab9 March 5, 2024 09:03

markylaing added 3 commits March 5, 2024 09:10

lxd/db/cluster: Add SQL triggers for deletion of each entity type.

c223a4d

Signed-off-by: Mark Laing <mark.laing@canonical.com>

lxd/db/cluster: Add an ApplyTriggers function.

5217c70

Signed-off-by: Mark Laing <mark.laing@canonical.com>

lxd/db/cluster: Apply triggers when cluster DB is opened.

7e51e3f

Additionally, wrap errors returned from `EnsureSchema` and `applyTriggers`. Signed-off-by: Mark Laing <mark.laing@canonical.com>

markylaing commented Mar 5, 2024

lxd/util/version.go (outdated, resolved)

markylaing force-pushed the dangling-permissions branch from 9792ab9 to 7eed28f March 5, 2024 09:40

markylaing added 2 commits March 5, 2024 09:41

lxd/util: Rename 'node' to 'cluster member'.

d60e007

Signed-off-by: Mark Laing <mark.laing@canonical.com>

lxd/db/cluster: Remove redunant parentheses.

801ae37

Signed-off-by: Mark Laing <mark.laing@canonical.com>

markylaing force-pushed the dangling-permissions branch from 7eed28f to 1741b7c March 5, 2024 09:52

tomponline previously approved these changes Mar 5, 2024

markylaing added 3 commits March 5, 2024 09:55

lxd/db/cluster: Rename "node" to "cluster member".

bda30c0

Additionally, capitalise error messages. Signed-off-by: Mark Laing <mark.laing@canonical.com>

lxd/db/cluster: Update error messages in unit tests.

e8fd0b5

Signed-off-by: Mark Laing <mark.laing@canonical.com>

test/suites: Check that permissions are deleted when entity is deleted.

fcde91b

Signed-off-by: Mark Laing <mark.laing@canonical.com>

markylaing dismissed tomponline’s stale review via fcde91b March 5, 2024 09:58

markylaing force-pushed the dangling-permissions branch from 1741b7c to fcde91b March 5, 2024 09:58

tomponline approved these changes Mar 5, 2024

tomponline merged commit 6c6d117 into canonical:main Mar 5, 2024
27 checks passed
