Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auth: Handle dangling permissions #12992

Merged
merged 19 commits into from
Mar 5, 2024

Conversation

markylaing
Copy link
Contributor

The permissions table has entity_id, entity_type, and entitlement columns. When a permission is created, we validate that the entity exists and get its ID. However, if the entity is deleted the permission is left dangling.

Dangling permissions could never have been displayed or used because we can't construct a URL for them for use in the API responses. However, not handling them led to some strange behaviour in the OpenFGA datastore (see #12976 (comment)) and of course we don't want the permissions table to grow indefinitely.

This PR adds logic to lazily delete dangling permissions as they are encountered.

Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main question with this PR is can we remove the permissions when the entity is deleted? Its great we are detecting any dangling ones and handling them to avoid access control issues - but leaving orphaned entries in the DB by design seems rather unfortunate - and something we would have considered to be a bug previously.

@tomponline
Copy link
Member

Also, are there any risks of a dangling permission being re-used for a new entity of the same type and name & project?

@markylaing
Copy link
Contributor Author

Also, are there any risks of a dangling permission being re-used for a new entity of the same type and name & project?

No, when the new entity is added its ID will be different. The embedded driver wont be able to get a URL for the permission with the ID of the deleted entity because it will not be able to query for it's name/project etc. by ID. Also, when adding a permission to a group the URL is parsed to get an entity ID, as the old entity has been deleted, the new permission can only reference the new entity.

My main question with this PR is can we remove the permissions when the entity is deleted? Its great we are detecting any dangling ones and handling them to avoid access control issues - but leaving orphaned entries in the DB by design seems rather unfortunate - and something we would have considered to be a bug previously.

There are a few approaches here and non are perfect:

  1. Use database triggers to delete permissions when any entity is deleted, or when a permission is no longer associated with any groups. This would require an awful lot of triggers but would be handled by the database automatically leading to less mental overhead (e.g. not having to remember to delete permissions all over the code base). However, we should still validate all of the permissions before using them for authorization or as an API response because it's possible that we missed a trigger.
  2. Lazily clean up permissions as they are requested and ignore dangling permissions during authorization. This requires validating each permission during authorization but I think we should be doing this anyway in case we missed one (also, we currently need to get the URL of the entity to generate the OpenFGA tuple so validation is more or less a consequence of that).
  3. Use a background task to delete dangling permissions. This approach in isolation is not enough since a permission can become orphaned in-between refresh intervals.
  4. Actively delete permissions manually when we delete an entity. This approach was necessary when we implemented the remote OpenFGA authorizer as we need to create/rename/delete tuples in the remote store. The calls are still present in the code base and I am planning to remove them (Remove calls to Authorizer.Add/Remove/Rename<entity> #12975). My feeling is this approach negates the benefit of building the embedded driver, as there is now another mental overhead when deleting an entity (entity specific clean up - fine, delete from DB -fine, remember to emit a lifecycle event - error prone, remember to delete any associated permissions - error prone). This has already led to some issues (see lxc/incus@a7431c6, in this case not all storage volume types were being added/removed).

I realised this morning that there is another case where permissions can be dangling. If a permission is added to a group, the permission is created and an association is made between the group and the permission. If the group is subsequently deleted, the association between the group and permission is also deleted but the permission itself is not deleted.

Overall, my feeling is that the best approach will be a combination of 2 and 3. 2 ensures that dangling permissions are never used for authorization, nor are they ever shown via the API. 3 can additionally delete permissions that don't belong to any groups and will ensure clean up happens regularly enough that the permissions table doesn't grow too much. I kind of think of this like a garbage collector that optimistically cleans up when something goes out of scope, but also does a stop-the-world periodically.

@tomponline tomponline marked this pull request as draft February 29, 2024 16:09
@tomponline
Copy link
Member

Moved to draft as approach has changed after discussion

@markylaing
Copy link
Contributor Author

@tomponline I've implemented the changes we discussed:

  • The permissions table is removed and auth_groups_permissions contains permission data directly. Permissions associated with a group will now automatically be deleted via foreign key relationship.
  • On daemon start up, if we are the leader we apply triggers to the database that automatically clean up dangling permissions and warnings.
  • When getting permissions URLs we still check in case any are dangling and ignore them.
  • A cluster task has been added to check for dangling permissions and delete them. It logs a warning if any are found (because these should have been deleted by triggers). If if fails to delete the the dangling permissions it creates a warning for each permission so that they can be manually deleted.

1 similar comment
@markylaing
Copy link
Contributor Author

@tomponline I've implemented the changes we discussed:

  • The permissions table is removed and auth_groups_permissions contains permission data directly. Permissions associated with a group will now automatically be deleted via foreign key relationship.
  • On daemon start up, if we are the leader we apply triggers to the database that automatically clean up dangling permissions and warnings.
  • When getting permissions URLs we still check in case any are dangling and ignore them.
  • A cluster task has been added to check for dangling permissions and delete them. It logs a warning if any are found (because these should have been deleted by triggers). If if fails to delete the the dangling permissions it creates a warning for each permission so that they can be manually deleted.

@markylaing
Copy link
Contributor Author

I've just now remembered that we can get rid of a lot of warning deletion logic so I'll keep as draft until I've done that.

@tomponline
Copy link
Member

  • A cluster task has been added to check for dangling permissions and delete them. It logs a warning if any are found (because these should have been deleted by triggers). If if fails to delete the the dangling permissions it creates a warning for each permission so that they can be manually deleted.

Do we need this really?

@markylaing
Copy link
Contributor Author

  • A cluster task has been added to check for dangling permissions and delete them. It logs a warning if any are found (because these should have been deleted by triggers). If if fails to delete the the dangling permissions it creates a warning for each permission so that they can be manually deleted.

Do we need this really?

We can go back to cleaning up opportunistically if you like. My thinking was to remove any logic from the API endpoints that aren't specifically required. I suppose I can write a clean up function and defer call to it in the API handlers?

@tomponline
Copy link
Member

We can go back to cleaning up opportunistically if you like.

That's what the triggers were to prevent right?

@markylaing
Copy link
Contributor Author

We can go back to cleaning up opportunistically if you like.

That's what the triggers were to prevent right?

Yes. Basically I added the task so that LXD complains fairly loudly if the triggers aren't working as expected.

@tomponline
Copy link
Member

Yes. Basically I added the task so that LXD complains fairly loudly if the triggers aren't working as expected.

I don't think we need a task if the triggers are there. We could log a warning if the access checker detects them when its ignoring them perhaps?

@markylaing
Copy link
Contributor Author

Yes. Basically I added the task so that LXD complains fairly loudly if the triggers aren't working as expected.

I don't think we need a task if the triggers are there. We could log a warning if the access checker detects them when its ignoring them perhaps?

Sure happy to do that :)

@markylaing
Copy link
Contributor Author

@tomponline this PR is causing some of the clustering tests to fail. Sometimes when setting up another cluster member it fails to start up. I think it's to do with the logic for when to apply the triggers in this commit: 6b2a611

On startup I'm calling gateway.LeaderAddress and checking if the value is the same as the configured cluster address. If so I'm applying the triggers. Perhaps we can take a look together in our 1-2-1 on Monday. Have a good weekend :)

@markylaing markylaing force-pushed the dangling-permissions branch 2 times, most recently from a15cb04 to 4e5c01b Compare March 4, 2024 13:13
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Updates the method to return a slice of valid permissions, a slice of
dangling permissions, and a map of entity type to map of entity ID to
URL.

Additionally, improves performance by ensuring queries are not executed
more than once for permissions that reference the same entity.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Additionally, log a warning if dangling permissions are encountered.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
This function was previously quite complicated as we had to check
if a permission already existed before creating it and return a
slice of permission IDs to the caller. Now permissions are directly
related to groups via foreign key we can convert the URLs, delete
any existing permissions for the group, and create the new ones.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't appear to be any error wrapping in that last commit despite what the commit message says

@markylaing
Copy link
Contributor Author

There doesn't appear to be any error wrapping in that last commit despite what the commit message says

I've reworded the commits to separate it out properly. This was an oversight while rebasing.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Additionally, wrap errors returned from `EnsureSchema` and
`applyTriggers`.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
lxd/util/version.go Outdated Show resolved Hide resolved
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
tomponline
tomponline previously approved these changes Mar 5, 2024
Additionally, capitalise error messages.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
@tomponline tomponline merged commit 6c6d117 into canonical:main Mar 5, 2024
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants