Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auth: Embedded OpenFGA authorization driver #12976

Merged
merged 11 commits into from
Mar 8, 2024

Conversation

markylaing
Copy link
Contributor

@markylaing markylaing commented Feb 27, 2024

Adds the embedded OpenFGA authorization driver for use with OIDC authentication. We will make this the default for TLS authenticated users when we are confident in its performance and reliability.

Implements https://discourse.ubuntu.com/t/identity-and-access-management-for-lxd/41516 with some changes. These will be updated in the spec.
Completes https://warthogs.atlassian.net/browse/LXD-679

Warning: All users that currently authenticate to LXD with OIDC will lose access to their cluster. To regain access, a user must:

  1. Make a call to LXD (e.g. lxc info) to ensure their OIDC identity has been added to the database.
  2. Create a group lxc auth group create my-group
  3. Grant the group a suitable permission. As all current OIDC users are administrators, a suitable permission is "admin" on "server". This can be performed with lxc auth group permission add my-group server admin.
  4. Add themselves to the group. This can be performed with lxc auth identity group add oidc/<email-address> my-group.

To perform steps 2-4, another authentication method must be used. This can be via an unrestricted TLS certificate or directly via the unix socket.

@markylaing markylaing added the Feature New feature, not a bug label Feb 27, 2024
@markylaing markylaing self-assigned this Feb 27, 2024
@github-actions github-actions bot added Documentation Documentation needs updating API Changes to the REST API labels Feb 27, 2024
@markylaing markylaing force-pushed the embedded-openfga branch 2 times, most recently from c3b46ef to f5b107d Compare February 27, 2024 21:37
@markylaing
Copy link
Contributor Author

I'm not sure why this is failing the compatibility check:

It's I'm using 1.21.7 locally.

@tomponline
Copy link
Member

I'm not sure why this is failing the compatibility check:

* https://github.com/canonical/lxd/actions/runs/8071688009/job/22051847315?pr=12976#step:8:212

* `go: github.com/openfga/openfga@v1.4.3 requires go@1.21.5, but 1.21 is requested`

It's I'm using 1.21.7 locally.

Think we need to switch that to use go get go@ call like the earlier PR https://github.com/canonical/lxd/blob/main/.github/workflows/tests.yml#L228

@tomponline
Copy link
Member

Can be rebased now.

@tomponline
Copy link
Member

Ready for rebase

doc/api-extensions.md Outdated Show resolved Hide resolved
@github-actions github-actions bot removed Documentation Documentation needs updating API Changes to the REST API labels Feb 28, 2024
@markylaing markylaing marked this pull request as ready for review February 28, 2024 15:42
@markylaing
Copy link
Contributor Author

I've rebased onto main. Hopefully this looks a lot less daunting to review now 😆

@markylaing
Copy link
Contributor Author

While testing I've come across this error: https://github.com/canonical/lxd/actions/runs/8083078860/job/22085325564?pr=12976#step:13:7250

The test isn't catching it but this was caused by the OpenFGA datastore trying to resolve the URL of a permission whose entity no longer existed.

I have a fix for this already but it's actually only a one-line change in the driver, the majority of the changes are in the permission API handlers and some already existing DB methods. I think for the sake of keeping this PR simple I will raise it as a separate PR and this PR can be rebased onto that one when merged. So for now I'm putting this back to draft status.

@markylaing markylaing marked this pull request as draft February 28, 2024 21:07
@markylaing
Copy link
Contributor Author

While testing I've come across this error: https://github.com/canonical/lxd/actions/runs/8083078860/job/22085325564?pr=12976#step:13:7250

The test isn't catching it but this was caused by the OpenFGA datastore trying to resolve the URL of a permission whose entity no longer existed.

I have a fix for this already but it's actually only a one-line change in the driver, the majority of the changes are in the permission API handlers and some already existing DB methods. I think for the sake of keeping this PR simple I will raise it as a separate PR and this PR can be rebased onto that one when merged. So for now I'm putting this back to draft status.

Once #12992 is merged I'll rebase and unmark as draft.

1 similar comment
@markylaing
Copy link
Contributor Author

While testing I've come across this error: https://github.com/canonical/lxd/actions/runs/8083078860/job/22085325564?pr=12976#step:13:7250

The test isn't catching it but this was caused by the OpenFGA datastore trying to resolve the URL of a permission whose entity no longer existed.

I have a fix for this already but it's actually only a one-line change in the driver, the majority of the changes are in the permission API handlers and some already existing DB methods. I think for the sake of keeping this PR simple I will raise it as a separate PR and this PR can be rebased onto that one when merged. So for now I'm putting this back to draft status.

Once #12992 is merged I'll rebase and unmark as draft.

@markylaing markylaing force-pushed the embedded-openfga branch 2 times, most recently from fd6d29b to aef530f Compare February 29, 2024 11:59
@mas-who
Copy link

mas-who commented Mar 1, 2024

@markylaing as discussed, I will just leave this comment here regarding idp groups for future reference. So it was found that after an OIDC identity logs in, if its idp groups were successfully mapped via the custom claim, the identity gets assigned to the correct lxd groups and therefore gets the right permissions. However, when we query the API for group memberships, the association between the lxd groups and the identity is missing. It was mentioned for idp mapped groups, the relationship between those groups and the identity is not currently stored in the database, so we can't get that data.

@mas-who
Copy link

mas-who commented Mar 1, 2024

@markylaing found another issue with the identity-provider-group endpoint. See below terminal output:

root@lxd-permissions-dev:~/lxd# lxc query /1.0/auth/groups
[
        "/1.0/auth/groups/admin-group",
        "/1.0/auth/groups/big-permission",
        "/1.0/auth/groups/small-permission"
]

root@lxd-permissions-dev:~/lxd# lxc query /1.0/auth/identity-provider-groups/test?recursion=1
{
        "groups": [],
        "name": "test"
}

root@lxd-permissions-dev:~/lxd# lxc auth identity-provider-group group add test admin-group
Error: Failed to write identity provider group mappings: FOREIGN KEY constraint failed

For some reason trying to add lxd group to an idp group fails foreign key constraint. But using the same CLI command I was able to add a lxd group to the owner idp group. For some more context, see below table data for the relevant endpoints:

root@lxd-permissions-dev:~/lxd# lxd sql global "select * from auth_groups"
DEBUG  [2024-03-01T15:14:29Z] Connecting to a local LXD over a Unix socket 
DEBUG  [2024-03-01T15:14:29Z] Sending request to LXD                        etag= method=POST url="http://unix.socket/internal/sql"
DEBUG  [2024-03-01T15:14:29Z] 
        {
                "database": "global",
                "query": "select * from auth_groups"
        } 
+----+------------------+-------------+
| id |       name       | description |
+----+------------------+-------------+
| 1  | admin-group      |             |
| 8  | big-permission   |             |
| 9  | small-permission |             |
+----+------------------+-------------+
root@lxd-permissions-dev:~/lxd# lxd sql global "select * from identity_provider_groups"
DEBUG  [2024-03-01T15:14:36Z] Connecting to a local LXD over a Unix socket 
DEBUG  [2024-03-01T15:14:36Z] Sending request to LXD                        etag= method=POST url="http://unix.socket/internal/sql"
DEBUG  [2024-03-01T15:14:36Z] 
        {
                "database": "global",
                "query": "select * from identity_provider_groups"
        } 
+----+-------+
| id | name  |
+----+-------+
| 1  | owner |
| 5  | test  |
+----+-------+
root@lxd-permissions-dev:~/lxd# lxd sql global "select * from auth_groups_identity_provider_groups"
DEBUG  [2024-03-01T15:14:48Z] Connecting to a local LXD over a Unix socket 
DEBUG  [2024-03-01T15:14:48Z] Sending request to LXD                        etag= method=POST url="http://unix.socket/internal/sql"
DEBUG  [2024-03-01T15:14:48Z] 
        {
                "database": "global",
                "query": "select * from auth_groups_identity_provider_groups"
        } 
+----+---------------+----------------------------+
| id | auth_group_id | identity_provider_group_id |
+----+---------------+----------------------------+
| 1  | 1             | 1                          |
+----+---------------+----------------------------+

Seems like an insert into the auth_groups_identity_provider_groups is causing issues?

@edlerd
Copy link
Contributor

edlerd commented Mar 1, 2024

I suggest to use 403 over 404 when users have no permission.

Consider the scenario:

Alice has no access to secret-project on the server
In the current state, lxd responds with

  • 403 on 1.0/instances?project=nonexisting-project, but also with a
  • 403 on 1.0/instances?project=secret-project

It should definitely respond with 404 to 1.0/instances?project=nonexisting-project, because that project doesn't exist.

We shouldn't expose the existence of secret-project to Alice, hence we should always use 404 when users have no permission.

@mas-who
Copy link

mas-who commented Mar 4, 2024

@markylaing I was playing around with different permissions and found a potential issue. I created a group with a single permission can_view_instances on project. Then I allocated this group to an oidc identity. Upon login, I can't view any instances and looking at the server debug logs, I see the below info:

DEBUG  [2024-03-04T10:40:41Z] Error Response
        {
                "type": "error",
                "status": "",
                "status_code": 0,
                "operation": "",
                "error_code": 403,
                "error": "User does not have entitlement \"can_view\" on object \"project:/1.0/projects/default\"",
                "metadata": null
        }  http_code=403

Is this intended behaviour? Furthermore, I then changed the group permission to have can_view entitlement on project, the 403 response went away but no instances were returned from the server, seems like it was filtered out somehow.

@markylaing
Copy link
Contributor Author

@markylaing I was playing around with different permissions and found a potential issue. I created a group with a single permission can_view_instances on project. Then I allocated this group to an oidc identity. Upon login, I can't view any instances and looking at the server debug logs, I see the below info:

DEBUG  [2024-03-04T10:40:41Z] Error Response
        {
                "type": "error",
                "status": "",
                "status_code": 0,
                "operation": "",
                "error_code": 403,
                "error": "User does not have entitlement \"can_view\" on object \"project:/1.0/projects/default\"",
                "metadata": null
        }  http_code=403

Is this intended behaviour? Furthermore, I then changed the group permission to have can_view entitlement on project, the 403 response went away but no instances were returned from the server, seems like it was filtered out somehow.

That sounds like a bug to me. Will investigate when I get back to this after #12992. Thanks!

@tomponline
Copy link
Member

ready for rebase

…URL.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
This commit updates the method to extract the identity
provider groups and additionally use the more concise
`request.GetCtxValue` method.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
@markylaing markylaing marked this pull request as ready for review March 8, 2024 10:24
github.com/olekukonko/tablewriter v0.0.5
github.com/openfga/api/proto v0.0.0-20240307175852-df09bc1e8e82
github.com/openfga/language/pkg/go v0.0.0-20240307152633-772cdd8b6a9c
github.com/openfga/openfga v1.5.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pulling in lots of stuff, two surprising items are mysql driver and docker client?

Is it possible for us to consume a more specific package rather than the general openfga package.

Not a blocker ofc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is possible as the go.mod is in the root of the repo. I could look into which specific dependencies are required by the packages we are using and potentially we can write exclude directives to get rid of mysql and docker etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think go mod tidy would have done this for us already if it wasnt needed - i was wondering if we did need to import the main package or whether we can be more specific somehow?

return nil, nil
}

// ReadUsersetTuples is called on check requests. It is used to read all the "users" that have a given relation to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a request context optimisation here to only check for the current user rather than checking for all users who have access? (in the future).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little nuanced. "user" in this context is not the identity that made the request. OpenFGA tuples are composed of a user, a relation and an object. In the datastore implementation whenever we see a user that is an identity, we typically ignore it because all of their relations have been passed in contextually. Here are some examples:

user: server:/1.0
relation: server
object: project:/1.0/projects/default
user: project:/1.0/projects/default
relation: project
object: instance:/1.0/instances/foo?project=default
user: identity:/1.0/auth/identities/oidc/jane.doe@example.com
relation: member
object: group:/1.0/auth/groups/foo-group

We would ignore this one, identities can't be given entitlements directly.

user: identity:/1.0/auth/identities/oidc/jane.doe@example.com
relation: can_view
object: instance:/1.0/instances/foo?project=bar

}

// Get all groups with the permission.
q := `
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the refactoring tasks please can this query be moved into the clusterDB package. Thanks

if err != nil && !api.StatusErrorCheck(err, http.StatusNotFound) {
return nil, err
} else if err != nil {
// If we have a not found error then there are no tuples to return, but the datastore shouldn't return an error.
Copy link
Member

@tomponline tomponline Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite awkwardly structured, where the specific scenario that occurs when there is a not found error is checking err != nil. In my view clearer would have been:

if err != nil {
        if api.StatusErrorCheck(err, http.StatusNotFound) {
	        // If we have a not found error then there are no tuples to return, but the datastore shouldn't return an error.
		    return storage.NewStaticTupleIterator(nil), nil
        }

		return nil, err
} 

Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we need to move DB queries out of the openfga driver and into the clusterDB package. As part of future refactor.

// Inspect request.
details, err := e.requestDetails(r)
if err != nil {
return api.StatusErrorf(http.StatusForbidden, "Failed to extract request details: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be %w

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in this case I'm not sure it can be? Do we need api.StatusError to implement Unwrap for this to work properly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah ofcourse you're correct. Although now im looking closer how do we know that the err means forbidden anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api.StatusErrorCheck is able to unwrap until it gets to the api.StatusError and then it can check the status code. I've just implemented the interface for it now so we can pass in %w and wrap underlying errors. Should mean we will be able to distinguish a general 404 into sql.ErrNoRows or io.ErrNotExist in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but what I mean is that this actually feels like the wrong place to be returning api.StatusErrorf(http.StatusForbidden), shouldn't that be done in e.requestDetails so it has the option of returning some other sort of status code (internal server error for example or bad request) if the issue a bad request or an internal issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh I see what you mean. Yes I will change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can include in #13077

@@ -136,3 +143,245 @@ EOF
lxc config unset oidc.issuer
lxc config unset oidc.client.id
}


fine_grained_authorization() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should expand this to test for user_by filtering too in the refactor PRs.

Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@tomponline
Copy link
Member

There's a few follow up points for the forthcoming refactor but no blockers i could see.

@tomponline tomponline merged commit ca62409 into canonical:main Mar 8, 2024
27 checks passed
@markylaing
Copy link
Contributor Author

In general we need to move DB queries out of the openfga driver and into the clusterDB package. As part of future refactor.

There are no queries in the driver itself, the queries are in the datastore implementation in the lxd/db package. Should I move them into the cluster package?

@tomponline
Copy link
Member

In general we need to move DB queries out of the openfga driver and into the clusterDB package. As part of future refactor.

There are no queries in the driver itself, the queries are in the datastore implementation in the lxd/db package. Should I move them into the cluster package?

Ah ok lets leave as is then. Thanks

return nil, fmt.Errorf("Could not get entity ID from URL: No statement found for entity type %q", entityType)
}

args := []any{0, projectName, location}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update the comments in this function as the 0 here looks suspicious. It's there because the entity ID queries take an index so that we can correlate the output (see PopulateEntityReferencesFromURLs). In this case we're only querying for one ID so I'm just passing in a placeholder value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

return nil, nil
}

// ReadUsersetTuples is called on check requests. It is used to read all the "users" that have a given relation to
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little nuanced. "user" in this context is not the identity that made the request. OpenFGA tuples are composed of a user, a relation and an object. In the datastore implementation whenever we see a user that is an identity, we typically ignore it because all of their relations have been passed in contextually. Here are some examples:

user: server:/1.0
relation: server
object: project:/1.0/projects/default
user: project:/1.0/projects/default
relation: project
object: instance:/1.0/instances/foo?project=default
user: identity:/1.0/auth/identities/oidc/jane.doe@example.com
relation: member
object: group:/1.0/auth/groups/foo-group

We would ignore this one, identities can't be given entitlements directly.

user: identity:/1.0/auth/identities/oidc/jane.doe@example.com
relation: can_view
object: instance:/1.0/instances/foo?project=bar

@tomponline
Copy link
Member

This is a little nuanced. "user" in this context is not the identity that made the request. OpenFGA tuples are composed of a user, a relation and an object. In the datastore implementation whenever we see a user that is an identity, we typically ignore it because all of their relations have been passed in contextually. Here are some examples:

Thanks I think that would be worth clarifying in the comment then as it confused me a little.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature New feature, not a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants