
Add authentication with ServiceAccountToken #5138

Closed
yanniszark opened this issue Feb 15, 2021 · 63 comments
Assignees
Labels
lifecycle/stale The issue / pull request is stale, any activities remove this label.

Comments

@yanniszark
Contributor

Problem Statement

Clients in various namespaces (e.g., Notebooks) need to access the Pipelines API. However, there is currently no way for these clients to authenticate to the Pipelines API:
#4440
#4733
In-cluster clients need a way to authenticate to the KFP API Server.

Proposed Solution

The correct way to do this is by using audience-scoped ServiceAccountTokens. In Arrikto's Kubeflow distribution, we have been successfully using this method for a long time, in numerous customer environments. We want to upstream this solution so the whole community can benefit as well, since we see this is an issue many users bump into.
Changes need to happen in 2 places:

  • API Server, which needs to support authentication with ServiceAccountToken.
  • KFP Client, to better support this authentication method.

/assign @yanniszark
cc @Bobgy

@Bobgy
Contributor

Bobgy commented Feb 21, 2021

Thank you for the proposal! I'd love to see it get upstreamed too. It's a common request in #4440.

@elikatsis
Member

Hello!

I'll provide an update here as I'll be pushing a PR covering the backend part very soon.

As mentioned in the first comment, we are adding a new authentication method: authentication using ServiceAccountTokens.
For this, we need the clients to put ServiceAccountTokens in requests and the backend (KFP API server) to retrieve them and authenticate the requests.

How will this ServiceAccountToken find its way into the requests?

  1. The client finds a proper ServiceAccountToken (more on this later on)
  2. It adds an Authorization: Bearer <token> header in all requests
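As a rough illustration, a client running in a pod could do something like the following. This is a minimal sketch, not the actual KFP client code: it assumes the token is projected at /var/run/secrets/ml-pipeline/token (the path used later in this thread) and that the KFP API server is reachable at ml-pipeline.kubeflow:8888, with the requests library standing in for the real client:

import requests

TOKEN_PATH = "/var/run/secrets/ml-pipeline/token"  # assumed mount path

def read_token() -> str:
    # Re-read the file on every call; the kubelet refreshes the projected
    # token periodically, so a cached copy would eventually expire.
    with open(TOKEN_PATH) as f:
        return f.read().strip()

resp = requests.get(
    "http://ml-pipeline.kubeflow:8888/apis/v1beta1/experiments",
    headers={"Authorization": f"Bearer {read_token()}"},
)
print(resp.status_code)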

What does the authentication cycle of the backend look like?

  1. We will extend the authentication mechanisms of the KFP API server with one more authenticator [and we will make the available authenticators extendable]
  2. Every request will pass through all available authenticators (currently, the Kubeflow-UserID header and the ServiceAccountToken) until one succeeds.
    If one succeeds, authentication succeeds.
    If all authenticators fail, the request is considered unauthenticated.
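The control flow of this authenticator chain is roughly the following (an illustrative Python sketch only; the real implementation lives in the KFP API server's Go code, and the names here are hypothetical):

class UnauthenticatedError(Exception):
    pass

def authenticate_request(request, authenticators):
    # Try each authenticator in order; the first success wins.
    for authenticator in authenticators:
        user = authenticator.try_authenticate(request)  # returns None on failure
        if user is not None:
            return user
    # All authenticators failed: the request is unauthenticated.
    raise UnauthenticatedError("no authenticator accepted the request")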

How does the ServiceAccountToken authenticator work?

  1. The KFP API server creates a TokenReview using the ServiceAccountToken retrieved from the request's bearer token header and some expected audience (for the KFP case, this can be ml-pipeline)
  2. Kubernetes responds (with the TokenReviewStatus) whether the token is associated with a known user and with what audience
  3. The KFP API server verifies that ml-pipeline is in the audience specified in the Kubernetes response
  4. The KFP API server considers the request authenticated and assumes the user specified by Kubernetes in its response

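For reference, the equivalent TokenReview check using the official Kubernetes Python client looks roughly like this (a sketch only; the KFP API server performs the same steps in Go):

from kubernetes import client, config

config.load_incluster_config()

token = "<the bearer token extracted from the request>"
review = client.V1TokenReview(
    spec=client.V1TokenReviewSpec(token=token, audiences=["ml-pipeline"])
)
result = client.AuthenticationV1Api().create_token_review(review)

status = result.status
if status.authenticated and "ml-pipeline" in (status.audiences or []):
    # e.g. "system:serviceaccount:<namespace>:<serviceaccount>"
    user = status.user.username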

How does the client find a ServiceAccountToken to use?

Kubernetes has built-in ways to project tokens with a specific audience for the ServiceAccount of a pod.
Each container of a pod mounts the token similarly to how it would mount a volume.
The kubelet generates a token and stores it in a file. Then, retrieving the token is just a matter of reading this file.

The KFP client should have a seamless way to

  1. retrieve the path where the token is mounted,
  2. read it, and
  3. use it in request headers.

The token has an expiration time; however, the kubelet makes sure to refresh it before it expires.
So, finally, the client should re-read the token every now and then.

This last part is also relevant to the discussion of #4683


@davidspek
Contributor

@elikatsis @yanniszark If any help is needed so this can be added for 1.3, please let me know. I think a large portion of the community has been waiting for this for a while now, and it would be great to have it included in 1.3.

@elikatsis
Member

/assign @elikatsis

@elikatsis
Member

@davidspek thanks for volunteering!
We actually have the code ready for a PR, and we've extensively tested it in our deployments.
I believe it would be really helpful if you had the time to test the PRs (backend & client)!

Before I open the client PR, I'll present some implementation details (we've described an overview in the comment above).

As mentioned in #4683 (comment), we want to have a generic way to provide credentials to the client. We will be using a TokenCredentials abstract class for this and we will be making use of a very interesting built-in Kubernetes Configuration functionality: auth_settings. [Obviously, we use a Configuration in our client (source).]

Requirements

  1. We want some credentials to find their way in request headers and, more specifically, in the Authorization: Bearer <token> header.
  2. Also, we need a way to refresh the token before making an API call (as mentioned in the comment above, when projecting service account tokens for pods, the kubelet refreshes them every now and then, so a client needs to re-read the token often).

Information about the Kubernetes Configuration object

  1. A Kubernetes Configuration, based on its attributes, may hold some BearerToken authentication settings (source)
  2. Before making an API call it updates the request using these settings (source). Based on these settings, it may populate the request with:
    • cookies,
    • headers, or
    • queries.

[Expanding (1)] As shown in this source, by providing a Configuration.api_key["authorization"] we can add a BearerToken auth setting which:

  1. adds a header to the request (source)
  2. the header name is authorization (source)
  3. the header value is retrieved using the get_api_key_with_prefix() method (source)

[Expanding (3)] The get_api_key_with_prefix() method (source)

  1. Eventually returns self.api_key["some-key"] with a desired prefix if self.api_key_prefix["some-key"] is set
  2. Note that before running any of this, it executes the refresh_api_key_hook() method if it is defined ❗

[Expanding (2)] The refresh_api_key_hook() method runs before every request. And, as its name suggests, it's a neat way to refresh the API keys!
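In pseudocode, get_api_key_with_prefix() behaves roughly as follows (paraphrased from the kubernetes Python client, not the verbatim source):

def get_api_key_with_prefix(self, identifier):
    # The hook runs first, on every request, so it can refresh the key.
    if self.refresh_api_key_hook is not None:
        self.refresh_api_key_hook(self)
    key = self.api_key.get(identifier)
    if key:
        prefix = self.api_key_prefix.get(identifier)
        if prefix:
            return "%s %s" % (prefix, key)
        return key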

Conclusions

To sum up, what we need to do is:

  1. populate our config.api_key["authorization"] = token,
  2. populate our config.api_key_prefix["authorization"] = "Bearer", and
  3. provide our config.refresh_api_key_hook with a function that updates config.api_key["authorization"].

So, for this case (authentication with ServiceAccountTokens), we need to

  1. Read the contents of a specific file in the container's file system (projected service account tokens are essentially volumes mounted on pods). This is the token.
  2. Use a method that reads and returns the contents of this file as the refresh_api_key_hook.
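Putting the above together, the wiring looks roughly like this (a sketch assuming the kubernetes.client Configuration API described above and the token path used elsewhere in this thread):

from kubernetes.client import Configuration

TOKEN_PATH = "/var/run/secrets/ml-pipeline/token"

def read_token():
    with open(TOKEN_PATH) as f:
        return f.read().strip()

config = Configuration()
config.api_key["authorization"] = read_token()
config.api_key_prefix["authorization"] = "Bearer"
# The hook is invoked before every request (see above), so the token is
# re-read from the projected volume before it can expire.
config.refresh_api_key_hook = lambda c: c.api_key.update(
    {"authorization": read_token()}
)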

Design decisions

  1. We will create a subclass of TokenCredentials named ServiceAccountTokenVolumeCredentials
  2. The class constructor will be expecting a path pointing to the file where the token is stored
  3. If the user doesn't provide a path, the constructor will look for an environment setting: the value of the environment variable ML_PIPELINE_SA_TOKEN_PATH
  4. If the user doesn't provide a path and the environment variable is not set, the constructor will fall back to reading the path /var/run/secrets/ml-pipeline/token
  5. The Client constructor will expect a credentials argument and handle it accordingly
  6. If no credentials are provided and the client detects it is running inside a pod, it will attempt to use a ServiceAccountTokenVolumeCredentials.
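Based on these decisions, usage would look roughly like this (class and argument names follow the design above and may differ slightly in the merged SDK; the import path is an assumption):

import kfp
from kfp.auth import ServiceAccountTokenVolumeCredentials  # assumed import path

# The path argument is optional: the constructor falls back to
# $ML_PIPELINE_SA_TOKEN_PATH, then to /var/run/secrets/ml-pipeline/token.
credentials = ServiceAccountTokenVolumeCredentials(
    path="/var/run/secrets/ml-pipeline/token"
)
client = kfp.Client(credentials=credentials)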

How to set up the pod to authenticate against KFP

We (Arrikto) have been using a PodDefault that configures the pod to authenticate against KFP based on the aforementioned design.
Here is the PodDefault; it essentially describes everything we need to supplement the pod definition with:

apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumeMounts:
  - mountPath: /var/run/secrets/ml-pipeline
    name: volume-ml-pipeline-token
    readOnly: true
  volumes:
  - name: volume-ml-pipeline-token
    projected:
      sources:
      - serviceAccountToken:
          path: token
          expirationSeconds: 7200
          audience: ml-pipeline
  env:
  - name: ML_PIPELINE_SA_TOKEN_PATH
    value: /var/run/secrets/ml-pipeline/token  # this is dependent on the volume mount path and SAT path
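With this in place, a pod opts in by carrying the access-ml-pipeline: "true" label, which matches the PodDefault's selector above; the projected token volume and the ML_PIPELINE_SA_TOKEN_PATH environment variable are then added to it automatically.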

@davidspek
Contributor

@elikatsis Thanks for the detailed post. I will look at it more closely tomorrow and do my best to help test the PRs.

elikatsis added a commit to arrikto/kubeflow-pipelines that referenced this issue Mar 11, 2021
Part of kubeflow#5138

This is a subclass of TokenCredentials and implements the logic of
retrieving a service account token provided via a ProjectedVolume.

The 'get_token()' method reads the token on every call and doesn't store
it, as the kubelet refreshes it quite often.

Relevant docs:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection
@Bobgy
Contributor

Bobgy commented Mar 12, 2021

Hi @elikatsis! Thank you for the detailed design and PRs!
I think these are absolutely great work and I'll start looking at them right now.

However, despite that, I'm a little concerned that the design was only made public 5 days before the Kubeflow 1.3 feature cut date -- March 15th. I think we agreed early on about the rough direction, which was a good heads-up, but it's not possible to discuss this fairly complex feature design thoroughly within 5 days. If we commit to shipping this in KF 1.3, we can only rush to a decision.

Besides that, an important dependency on PodDefault (important in terms of making the zero-config default a better experience) was only revealed 3 days before the feature cut date, which I especially worry about.

@elikatsis
Member

@Bobgy thanks for putting time into this!

I'm a little concerned that the design was only made public 5 days before Kubeflow 1.3

Your concerns are totally valid and understandable. We agree it is very close to the first RC and this may be a bit pressing.

I think we agreed early on the rough direction, that was a good heads up, but it's not possible to discuss this fairly complex feature design thoroughly within 5 days

Indeed, this is an advanced feature. However, we had already discussed most of the changes during the joint talk you had with @yanniszark.

That's why we expect the backend changes to be unsurprising.
As far as the client is concerned, the change is relatively small and fully backwards compatible. In fact, it doesn't affect existing users at all.

Note that all of the changes are extensions to existing functionality and are not removing or changing any old behavior.

Besides that an important dependency (important in the terms of making zero-config default better experience) on PodDefault was only revealed 3 days before the feature cut date, which I especially worry about.

We agree, but it's not necessary to have a zero-config solution before the RC. We can still use the alternative of some config, if we want.

To sum up: yes, we are very close to the RC (also take into consideration that cutting the release was pushed back one week), but let's do our best and see if we can make it! Many users rely on it. If we don't make it, that's OK!

@Bobgy
Contributor

Bobgy commented Mar 15, 2021

@elikatsis thanks, I just realized the RC cut was delayed; I'm glad we get some more breathing room on this feature.

We agree, but it's not necessary to have a zero-config issue before the RC. We can still use the alternative of some config, if we want.

Makes sense, so I'd frame the discussion around common things we agree on. Would you mind splitting your PR into smaller ones, so that we can first approve the ones we fully agree on for the RC? (For clarification, I don't mean to ask you to split right now, but rather during review: if we see parts that everyone agrees on, we can split them out for a quick merge.)

I've got very good context on the backend part based on previous discussions with Yannis, so I think we can get it merged.

The only part I have concerns about is the user-facing interface for adding service account tokens. What do you think about letting the KFP API server inject a projected service account token into every KFP pod? I don't think that raises more security risk (because service account tokens are already available there), nor is there a chance of breaking existing components. Pros -- we do not need PodDefault, so one less dependency.

e.g. I guess we can configure https://argoproj.github.io/argo-workflows/default-workflow-specs/ with a global podSpecPatch like https://github.com/argoproj/argo-workflows/blob/master/examples/pod-spec-patch-wf-tmpl.yaml to get this behavior easily.

elikatsis added a commit to arrikto/kubeflow-pipelines that referenced this issue Mar 17, 2021
Extend the authenticators which the KFP apiserver applies on a request
with a TokenReview authenticator.

This authenticator expects a ServiceAccountToken in a header with the
format: 'Authorization: Bearer <token>'

Part of kubeflow#5138
@Bobgy
Contributor

Bobgy commented Mar 18, 2021

For clarification, I'm prioritizing reviewing the backend PR, because it's a blocker for the release. The SDK PR can be released after the Kubeflow release, because users can easily upgrade the SDK at any time, and there's very little coupling to the server.

elikatsis added a commit to arrikto/kubeflow-pipelines that referenced this issue Mar 18, 2021
Extend the authenticators which the KFP apiserver applies on a request
with a TokenReview authenticator.

This authenticator expects a ServiceAccountToken in a header with the
format: 'Authorization: Bearer <token>'

Part of kubeflow#5138
google-oss-robot pushed a commit that referenced this issue Mar 19, 2021
… of #5138 (#5286)

* Introduce kubernetes client utils

Introduce common utils for client initialization to factor out common
code.
This is a step towards fulfilling #4738.

* Use common util to initialize k8s clientset

* Introduce TokenReview client and fake ones

* Extend ResourceManager with a TokenReview client

* Extend FakeClientManager with a fake TokenReview Client

* Introduce authentication utils

* Introduce HTTP header authenticator

* Initialize Kubeflow-UserID header authenticator

* Refactor getUserIdentity() to use auth_util

* Move getting user identity logic to resource manager

Have the resource manager authenticate the request.
In following commits we will be extending the authentication methods to
use, among others, Kubernetes clients. Thus, we move the logic to the
resource manager to benefit from the clients kept in its context.

* Introduce constants for the TokenReview authentication

* Introduce TokenReview authenticator

* Extend authenticators with a TokenReview one

Extend the authenticators which the KFP apiserver applies on a request
with a TokenReview authenticator.

This authenticator expects a ServiceAccountToken in a header with the
format: 'Authorization: Bearer <token>'

Part of #5138

* Add tests for auth_util

* Add tests for HTTPHeaderAuthenticator

* Update server tests based on the new authentication API

* Remove old tests and unused code

* Add tests for TokenReviewAuthenticator

* Add server tests with unauthenticated requests

* manifests: Allow KFP API server to create TokenReviews

* auth: Split 'auth_util.go' into two parts

Split the file into:
* auth.go: contains the main entrance from the outside of the package
* util.go: contains all utility functions used inside

* Change token review audience variable and value

* Allow configuring audience with an environment variable

* Rename IsRequestAuthenticated -> AuthenticateRequest

* Don't use AuthenticateRequest method in tests

Instead of using AuthenticateRequest to retrieve the user from the
request and then use it for the expected values, allocate a variable for
the username in the request and use that in the expected values.
This ensures we don't hide potential errors of AuthenticateRequest.

* Change authenticators order

Have the HTTPHeaderAuthenticator first followed by the
TokenReviewAuthenticator

* Move authenticators to a ResourceManager property

To avoid potential race conditions when initializing the Authenticators
variable, we move authenticators to a ResourceManager property and
initialize it along with the initialization of the manager.
@elikatsis
Member

I had totally missed these comments 😱

would you mind splitting your PR as smaller ones , so that we can approve the ones we fully agree on right now first for the RC?

We've merged the backend now; I hope you are good with this and did not hesitate to ask me to split some commits. Next time feel free to explicitly ask for things like that during the review!

What do you think about letting KFP api server inject projected service account token to every KFP pod?
e.g. I guess we can configure https://argoproj.github.io/argo-workflows/default-workflow-specs/ with a global podSpecPatch like https://github.com/argoproj/argo-workflows/blob/master/examples/pod-spec-patch-wf-tmpl.yaml to get this behavior easily.

These sound like very good ideas. However, maybe we want an explicit way to declare something like "allow this pod to have access to KFP, but not this one".

We will iterate on these ideas internally and come back to it!

@davidspek
Contributor

@elikatsis Does it make sense to integrate the PodDefault you shared above with the notebook controller to make the user experience more seamless? I believe this would be the best way to solve this long-standing issue.

@Bobgy
Contributor

Bobgy commented Mar 21, 2021

I had totally missed these comments 😱

would you mind splitting your PR as smaller ones , so that we can approve the ones we fully agree on right now first for the RC?

We've merged the backend now; I hope you are good with this and did not hesitate to ask me to split some commits. Next time feel free to explicitly ask for things like that during the review!

No worries, the backend PR LGTM. I was mostly talking about concerns with the SDK PR.

What do you think about letting KFP api server inject projected service account token to every KFP pod?
e.g. I guess we can configure https://argoproj.github.io/argo-workflows/default-workflow-specs/ with a global podSpecPatch like https://github.com/argoproj/argo-workflows/blob/master/examples/pod-spec-patch-wf-tmpl.yaml to get this behavior easily.

These sound like very good ideas. However, maybe we want an explicit way to declare something like "allow this pod to have access to KFP, but not this one".

We will iterate on these ideas internally and come back to it!

I'd prefer adhering to the standard RBAC model: each Pod has access to a service account, while we add RBAC rules to control what that service account can do. I worry that adding a layer for choosing which pods can access the KFP API introduces an unnecessary abstraction.

@Bobgy
Contributor

Bobgy commented Mar 30, 2021

@elikatsis @yanniszark I just recalled a separate concern with the current implementation.

When we support using service account tokens to authenticate to the KFP API server, we need to configure Istio authorization rules to allow all types of access. However, that seems to break the security assumption made when using the HTTP user ID header to authenticate -- that only requests from the Istio ingress gateway are allowed to access the KFP API server.

How do you overcome this problem?

Per previous discussion in #4440 (comment), we should implement the ability to parse the special Istio header X-Forwarded-Client-Cert, so that the KFP API server can know which requests come from the Istio ingress gateway. This part is currently missing from the implementation.
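A sketch of what that check could look like (hypothetical Python; the exact SPIFFE identity of the ingress gateway depends on the installation and is an assumption here):

def from_istio_ingress_gateway(headers) -> bool:
    # Istio's sidecar sets X-Forwarded-Client-Cert on mTLS traffic and
    # encodes the caller's SPIFFE identity in its URI= element.
    xfcc = headers.get("X-Forwarded-Client-Cert", "")
    gateway_id = ("spiffe://cluster.local/ns/istio-system/sa/"
                  "istio-ingressgateway-service-account")  # assumed identity
    return f"URI={gateway_id}" in xfcc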

@juliusvonkohout
Member

juliusvonkohout commented Sep 9, 2021

@juliusvonkohout Could you please share an example of how you're doing it? Specifically, what namespace are you adding in your PodDefault?

First, #6537 should be merged. Then you could easily alter the two following kustomization patch files. I currently use them to add stuff for istio-cni on OpenShift, but you can easily add a PodDefault too. This is the official way to prepare a user namespace in Kubeflow: Metacontroller monitors the namespace and creates the resources gathered from the pipelines-profile-controller's sync.py. E.g., it adds the service accounts and the visualization server to each namespace, and in the future it should also add a PodDefault.

composite-controller-patch.txt
sync.py.txt

@juliusvonkohout
Member

juliusvonkohout commented Sep 28, 2021

@elikatsis @Bobgy

Please merge
#6629
and
kubeflow/kubeflow#6160

and you will have automatic authentication by default

@bartgras feel free to test it yourself

@juliusvonkohout
Member

juliusvonkohout commented Sep 30, 2021

@juliusvonkohout I have made some comments on your PRs:

* [feat: add poddefault for automatic Jupyterlab authentication #6629 (comment)](https://github.com/kubeflow/pipelines/pull/6629#issuecomment-930642835)

* [feat: Authenticate to kubeflow pipelines by default kubeflow#6160 (comment)](https://github.com/kubeflow/kubeflow/pull/6160#issuecomment-930645832)

Sorry for the delay. There are three options, as commented by me in the specific issues.

  1. Continue with the pipeline-profile-controller for Kubeflow 1.5 and accept my pull requests. Almost everyone will use pipelines, and even if they don't, the change will not harm them.

  2. Remove the pipeline-profile-controller and rely on "ProfileResourceTemplate". This would have the great benefit of getting rid of the metacontroller with cluster-admin permissions and is my long-term favorite. I think only the pipeline-profile-controller is still using metacontroller (CompositeController kubeflow-pipelines-profile-controller).

  3. Accept my non-harmful changes in both repositories now, help users directly, and see if you can make the transition to ProfileResourceTemplate and the complete removal of metacontroller in time for Kubeflow 1.5. Either way, the feature is then in 1.5.

@juliusvonkohout
Member

@thesuperzapper which of the three options above sounds sensible to you? Or do you have another idea?

@stale

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 2, 2022
@juliusvonkohout
Member

stalebot, this is still relevant

@stale stale bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 3, 2022
@rustam-ashurov-mcx

rustam-ashurov-mcx commented May 23, 2022

Just installed Kubeflow on AWS EKS a few days ago, using the AWS Kubeflow manifests from:
https://github.com/awslabs/kubeflow-manifests
If I understand correctly, the installation used version 1.4.1 of these manifests under the hood:
https://github.com/kubeflow/manifests

All in all it works, so I created a new Jupyter notebook server.
I'm using the default user and have an issue: no access to the Pipelines service via:

import kfp
client = kfp.Client()
print(client.list_experiments())

Error message:

ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'date': 'Mon, 23 May 2022 08:36:33 GMT', 'x-envoy-upstream-service-time': '15', 'server': 'envoy', 'transfer-encoding': 'chunked'})
HTTP response body: {"error":"Internal error: Unauthenticated: Request header error: there is no user identity header.: Request header error: there is no user identity header.\nFailed to authorize with API resource references\ngithub.com/kubeflow/pipelines/backend/src/common/util.Wrap\n\t/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:275\ngithub.com/kubeflow/pipelines/backend/src/apiserver/server.(*ExperimentServer)

I also tried:

import kfp
client = kfp.Client(host='http://ml-pipeline-ui.kubeflow:80')
print(client.list_experiments())

Error:

ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '19', 'content-type': 'text/plain', 'date': 'Mon, 23 May 2022 08:40:03 GMT', 'server': 'envoy', 'x-envoy-upstream-service-time': '26'})
HTTP response body: RBAC: access denied

Also tried:

import kfp
client = kfp.Client(host='http://ml-pipeline-ui:80')
print(client.list_experiments())

Error message in this case:

MaxRetryError: HTTPConnectionPool(host='ml-pipeline-ui', port=80): Max retries exceeded with url: /apis/v1beta1/experiments?page_token=&page_size=10&sort_by=&resource_reference_key.type=NAMESPACE&resource_reference_key.id=kubeflow-user-example-com (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f67a65cbe50>: Failed to establish a new connection: [Errno -2] Name or service not known'))

When I was creating the Jupyter server, I checked the config option to give the server access to pipelines, and in notebooks this code:

with open("/var/run/secrets/kubeflow/pipelines/token", "r") as f:
    print(f.read())

prints token data

Any ideas what I can do to reach pipelines service from notebooks?

@rustam-ashurov-mcx

After updating the kfp library it works fine; the exact code that works for me is the variant without anything really special:

client = kfp.Client()
print(client.list_experiments())

So, all in all, the fix for me was updating the kfp library.

@tomaszstachera

Hi, do I understand correctly that the solution discussed in this issue solves only authentication inside the cluster, and only using a K8s service account? Is there any plan or issue related to authentication using a cloud-specific service account/service principal/etc. (in my case GCP)? The current solution, with only interactive and only user-based authentication, is highly insufficient (https://www.kubeflow.org/docs/distributions/gke/pipelines/authentication-sdk/).
PS: I guess this is not working with GCP accounts integrated with Active Directory - https://www.kubeflow.org/docs/components/pipelines/v1/sdk/connect-api/#full-kubeflow-subfrom-outside-clustersub?

@juliusvonkohout
Member

@tomaszstachera check out the new oidc-authservice

@chensun
Member

chensun commented Mar 6, 2023

/cc @chensun


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jun 23, 2024
@juliusvonkohout
Member

I think this is solved with our new oauth2-proxy and token based authentication. There is a small new bug kubeflow/manifests#2747 but that will be fixed for 1.9. So we support heterogeneous ways of authenticating.

/close


@juliusvonkohout: Closing this issue.

In response to this:

I think this is solved with our new oauth2-proxy and token based authentication. There is a small new bug kubeflow/manifests#2747 but that will be fixed for 1.9. So we support heterogeneous ways of authenticating.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
