
feat(backend): Support separate artifact repository for each namespace #7219

Closed · 3 commits

Conversation

@juliusvonkohout (Member) commented Jan 26, 2022

Description of your changes:

This fixes #4649

The only limitation is that metrics are not rendered by the UI for whatever reason as shown in #4649 (comment). But MinIO provides them successfully according to the logs and you can download them from the web interface.

@Bobgy @zijianjoy I mainly used the Black formatter. If you want a different Python style, please tell me.


@google-oss-prow commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign zijianjoy after the PR has been reviewed.
You can assign the PR to them by writing /assign @zijianjoy in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout (Member, Author) commented Jan 26, 2022

I also need to use the local namespace instead of the kubeflow namespace to generate the MinIO secret, so it must happen in

    class Controller(BaseHTTPRequestHandler):
        def sync(self, parent, children):
            # parent is a namespace
            namespace = parent.get("metadata", {}).get("name")

Please tell me whether I am allowed to move the MinIO secret settings into that function.
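
For illustration, a minimal sketch of what moving the secret generation into sync() could look like. The helper name is hypothetical, and the Secret/key names below only follow the existing mlpipeline-minio-artifact convention; treat the whole thing as an assumption, not the final implementation:

    import secrets

    def desired_minio_secret(namespace):
        # Hypothetical helper called from sync(): emit the MinIO credential
        # Secret as a child resource in the profile's namespace instead of
        # configuring it globally in the kubeflow namespace.
        return {
            "apiVersion": "v1",
            "kind": "Secret",
            "metadata": {
                "name": "mlpipeline-minio-artifact",
                "namespace": namespace,
            },
            "stringData": {
                "accesskey": namespace,
                # Random, non-deterministic password (see the security
                # note later in this thread).
                "secretkey": secrets.token_urlsafe(32),
            },
        }

Since metacontroller calls sync() repeatedly, the random password would have to be generated once and then reused (e.g. read back from the existing Secret), otherwise every reconcile would rotate the credentials.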

@juliusvonkohout (Member, Author) commented Jan 26, 2022

@jacobmalmberg maybe it fails here

    const localVarPath = `/apis/v1beta1/runs/{run_id}:reportMetrics`.replace(

These files are also interesting: https://github.com/kubeflow/pipelines/blob/master/frontend/src/components/tabs/MetricsTab.tsx and https://github.com/kubeflow/pipelines/blob/master/frontend/src/components/viewers/MetricsVisualizations.tsx

Maybe the persistence agent

    func (r MetricsReporter) readNodeMetricsJSONOrEmpty(runID string, nodeStatus workflowapi.NodeStatus) (string, error) {

does not use the ml-pipeline-ui-artifact proxy in the user's namespace.

@jacobmalmberg (Contributor) commented

@juliusvonkohout Entirely possible! I probably won't have time to dig in deeper for the foreseeable future, though.

@juliusvonkohout (Member, Author) commented Jan 27, 2022

@Bobgy @jacobmalmberg @ca-scribner
There is a really strange design decision in Kubeflow that will prevent this pull request from working properly. ml-pipeline-ui uses the artifact proxy ml-pipeline-ui-artifact in the user's namespace (#3554), and ml-pipeline-ui-artifact uses the proper local MinIO instance. ml-pipeline (the API server) does not use that proxy:

    func (c *MinioClient) GetObject(bucketName, objectName string, opts minio.GetObjectOptions) (io.Reader, error) {

and therefore fails to fetch any artifacts, because they are not in the global MinIO instance. As a result, no mlpipeline-metrics artifact is found by ml-pipeline-persistenceagent when it asks ml-pipeline (the API server) for it, so the persistence agent does not report the metrics into the MySQL database mlpipeline in the table run_metrics. And of course the API server becomes useless for artifacts in general. A proper design would be for the API server to use the artifact proxy, and for everyone else, especially ml-pipeline-ui, to use only the API server instead of doing things manually.

This means one either has to repair this structural deficit or use a global, scalable MinIO instance with multiple buckets and different passwords per bucket. I will test whether the API server can get artifacts from different buckets per user.

@juliusvonkohout (Member, Author) commented Jan 28, 2022

@Bobgy @ca-scribner @jacobmalmberg this is my design proposal:

We keep the single MinIO instance and use bucket name = profile name = namespace name, because of the API server limitations explained in #7219 (comment).

Detailed explanation:

    func (w *Workflow) FindObjectStoreArtifactKeyOrEmpty(nodeID string, artifactName string) string {

searches for the artifact key "mlpipeline-metrics" in the workflow manifest. This value, e.g. 'artifacts/bash-pipeline-cqkb6/2022/01/27/bash-pipeline-cqkb6-310116140/mlpipeline-ui-metadata.tgz', is returned here:

    artifactPath := workflow.FindObjectStoreArtifactKeyOrEmpty(nodeID, artifactName)

So there is no bucket included, just the object key, which is what is passed to

    func (m *MinioObjectStore) GetFile(filePath string) ([]byte, error) {

The bucket name is set by default from the environment variable OBJECTSTORECONFIG_BUCKETNAME ("mlpipeline"):

    bucketName := common.GetStringConfigWithDefault("ObjectStoreConfig.BucketName", os.Getenv(pipelineBucketName))
    return storage.NewMinioObjectStore(&storage.MinioClient{Client: minioClient}, bucketName, pipelinePath, disableMultipart)

So we have to change this to use the name of the user's profile/namespace (not the username, which might contain forbidden characters like @) from the request as the bucket name. This elegant solution needs no changes in the Python SDK; we can keep the artifact keys without bucket names in the workflow manifest.
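
The mapping itself is trivial; here is a sketch in Python for brevity (the real change would live in the Go object store code quoted above):

    DEFAULT_BUCKET = "mlpipeline"  # from OBJECTSTORECONFIG_BUCKETNAME

    def bucket_for_request(namespace):
        # Proposed rule: namespaced requests read from the bucket named
        # after the profile/namespace; requests without a namespace fall
        # back to the shared default bucket for backwards compatibility.
        return namespace if namespace else DEFAULT_BUCKET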

For reference, a Profile looks like this:

    apiVersion: kubeflow.org/v1
    kind: Profile
    metadata:
      name: julius-vonkohout # this will be the namespace and bucket name
      resourceVersion: '463979994'
      uid: 372f46ea-ea7f-43dc-8cf6-c562a461ea86
    spec:
      owner:
        kind: User
        name: julius.vonkohout@....com
      resourceQuotaSpec: {}

This API server log shows that the username and the profile/namespace name are present for most requests, but of course not for reading artifacts... the most important API call is the one that lacks the namespace specification:
https://www.kubeflow.org/docs/components/pipelines/reference/api/kubeflow-pipeline-api-spec/

    I0127 15:27:00.857834       7 util.go:313] Getting user identity...
    I0127 15:27:00.857858       7 util.go:323] User: julius.vonkohout@....com, ResourceAttributes: &ResourceAttributes{Namespace:julius-vonkohout,Verb:get,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:experiments,Subresource:,Name:default,}
    I0127 15:27:00.857876       7 util.go:324] Authorizing request...
    I0127 15:27:00.861652       7 util.go:331] Authorized user 'julius.vonkohout@....com': &ResourceAttributes{Namespace:julius-vonkohout,Verb:get,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:experiments,Subresource:,Name:default,}
    I0127 15:27:00.864767       7 interceptor.go:37] /api.ExperimentService/GetExperiment handler finished
    I0127 15:29:11.847320       7 interceptor.go:29] /api.RunService/ReadArtifact handler starting
    I0127 15:29:11.856556       7 interceptor.go:37] /api.RunService/ReadArtifact handler finished
    I0127 15:29:13.092830       7 interceptor.go:29] /api.RunService/ReadArtifact handler starting
    I0127 15:29:13.100022       7 interceptor.go:37] /api.RunService/ReadArtifact handler finished
    I0127 15:29:14.410155       7 interceptor.go:29] /api.RunService/ReadArtifact handler starting
    I0127 15:29:14.419499       7 interceptor.go:37] /api.RunService/ReadArtifact handler finished
    I0127 15:29:15.713893       7 interceptor.go:29] /api.RunService/ReadArtifact handler starting
    I0127 15:29:15.721097       7 interceptor.go:37] /api.RunService/ReadArtifact handler finished

We really have to extend the API server's artifacts API with resource_reference_key.type=NAMESPACE and resource_reference_key.id=pavol-bauer. Somehow all the other APIs already do this properly, so it should be copy and paste. This is necessary because namespaces can be shared: e.g. Julius could ask for artifacts from the namespace pavol-bauer (if it is shared with him) as well as from julius-vonkohout, and trying to guess the right namespace/bucket would be a nightmare.

So we just use the namespace ResourceAttribute from the request as the bucket name, as the other APIs do. If there is no namespace in the request, we use the default bucket from the environment variable (mlpipeline), for backwards compatibility and for "shared" pipeline results; namespaced pipeline definitions (YAML files) are already handled the same way in the API server. We then need to slightly modify the persistenceagent too: it must send the namespace of the workflow in its request to the API server when it checks for "mlpipeline-metrics" artifacts, to report them later on as run_metrics. At some point in the future those run metrics should also be namespaced, the same way we handle namespaced pipeline definitions.
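
To make the proposal concrete, this is roughly what an extended ReadArtifact call could look like. The resource_reference_key query parameters on this endpoint are the proposed extension and do not exist today; they simply mirror what the list APIs already accept:

    import requests

    run_id = "..."   # placeholder values
    node_id = "..."

    # Hypothetical: ReadArtifact extended with the same resource_reference_key
    # parameters the other APIs already support, so the API server knows which
    # per-profile bucket to read the artifact from.
    resp = requests.get(
        "http://ml-pipeline.kubeflow:8888/apis/v1beta1/runs/"
        f"{run_id}/nodes/{node_id}/artifacts/mlpipeline-metrics:read",
        params={
            "resource_reference_key.type": "NAMESPACE",
            "resource_reference_key.id": "julius-vonkohout",
        },
    )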

At this point we still have to create the buckets, passwords and bucket policies manually.

We can merge the kubeflow-pipelines-profile-controller code from this pull request into https://github.com/kubeflow/kubeflow/tree/master/components/profile-controller to get rid of metacontroller with cluster-admin rights and of two separate controllers, as wanted by @thesuperzapper in #6629 (comment) (the ProfileResourceTemplate will make it much simpler in the long term). Of course we will ditch the local PVC and MinIO instance in favor of the global one in the kubeflow namespace, but we will keep the namespace-specific passwords and change the bucket name. ml-pipeline-ui will continue to work properly anyway because it uses the artifact proxy in the user's namespace. Hopefully we can also get rid of that in the future...

We just need to include a routine that creates an S3 bucket, user, policy, quota and password on profile creation and removes it again on profile deletion. The password generation can be obtained from the kubeflow-pipelines-profile-controller code in this pull request.
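
A minimal sketch of the provisioning side, assuming the minio Python client and a hypothetical hook in the profile controller (user, policy and quota creation would go through the MinIO admin API and are omitted here):

    from minio import Minio

    def ensure_namespace_bucket(client, namespace):
        # Create the per-profile bucket on profile creation; idempotent.
        # The matching MinIO user, canned policy and quota would be set up
        # via the MinIO admin API (the `mc admin` equivalents).
        if not client.bucket_exists(namespace):
            client.make_bucket(namespace)

    # e.g. called from the profile controller:
    # client = Minio("minio-service.kubeflow:9000",
    #                access_key="...", secret_key="...", secure=False)
    # ensure_namespace_bucket(client, "julius-vonkohout")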

I examined the design proposal with a security advisor. If Alice shares a namespace with Bob and later unshares it, Bob might have extracted and saved the minio, default-editor-token-xxxxx, default-viewer-token-xxxxx and other secrets. Bob could then still access Alice's MinIO bucket, or the whole namespace via the ServiceAccount. So ALL automatically created (non-user) secrets have to be deleted on RoleBinding changes. Furthermore, the MinIO password must not be deterministic.

@ca-scribner (Contributor) commented

I've only thought about the proposal for a few minutes, but I like it so far. Would be good to have input from someone in the pipelines working group, but this sounds like a good idea to me.

@juliusvonkohout (Member, Author) commented

> I've only thought about the proposal for a few minutes, but I like it so far. Would be good to have input from someone in the pipelines working group, but this sounds like a good idea to me.

@kimwnasptd @thesuperzapper @davidspek
I have found another severe security issue in Kubeflow... If Alice shares her namespace with Bob, Bob can start a JupyterLab and extract the ServiceAccount token. If Alice unshares the namespace, Bob can simply use the extracted token to keep doing whatever he wants in Alice's namespace, even from outside the cluster. So we also have to delete all default-editor tokens on a RoleBinding change; they will be automatically recreated by the reconciler.

I will implement this in this PR too.
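
A sketch of that revocation, assuming the kubernetes Python client and a hypothetical hook that fires on RoleBinding changes:

    from kubernetes import client, config

    def revoke_serviceaccount_tokens(namespace):
        # Delete all auto-created ServiceAccount token Secrets in the
        # namespace. The token controller recreates them immediately, so
        # current users keep working, while any previously extracted token
        # becomes invalid.
        config.load_incluster_config()
        v1 = client.CoreV1Api()
        tokens = v1.list_namespaced_secret(
            namespace,
            field_selector="type=kubernetes.io/service-account-token",
        )
        for secret in tokens.items:
            v1.delete_namespaced_secret(secret.metadata.name, namespace)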


@juliusvonkohout (Member, Author) commented

This is now handled in #7406.
