Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Improved User Isolation in Kubeflow Pipelines #8406

Open
DomFleischmann opened this issue Nov 1, 2022 · 20 comments
Open

[feature] Improved User Isolation in Kubeflow Pipelines #8406

DomFleischmann opened this issue Nov 1, 2022 · 20 comments

Comments

@DomFleischmann
Copy link

Feature Area

/area frontend
/area backend
/area sdk

What feature would you like to see?

Authenticated and Authorized Users should be isolated by namespaces and should not have access to other users artifacts, unless authorized. The solution should be handled in frontend, backend, object storage and sdk.

What is the use case or pain point?

The current implementation allows users to access other users artifacts, this is a big security risk and a feature that limits enterprise adoption.

Is there a workaround currently?

Distributions are doing their own workarounds or enterprise customers need to deploy separate clusters for different users, which is unefficient.

This is a Roadmap Item for Kubeflow 1.7 requested by the 1.7 Release Team.

@zijianjoy @juliusvonkohout @StefanoFioravanzo @jbottum @annajung @kimwnasptd

Love this idea? Give it a 👍.

@jbottum
Copy link

jbottum commented Nov 1, 2022

/priority p1

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Nov 2, 2022

I think the there are three main tasks.
From here kubeflow/kubeflow#6662 the listed main problems are

  1. "The pipeline UI allows reading other peoples artifacts. The artifact proxy in the user namespace is insecure and obsolete. In the UI you can just get the artifact link from another user, remove the ?namespace=xxx parameter at the end and the UI server will fake the corresponding user for you. So if you know the S3/GCS path you can read other guys artifacts."

  2. feat(backend): isolate artifacts per namespace/profile/user using only one bucket #7725 (comment)

  3. The namespaced pipeline definitions will be implemented by Arrikto.

@StefanoFioravanzo
Copy link
Member

Thanks for starting this issue! Looping in @elikatsis from our side as well

@subasathees
Copy link

@StefanoFioravanzo , @juliusvonkohout @DomFleischmann
Hi Team, Any update on this feature, in kf v1.7.0 also we can see this is not implemented. any workaround for the same available now.

@thesuperzapper
Copy link
Member

@subasathees artifacts are correctly isolated when using Kubeflow Pipelines on deployKF which is my new Kubeflow distribution that includes Kubeflow Pipelines.

deployKF achieves this isolation by using object prefixes with profile/namespace at the beginning, and assigning a unique IAM role for each profile.

There is also some crazy stuff going on to ensure the isolation of KFP V2 artifacts, but it all boils down to creating the ConfigMap/kfp-launcher in each profile namespace so that the defaultPipelineRoot is set to a different value for each profile.

However, deployKF is still limited by Kubeflow Pipelines putting all pipeline definitions under the pipelines/ object prefix (regardless of the profile/namespace).


Interestingly, the ?namespace= bypass described in #8406 (comment) does not work in deployKF because of a few factors:

  • there is a very advanced feature that automatically redirects the ?namespace= parameter to account for the case of a cached result being in a different namespace (and this happens to have the side effect of always forcing a the ?namespace= parameter to be set)
  • even when the redirect future is disabled (at least for a minio object store, not s3), the ml-pipeline-ui pod from the kubeflow namespace actually has a bug that prevents it from accessing the minio service (because in deployKF minio lives in a different namespace to kubeflow)

@thesuperzapper
Copy link
Member

@zijianjoy @james-jwu we really need to fix the ?namespace= parameter bypass described in #8406 (comment).

The bypass is that artifact auth is ignored when no namespace parameter is set. This is because when no namespace parameter is set, it uses the ml-pipeline-ui pod from the kubeflow namespace, rather than proxying to ml-pipeline-artifact in the profile namespaces (to which istio will control access based on the user-id header, with the AuthorizationPolicy).

I think the best option is to have the ml-pipeline-ui (KFP frontend pod), reject artifact requests that don't specify ?namespace=.

To do this, we would need to update this code to reject when no namespace parameter is found:

export function getArtifactsProxyHandler({
enabled,
namespacedServiceGetter,
}: {
enabled: boolean;
namespacedServiceGetter: NamespacedServiceGetter;
}): Handler {
if (!enabled) {
return (req, res, next) => next();
}
return proxy(
(_pathname, req) => {
// only proxy requests with namespace query parameter
return !!getNamespaceFromUrl(req.url || '');
},
{
changeOrigin: true,
onProxyReq: proxyReq => {
console.log('Proxied artifact request: ', proxyReq.path);
},
pathRewrite: (pathStr, req) => {
const url = new URL(pathStr || '', DUMMY_BASE_PATH);
url.searchParams.delete(QUERIES.NAMESPACE);
return url.pathname + url.search;
},
router: req => {
const namespace = getNamespaceFromUrl(req.url || '');
if (!namespace) {
throw new Error(`namespace query param expected in ${req.url}.`);
}
return namespacedServiceGetter(namespace);
},
target: '/artifacts',
headers: HACK_FIX_HPM_PARTIAL_RESPONSE_HEADERS,
},
);
}

@juliusvonkohout
Copy link
Member

@zijianjoy @james-jwu we really need to fix the ?namespace= parameter bypass described in #8406 (comment).

The bypass is that artifact auth is ignored when no namespace parameter is set. This is because when no namespace parameter is set, it uses the ml-pipeline-ui pod from the kubeflow namespace, rather than proxying to ml-pipeline-artifact in the profile namespaces (to which istio will control access based on the user-id header, with the AuthorizationPolicy).

I think the best option is to have the ml-pipeline-ui (KFP frontend pod), reject artifact requests that don't specify ?namespace=.

To do this, we would need to update this code to reject when no namespace parameter is found:

export function getArtifactsProxyHandler({
enabled,
namespacedServiceGetter,
}: {
enabled: boolean;
namespacedServiceGetter: NamespacedServiceGetter;
}): Handler {
if (!enabled) {
return (req, res, next) => next();
}
return proxy(
(_pathname, req) => {
// only proxy requests with namespace query parameter
return !!getNamespaceFromUrl(req.url || '');
},
{
changeOrigin: true,
onProxyReq: proxyReq => {
console.log('Proxied artifact request: ', proxyReq.path);
},
pathRewrite: (pathStr, req) => {
const url = new URL(pathStr || '', DUMMY_BASE_PATH);
url.searchParams.delete(QUERIES.NAMESPACE);
return url.pathname + url.search;
},
router: req => {
const namespace = getNamespaceFromUrl(req.url || '');
if (!namespace) {
throw new Error(`namespace query param expected in ${req.url}.`);
}
return namespacedServiceGetter(namespace);
},
target: '/artifacts',
headers: HACK_FIX_HPM_PARTIAL_RESPONSE_HEADERS,
},
);
}

Can you create a PR?

@juliusvonkohout
Copy link
Member

@thesuperzapper
"However, deployKF is still limited by Kubeflow Pipelines putting all pipeline definitions under the pipelines/ object prefix (regardless of the profile/namespace)."
#7725 also fixes the /pipelines minio access. Users should anyway not access that path. Although the PR is outdated and might still have too much permissions to make the minio UI work more user friendly. But one can easily fix that.

@subasathees The namespaced pipeline definitions should be in 1.8 including the UI part. They are partially in 1.7.

All of this must be upstream. Having partial workarounds in downstream distributions is not a solution.

@subasathees
Copy link

@juliusvonkohout @thesuperzapper , Thanks for your detailed information, this will help.

@thesuperzapper
Copy link
Member

@juliusvonkohout I am pretty focused on deployKF right now, so don't have much time.

The change to reject artifact requests without ?namespace should be relatively straightforward, but it could break stuff so we need to test carefully.

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jul 20, 2023

@thesuperzapper we can also get rid of the per namespace artifact proxy and visualization server when doing this change. This would allow us to have zero overhead user namespaces. We just enforce ?namespace and use the already implemented direct way of ml-pipeline-ui to fetch artifacts from minio. Removing ?namespace from your query just uses that direct path by the way.

Copy link

github-actions bot commented Dec 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Dec 1, 2023
@juliusvonkohout
Copy link
Member

Definitely not stale

@stale stale bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Dec 1, 2023
Copy link

github-actions bot commented Mar 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 1, 2024
@juliusvonkohout
Copy link
Member

/hold

@stale stale bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 1, 2024
Copy link

github-actions bot commented May 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label May 1, 2024
@juliusvonkohout
Copy link
Member

Not stale

@github-actions github-actions bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label May 2, 2024
@juliusvonkohout
Copy link
Member

@zijianjoy @rimolive can you freeze the lifecycle of the Issue? It is still relevant.

Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jul 21, 2024
@juliusvonkohout
Copy link
Member

/lifecycle frozen

@google-oss-prow google-oss-prow bot added lifecycle/frozen and removed lifecycle/stale The issue / pull request is stale, any activities remove this label. labels Jul 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants