Can not mount volume in Container op #477
Comments
@gaoning777 Using the following code:
import kfp.dsl as dsl
from kubernetes import client as k8s_client

@dsl.pipeline(
    name='DataLoading Pipeline',
    description='Test.'
)
def phenology_pipeline():
    # data_loading_op and data_path are defined elsewhere (not shown in this comment).
    data_loading_task = data_loading_op(data_path) \
        .add_volume(k8s_client.V1Volume(name='data-processing')) \
        .add_volume_mount(k8s_client.V1VolumeMount(
            mount_path='/data_processing',
            name='data-processing'))
Worked by mounting a folder
My PersistentVolume details:
$> kubectl describe pv data-processing
Name: data-processing
Labels: <none>
Annotations: pv.kubernetes.io/bound-by-controller=yes
Finalizers: [kubernetes.io/pv-protection]
StorageClass: manual
Status: Bound
Claim: kubeflow/data-processing-claim
Reclaim Policy: Retain
Access Modes: RWO
Capacity: 5Gi
Node Affinity: <none>
Message:
Source:
    Type: HostPath (bare host directory volume)
    Path: /data_processing
    HostPathType:
Events: <none>
Have you tried adding the hostPath to the volume definition in the DSL?
@IronPan Thanks, that did the trick! Anyway, are there any plans to better support volume mounts in Container Ops, maybe in some declarative way?
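For readers landing on this thread, the hostPath suggestion can be sketched as follows. This is a minimal illustration, not the exact code used above: it builds plain Python dictionaries that mirror the Kubernetes pod spec fields (`hostPath.path` on the volume, `mountPath` on the volume mount), so it runs without a cluster. The helper name `host_path_volume` is hypothetical. With the Kubernetes Python client, the volume would presumably be written as `k8s_client.V1Volume(name=..., host_path=k8s_client.V1HostPathVolumeSource(path=...))` and passed to `add_volume`.

```python
import json


def host_path_volume(name, host_path, mount_path):
    """Return (volume, volume_mount) dicts in Kubernetes pod-spec form.

    Hypothetical helper for illustration; not part of kfp or the
    kubernetes client.
    """
    # The volume names a directory on the node (here, the Minikube VM).
    volume = {"name": name, "hostPath": {"path": host_path}}
    # The mount refers to the volume by name and says where it appears
    # inside the container.
    volume_mount = {"name": name, "mountPath": mount_path}
    return volume, volume_mount


volume, volume_mount = host_path_volume(
    name="data-processing",
    host_path="/data_processing",
    mount_path="/data_processing",
)
print(json.dumps(volume, indent=2))
print(json.dumps(volume_mount, indent=2))
```

The key difference from the first attempt is that the volume carries a source (`hostPath`) instead of only a name; a volume mount can only refer to a volume whose source is defined.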
+1! For context, here is the work we have contributed to in Kubeflow, initially for notebooks:
@IronPan We -- Arrikto -- would be more than willing to contribute effort in getting Kubeflow Pipelines to use the native K8s storage resources seamlessly. And in general, we would definitely like to contribute more in getting Pipelines to be able to expose the characteristics of the underlying native K8s resources [e.g., pod spec] easily.
/assign ark-kun
This makes sense, especially for those who are familiar with the K8s YAML paradigm. @Ark-kun I think this aligns with your idea. Any thoughts?
The container op supports mounting volumes, so it should support PVCs already.
Pipelines uses Argo as the underlying orchestrator, and not every pod spec field makes sense for the orchestrator, so I think we should be a bit cautious about which pod API fields to expose. I would like to treat this case by case. Do you have specific features you want to be supported?
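As a sketch of the PVC case mentioned above, the volume can point at the claim from the `kubectl describe pv` output (`data-processing-claim` in the `kubeflow` namespace) instead of a host path. The example uses plain dictionaries that mirror the pod spec fields `persistentVolumeClaim.claimName` and `readOnly`; the helper name `pvc_volume` is hypothetical, and the assumption is that the pipeline pods run in the same namespace as the claim. With the Kubernetes Python client, the corresponding source type is `V1PersistentVolumeClaimVolumeSource(claim_name=...)`.

```python
import json


def pvc_volume(name, claim_name, mount_path, read_only=False):
    """Return (volume, volume_mount) dicts backed by an existing PVC.

    Hypothetical helper for illustration; not part of kfp or the
    kubernetes client.
    """
    volume = {
        "name": name,
        # A PVC can only be referenced from pods in its own namespace.
        "persistentVolumeClaim": {
            "claimName": claim_name,
            "readOnly": read_only,
        },
    }
    volume_mount = {
        "name": name,
        "mountPath": mount_path,
        "readOnly": read_only,
    }
    return volume, volume_mount


pvc, pvc_mount = pvc_volume(
    name="data-processing",
    claim_name="data-processing-claim",
    mount_path="/data_processing",
)
print(json.dumps(pvc, indent=2))
print(json.dumps(pvc_mount, indent=2))
```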
Can you elaborate more? We're already using the full native K8s types to specify the volumes and volumeMounts. K8s has a pretty extensive volume API. Do you want us to make some subset of the K8s APIs easier to specify instead of writing a k8s-style spec? In any case, we're working on both expanding our support for k8s APIs and adding ways to simplify the pipeline author's job. See
We're working on some improvements here, but we're currently limited by the parts of the K8s API that Argo supports. While Argo supports the full k8s
Which of those do you need for your pipelines? What would you like the API to look like?
@Ark-kun You are right, the support for k8s native types to specify Volumes and VolumeMounts is there. My comment pointed to some easier way to specify the volume mounts and the
@IronPan Now that I managed to run a custom pipeline using local volume mounts, I would also like to read from a GCP bucket (I am still running on Minikube). As far as I understand, Kubeflow creates its own service account when deployed on a GCP cluster. Missing that, I would need to create my own service account and then create a k8s secret. But then how should I mount the secret in the container?
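One possible approach to the secret question, sketched under stated assumptions: mount the secret as a volume and point the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which Google client libraries read, at the key file inside the mount. The secret name `gcp-sa-key`, the key file name `key.json`, the mount path, and the helper name `secret_volume` are all hypothetical; the dictionaries mirror the pod spec fields `secret.secretName`, `mountPath`, and `env`. With the Kubernetes Python client, the corresponding types are `V1SecretVolumeSource(secret_name=...)` and `V1EnvVar(name=..., value=...)`.

```python
import json
import posixpath


def secret_volume(name, secret_name, mount_path, key_file):
    """Return (volume, volume_mount, env_var) dicts for a mounted secret.

    Hypothetical helper for illustration; not part of kfp or the
    kubernetes client.
    """
    # Each key of the secret becomes a file under mount_path.
    volume = {"name": name, "secret": {"secretName": secret_name}}
    volume_mount = {"name": name, "mountPath": mount_path, "readOnly": True}
    # Google client libraries look up the service account key via this
    # environment variable.
    env_var = {
        "name": "GOOGLE_APPLICATION_CREDENTIALS",
        "value": posixpath.join(mount_path, key_file),
    }
    return volume, volume_mount, env_var


secret_vol, secret_mount, env = secret_volume(
    name="gcp-credentials",
    secret_name="gcp-sa-key",  # assumed name of a secret created beforehand
    mount_path="/secret/gcp-credentials",
    key_file="key.json",  # assumed key inside that secret
)
print(json.dumps([secret_vol, secret_mount, env], indent=2))
```

This assumes the secret already exists in the namespace where the pipeline pods run and that its data contains a key named `key.json` holding the service account key.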
@Ark-kun Why not let pipelines just orchestrate K8s objects so users have full access to the K8s APIs rather than trying to figure out which fields to expose?
Argo does not allow full access to K8s APIs. For instance, Argo has support for Container spec, but not the Pod or Job spec. Should we expose Argo-specific functionality which differs from K8s model?
Also, some features that make sense for raw k8s network services are not used when running finite tasks, while they prevent the pipeline system from adding value to the user. Examples of features that the pipeline system can add and that K8s lacks: caching, reproducibility, data provenance, security. E.g., if we allow any Pod to have privileged access, we cannot guarantee security or data consistency. The same applies if we allow Pods to freely modify the Pipelines system databases. Nevertheless, we strive to have maximum parity with the K8s API as allowed by Argo.
thanks
…ow#477) Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
[Running Pipelines on Minikube]
I am trying to mount a folder from the Minikube VM into the containers of my pipeline.
I have a data_processing folder in the Minikube VM that I want to be accessible by the Pipelines containers:
$> minikube ssh ls /data_processing
data.csv
I created a PersistentVolume using:
I tried to test the mount in a Pipelines Lightweight component:
But the folder cannot be found. Any ideas on how to solve this?