Can not mount volume in Container op #477

StefanoFioravanzo · 2018-12-05T16:10:02Z

[Running Pipelines on Minikube]
I am trying to mount a folder from the Minikube VM into the containers of my pipeline.

I have a data_processing folder in the Minikube VM that I want to be accessible by the Pipelines containers:

$> minikube ssh ls /data_processing
data.csv

I created a PersistentVolume using:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-processing
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 5Gi
  hostPath:
    path: /data_processing

I tried to test the mount in a Pipelines Lightweight component:

import kfp.components as comp

def data_loader():
    import pandas as pd
    data = pd.read_csv('/data_processing/data.csv', sep=',', header=None)
    print(data)

data_loading_op = comp.func_to_container_op(data_loader, base_image='tensorflow/tensorflow:1.11.0-py3')

import kfp.dsl as dsl
from kubernetes import client as k8s_client

@dsl.pipeline(
    name='DataLoading Pipeline',
    description='Test.'
)
def phenology_pipeline():
    # data_loader takes no arguments, so the op is called without inputs
    data_loading_task = data_loading_op().add_volume(k8s_client.V1VolumeMount(
        mount_path='/data_processing',
        name='data-processing'))

# ...

But the folder cannot be found. Any ideas on how to solve this?


gaoning777 · 2018-12-05T17:44:06Z

@IronPan
To mount a volume, you need to call both add_volume and add_volume_mount. See here for an example.
add_volume specifies the volume itself (for example, its type, such as a Kubernetes secret), and add_volume_mount specifies the mount path and related settings.
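The split described above mirrors the Kubernetes pod spec: `volumes` are declared once at the pod level, and each container's `volumeMounts` entry refers to one of them by `name`. The sketch below models that fragment with plain Python dicts (field names follow the Kubernetes API; the secret name `my-secret`, the container name, and the mount paths are hypothetical) and flags any mount that has no matching volume, which is the kind of mismatch that appears when only one of the two calls is made.

```python
# Minimal model of the pod-spec fragment behind add_volume / add_volume_mount.
# Field names follow the Kubernetes API; the concrete values are hypothetical.

def unmatched_mounts(pod_spec):
    """Return names of volumeMounts that do not refer to any declared volume."""
    declared = {volume["name"] for volume in pod_spec.get("volumes", [])}
    missing = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in declared:
                missing.append(mount["name"])
    return missing


pod_spec = {
    # Pod level: what the volume is (here a Kubernetes secret).
    "volumes": [{"name": "creds", "secret": {"secretName": "my-secret"}}],
    "containers": [{
        "name": "main",
        # Container level: where each volume appears inside the container.
        "volumeMounts": [
            {"name": "creds", "mountPath": "/secret"},
            {"name": "data-processing", "mountPath": "/data_processing"},
        ],
    }],
}

print(unmatched_mounts(pod_spec))  # ['data-processing']
```

Here the `data-processing` mount is reported because no volume with that name was declared.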

StefanoFioravanzo · 2018-12-05T18:52:43Z

@gaoning777 Using the following code:

import kfp.dsl as dsl
from kubernetes import client as k8s_client

@dsl.pipeline(
    name='DataLoading Pipeline',
    description='Test.'
)
def phenology_pipeline():
    # Parentheses allow the chained calls to span several lines
    data_loading_task = (
        data_loading_op()
        .add_volume(k8s_client.V1Volume(name='data-processing'))
        .add_volume_mount(k8s_client.V1VolumeMount(
            mount_path='/data_processing',
            name='data-processing'))
    )

This worked in the sense that a /data_processing folder is now mounted in the containers, but the folder is empty (I expected it to contain the data.csv file). I guess it is mounting an empty volume regardless of my PersistentVolume? Or am I missing something that maps the PersistentVolume to the real /data_processing folder in the Minikube VM?

My PersistentVolume details:

$> kubectl describe pv data-processing
Name:            data-processing
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    manual
Status:          Bound
Claim:           kubeflow/data-processing-claim
Reclaim Policy:  Retain
Access Modes:    RWO
Capacity:        5Gi
Node Affinity:   <none>
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /data_processing
    HostPathType:
Events:            <none>

IronPan · 2018-12-05T19:10:19Z

Have you tried adding the hostPath to the volume definition in the DSL?

.add_volume(k8s_client.V1Volume(name='data-processing', host_path=k8s_client.V1HostPathVolumeSource(path='/data_processing')))
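A likely explanation for the empty folder in the earlier attempt is Kubernetes defaulting: when a volume specifies no source (no `hostPath`, `persistentVolumeClaim`, `secret`, and so on), the API server fills in an `emptyDir`, which starts out empty. Adding `host_path` gives the volume a real source. The sketch below imitates that defaulting rule on plain dicts; `KNOWN_SOURCES` is a partial, illustrative subset of the volume sources Kubernetes supports, and `effective_source` is a hypothetical name.

```python
# Imitates the Kubernetes rule that a volume with no source defaults to emptyDir.
# Only a few source keys are listed here; the real API has many more.
KNOWN_SOURCES = ("hostPath", "persistentVolumeClaim", "secret", "configMap", "emptyDir")


def effective_source(volume):
    """Return the source type a volume would end up with after defaulting."""
    for key in KNOWN_SOURCES:
        if volume.get(key) is not None:
            return key
    return "emptyDir"  # nothing specified: a fresh, empty directory


# Roughly the serialized form of V1Volume(name='data-processing'): no source given.
bare = {"name": "data-processing"}

# Roughly the serialized form of the fix with V1HostPathVolumeSource.
with_host_path = {"name": "data-processing", "hostPath": {"path": "/data_processing"}}

print(effective_source(bare))            # emptyDir
print(effective_source(with_host_path))  # hostPath
```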

StefanoFioravanzo · 2018-12-05T19:30:17Z

@IronPan Thanks, that did the trick!

Anyway, are there any plans to better support volume mounts in Container Ops, maybe in some declarative way?

vkoukis · 2018-12-05T20:18:27Z

> Anyway, are there any plans to better support volume mounts in Container Ops, maybe in some declarative way?

+1!
I have not worked extensively with Kubeflow Pipelines specifically, but it definitely makes sense for Pipelines to be able to use K8s-native storage concepts, e.g., PVCs.

For context, here is the work we have contributed to in Kubeflow, initially for notebooks: kubeflow/kubeflow#34 and the PR kubeflow/kubeflow#1918.

@IronPan We (Arrikto) would be more than willing to contribute effort toward getting Kubeflow Pipelines to use the native K8s storage resources seamlessly.

More generally, we would definitely like to contribute to making Pipelines expose the characteristics of the underlying native K8s resources (e.g., the pod spec) easily.

yebrahim · 2018-12-07T19:28:59Z

/assign ark-kun

IronPan · 2018-12-07T21:19:32Z

> Anyway, are there any plans to better support volume mounts in Container Ops, maybe in some declarative way?

This makes sense, especially for those who are familiar with the K8s YAML paradigm. @Ark-kun I think this aligns with your idea. Any thoughts?

> Pipelines to be able to use K8s-native storage concepts, e.g., PVCs.

ContainerOp supports mounting volumes, so it should already support PVCs.
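As a sketch of what a PVC-backed mount amounts to in the pod spec: the volume uses a `persistentVolumeClaim` source that names the claim, instead of a `hostPath`. The claim name below comes from the `kubectl describe pv` output earlier in this thread (`kubeflow/data-processing-claim`); the helper `pvc_volume_and_mount` is hypothetical. A PVC is namespaced, so the claim has to exist in the same namespace as the pods that mount it. With the Python client, the volume would presumably be written as `V1Volume(persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(claim_name=...))`.

```python
# Hypothetical helper: build a volume/mount pair backed by an existing PVC.
# Field names follow the Kubernetes pod spec.

def pvc_volume_and_mount(volume_name, claim_name, mount_path, read_only=False):
    volume = {
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": claim_name, "readOnly": read_only},
    }
    # The mount refers to the volume by name, not to the claim.
    mount = {"name": volume_name, "mountPath": mount_path, "readOnly": read_only}
    return volume, mount


volume, mount = pvc_volume_and_mount(
    "data-processing", "data-processing-claim", "/data_processing")
print(volume["persistentVolumeClaim"]["claimName"])  # data-processing-claim
print(mount["mountPath"])                            # /data_processing
```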

> be able to expose the characteristics of the underlying native K8s resources [e.g., pod spec] easily.

Pipelines uses Argo as the underlying orchestrator, and not every pod spec field makes sense for the orchestrator, so I think we should be a bit cautious about which pod API fields to expose. I would like to treat this case by case. Are there specific features you want supported?

Ark-kun · 2018-12-07T21:21:31Z

@StefanoFioravanzo

> better support volume mounts in Container Ops

@vkoukis

> use the native K8s storage resources seamlessly

Can you elaborate? We're already using the full native K8s types to specify volumes and volumeMounts, and K8s has a fairly extensive volume API. Do you want us to make some subset of the K8s APIs easier to specify, instead of writing a K8s-style spec?

In any case, we're working on both expanding our support for k8s APIs and adding ways to simplify the pipeline author's job. See gcp.set_tpu_resource or gcp.use_gcp_secret for instance.
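In the same spirit as those helpers, a one-call convenience for the hostPath case in this issue might look like the sketch below. `mount_host_path` is a hypothetical name, not part of the KFP SDK, and it edits a plain-dict pod spec rather than a real ContainerOp.

```python
# Hypothetical convenience helper: declare a hostPath volume and mount it
# into every container of a plain-dict pod spec in a single call.

def mount_host_path(pod_spec, name, host_path, mount_path):
    pod_spec.setdefault("volumes", []).append(
        {"name": name, "hostPath": {"path": host_path}})
    for container in pod_spec.setdefault("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": name, "mountPath": mount_path})
    return pod_spec


spec = {"containers": [{"name": "main", "image": "tensorflow/tensorflow:1.11.0-py3"}]}
mount_host_path(spec, "data-processing", "/data_processing", "/data_processing")
print(spec["volumes"][0]["hostPath"]["path"])                 # /data_processing
print(spec["containers"][0]["volumeMounts"][0]["mountPath"])  # /data_processing
```

Keeping the volume and the mount in one call removes the chance of declaring one without the other, which is the mistake at the start of this thread.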

Ark-kun · 2018-12-07T21:44:04Z

> expose the characteristics of the underlying native K8s resources [e.g., pod spec] easily

We're working on some improvements here, but we're currently limited by the parts of the K8s API that Argo supports. While Argo supports the full K8s Container spec, it does not fully support the Pod spec.
Argo currently has some support for:

ActiveDeadlineSeconds
Affinity
Metadata (Already implemented)
NodeSelector
Parallelism (Argo only, not Pod spec)
RetryStrategy
Tolerations
Volumes (global: on Workflow level; Already implemented)

Which of those do you need for your pipelines?

What would you like the API to look like?
Do you find it confusing to have both Container properties and Pod properties in the same place?
Would you like Pod properties to have a pod_ prefix so they are easy to identify (e.g. train_op(...).set_pod_retry_strategy(...))?
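To make the Container-versus-Pod distinction concrete, here is a sketch of an Argo-style template assembled as a plain dict: the container spec is passed through whole, while pod-level settings are accepted only from a whitelist. The whitelist uses the camelCase Argo template field names for the items listed above (`activeDeadlineSeconds`, `affinity`, `metadata`, `nodeSelector`, `parallelism`, `retryStrategy`, `tolerations`); `volumes` is left out because it is set on the Workflow rather than the template. The function name, its validation behavior, the `train.py` command, and the `disktype` node label are assumptions for illustration, not KFP or Argo code.

```python
# Sketch: build an Argo-style template dict, passing the container spec through
# unchanged but only allowing the pod-level fields listed in this thread.
SUPPORTED_POD_FIELDS = {
    "activeDeadlineSeconds", "affinity", "metadata", "nodeSelector",
    "parallelism", "retryStrategy", "tolerations",
}


def build_template(name, container, **pod_fields):
    unsupported = set(pod_fields) - SUPPORTED_POD_FIELDS
    if unsupported:
        raise ValueError(f"not in the supported list: {sorted(unsupported)}")
    template = {"name": name, "container": container}
    template.update(pod_fields)  # pod-level fields sit beside "container"
    return template


train = build_template(
    "train",
    {"image": "tensorflow/tensorflow:1.11.0-py3", "command": ["python", "train.py"]},
    retryStrategy={"limit": 3},
    nodeSelector={"disktype": "ssd"},
)
print(train["retryStrategy"])  # {'limit': 3}

try:
    build_template("bad", {"image": "busybox"}, hostNetwork=True)
except ValueError as err:
    print(err)  # not in the supported list: ['hostNetwork']
```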

StefanoFioravanzo · 2018-12-09T20:55:13Z

@Ark-kun You are right, support for K8s native types to specify Volumes and VolumeMounts is already there. My comment was about an easier way to specify volume mounts, and the gcp.use_gcp_secret example captures exactly what I meant.

@IronPan Now that I have managed to run a custom pipeline using local volume mounts, I would also like to read from a GCP bucket (I am still running on Minikube). As far as I understand, Kubeflow creates its own service account when deployed on a GCP cluster. Without that, I would need to create my own service account and then create a K8s secret.

But then how should I mount the secret in the container?
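One common pattern, sketched here under assumptions: store the service-account key file in a Kubernetes secret, declare a `secret` volume for it, mount it into the container, and point the `GOOGLE_APPLICATION_CREDENTIALS` environment variable (which Google client libraries read) at the mounted key file. The secret name `my-gcp-sa`, the key file name `key.json`, the mount path, and the helper `gcp_secret_fragments` are hypothetical placeholders; this is not necessarily how `gcp.use_gcp_secret` is implemented.

```python
# Sketch: pod-spec fragments for mounting a service-account key stored in a
# Kubernetes secret. Secret name, key file name, and mount path are placeholders.
import posixpath


def gcp_secret_fragments(secret_name, key_file, mount_path):
    volume = {"name": secret_name, "secret": {"secretName": secret_name}}
    mount = {"name": secret_name, "mountPath": mount_path, "readOnly": True}
    # Each key in the secret shows up as a file under mount_path.
    env = {
        "name": "GOOGLE_APPLICATION_CREDENTIALS",
        "value": posixpath.join(mount_path, key_file),
    }
    return volume, mount, env


volume, mount, env = gcp_secret_fragments("my-gcp-sa", "key.json", "/secret/gcp-credentials")
print(env["value"])  # /secret/gcp-credentials/key.json
```

The secret itself could be created from a downloaded key with `kubectl create secret generic my-gcp-sa --from-file=key.json=/path/to/key.json` (names and paths hypothetical), in the namespace where the pipeline pods run.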

jlewi · 2018-12-13T01:49:56Z

@Ark-kun Why not let pipelines just orchestrate K8s objects so users have full access to the K8s APIs rather than trying to figure out which fields to expose?

Ark-kun · 2018-12-18T01:38:44Z

> @Ark-kun Why not let pipelines just orchestrate K8s objects so users have full access to the K8s APIs rather than trying to figure out which fields to expose?

Argo does not allow full access to the K8s APIs. For instance, Argo supports the Container spec, but not the Pod or Job spec.

Should we expose Argo-specific functionality which differs from K8s model?
Should we expose K8s functionality that cannot be executed in Argo?

Ark-kun · 2018-12-18T01:44:59Z

Also, some features that make sense for raw K8s network services are not used when running finite tasks, and exposing them would prevent the pipeline system from adding value for the user. Examples of features that the pipeline system can add and K8s lacks: caching, reproducibility, data provenance, security.

For example, if we allow any Pod to have privileged access, we cannot guarantee security or data consistency. The same applies if we allow Pods to freely modify the Pipelines system databases.

Nevertheless, we strive to have maximum parity with the K8s API as allowed by Argo.

zoux86 · 2019-01-20T14:39:26Z

thanks


k8s-ci-robot assigned Ark-kun Dec 7, 2018

StefanoFioravanzo mentioned this issue Dec 9, 2018: Unable to visualize table in OutputViewer #489 (Closed)

zoux86 mentioned this issue Jan 20, 2019: An error occurs when run TFX example in local kubeflow cluster #703 (Closed)

Ark-kun closed this as completed Mar 15, 2019

vlagache mentioned this issue May 23, 2020: I can't mount a volume in my container locally #3835 (Closed)

snyk-bot mentioned this issue May 26, 2021: [Snyk] Security upgrade @kubernetes/client-node from 0.8.2 to 0.12.1 MaxKelsen/pipelines#19 (Open)

Linchin pushed a commit to Linchin/pipelines that referenced this issue Apr 11, 2023: f1d5dcd, fix wait_for_deployment to also check for apps/v1 deployments (kubefl… …ow#477) Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
