Extend the DSL with support for Persistent Volumes and Snapshots #801
@vkoukis Did you consider extending the DSL to just make K8s resources first class? Then a resource like a K8s Job or TFJob, or any resource which supports volumes and PVs, would automatically work. The DSL then becomes a way to build those objects using idiomatic Python. One way to achieve the above would be to use the DSL as it exists today and have the ContainerOp create the desired K8s resource; i.e., we add a layer of indirection: instead of directly creating a K8s Job, we launch a container which will create it. This can be done today using Pipelines' support for lightweight containers. Did the DSL intentionally choose ContainerOp as the primitive, or is this a reflection of the underlying implementation being Argo and the DSL being designed to match it? /cc @Ark-kun @hongye-sun
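To make the indirection concrete, here is a rough sketch of what this could look like today, using KFP's lightweight Python components and the kubernetes Python client; the function name, namespace, and Job spec are illustrative, not an existing API:

```python
import kfp.components as comp

def create_k8s_job(image: str) -> str:
    """Runs inside a pipeline step and creates a K8s Job via the API server."""
    from kubernetes import client, config
    config.load_incluster_config()  # the step's pod runs inside the cluster
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(generate_name="dsl-job-"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(name="main", image=image)],
                )
            )
        ),
    )
    created = client.BatchV1Api().create_namespaced_job("kubeflow", job)
    return created.metadata.name

# Turn the function into a pipeline step; the step's container creates the Job.
create_job_op = comp.func_to_container_op(create_k8s_job)
```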
Hello @jlewi , thanks for taking the time to read the design doc!
I think this design doc actually does target making these two specific K8s resources first class. If I understand your argument correctly, it is: "why not extend the DSL so it can also manage generic K8s resources as pipeline steps?" Yes, I did consider extending the DSL to make (generic) K8s resources first class, but I bumped into two main problems:
First: Yes, the user could orchestrate the creation of generic K8s resources, including volumes, as lightweight containers; this is also the problem that #783 tries to address.
This exposes a lot of the lower-level interaction with K8s to the end user.
The whole point of having a DSL is to make the most common tasks super-simple.
The user just declares the volumes they need and passes them directly in the arguments to the ContainerOp constructor.

Second: A more important advantage of using DSL-specific objects, instead of K8s objects directly, is that these objects carry dependency information, similarly to PipelineParam instances. For example, using a snapshot as the data source of a new volume means referencing that snapshot, which leads to implicit scheduling of the snapshot before the steps that consume it, without the user having to call .after() explicitly. This makes volumes and snapshots accessible from the DSL in a very idiomatic Pythonic way, and allows expressing rather complex dependencies on the use of volumes by steps.
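For illustration, a minimal sketch of this implicit-dependency behavior, using the syntax proposed in this doc (the pvolumes mount argument and the .pvolume attribute are assumptions here):

```python
import kfp.dsl as dsl

vol = dsl.PipelineVolume(size="10Gi")  # a dynamically created PVC

producer = dsl.ContainerOp(
    name="producer",
    image="library/bash:4.4.23",
    command=["sh", "-c", "echo hello > /data/out.txt"],
    pvolumes={"/data": vol},  # mounts vol; depends on the PVC-creation step
)

consumer = dsl.ContainerOp(
    name="consumer",
    image="library/bash:4.4.23",
    command=["sh", "-c", "cat /data/out.txt"],
    pvolumes={"/data": producer.pvolume},  # implicit dependency on producer
)
```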
/cc @hongye-sun

Hi @vkoukis, I have a couple of comments:
Currently, the DSL exposes a subset of Argo features. We are thinking of changing this so that all Argo features are available (including the pod- and container-level features that Argo exposes). We are looking for a way to expose these features in the DSL without having to manually write the Python code to make them accessible. (Maybe we could do this by leveraging the Argo OpenAPI spec, but it seems a bit tricky: https://github.com/argoproj/argo/blob/master/api/openapi-spec/swagger.json.) What is your opinion on this? Would it affect your implementation? Higher-level utilities could still be implemented on top of the lower-level API to match what you propose here.
Could you please clarify how the volume-related resources will be created and reliably released in your implementation? Will you need to modify the backend (it only supports Argo workflows today)? Will you create a container from a pipeline step to create the resource? Will you leverage some Argo feature? (https://github.com/argoproj/argo/blob/master/examples/volumes-pvc.yaml#L11) The latter is, I think, what you mean when you say "Note the PV is created dynamically, by the workflow engine".
/cc @vicaire
I would like to express my opinion on this from an end-user perspective. In our organization we have been testing Pipelines and trying to integrate it into our workflows. The issues and limitations with mounting volumes and managing PVCs prevent us from adopting Pipelines extensively and in production settings. I agree with @vkoukis on this:
One of the major obstacles in adopting Pipelines is that our users need both to learn the specific Pipelines DSL and to have some knowledge of the inner workings of Pipelines to compile and run their workflows. We would like to be able to completely abstract away the K8s-specific concepts (VolumeClaims, VolumeMounts, PersistentVolumeClaims, Secrets, ...) to let the users just focus on building the actual workflow. This proposal aligns perfectly with this and reflects our needs for a higher-level DSL and easier volume management. Specifically:
We feel this proposal could set Pipelines' storage management in the right direction. We would be very happy to take part in the discussion, and to provide support and end-to-end use cases to design the best possible API and user experience.
Thank you for the detailed write-up and proposal! These capture many pain points that we faced as we were trying out Pipelines. I like the idea of managing the lifecycle of volumes and snapshots without touching the K8s Python client.
Thanks for the proposal. I like the design of abstracting the K8s volume and snapshot implementation in the DSL. I have a few questions:
@vicaire Hello Pascal, thanks for taking the time to read the design doc and provide feedback!
We definitely don't want to modify the DSL compiler backend; this would be a major change. We explicitly targeted the existing Argo workflows framework when designing this. I understand that Argo currently supports declaring PVCs at the workflow level [the volumeClaimTemplates mechanism of the example you linked], but that ties the lifecycle of the volume to the lifecycle of the workflow, so we explicitly opted against it. Instead, our current implementation bases the creation of PVCs and VolumeSnapshots on distinct pipeline steps. Going this way enables this kind of code with rich dependencies:
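A minimal sketch of the kind of code meant here, with illustrative names; the pvolumes mount argument and the .snapshot() call follow the primitives proposed in this doc:

```python
import kfp.dsl as dsl

vol = dsl.PipelineVolume(size="10Gi")       # one DAG task: create the PVC

step1 = dsl.ContainerOp(
    name="step1",
    image="library/bash:4.4.23",
    command=["sh", "-c", "echo 1 > /data/file1"],
    pvolumes={"/data": vol},                # implicitly depends on the PVC task
)

snap = vol.snapshot()                       # one DAG task: snapshot the volume

step2 = dsl.ContainerOp(
    name="step2",
    image="library/bash:4.4.23",
    command=["sh", "-c", "ls /data"],
    pvolumes={"/data": dsl.PipelineVolume(data_source=snap)},  # clone from snap
)
```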
This will create one task in the DAG for creating the PVC that backs the volume. Note we do not depend on how Argo itself implements the provisioning of workflow-level volumes.
After we have this merged, and have gathered more experience, we would definitely like to tackle what you are proposing, and if I understand correctly also what @jlewi alluded to:
Yes, we are leveraging a feature that Argo provides: the resource template type.
Hello Stefano! Thank you for the encouraging words! We're definitely excited about the real-world use cases of this, as well. We'll be following up with a PR shortly. We would love to get your feedback on the implementation, and adjust accordingly.
Hello Adhita! We had read your #783 as part of trying to understand the current approaches, and we had it in mind as we iterated on the proposed extensions to the DSL; thank you for this.
Thanks :)
Dependency resolution comes with consuming dsl.PipelineVolume instances: referencing a volume or a snapshot implies a dependency on the step that produced it, without having to call .after() explicitly.
Since we have the creation of PVCs and VolumeSnapshots as distinct steps in a DAG, we can show them independently, and they can fail independently. If a creation step fails, the steps that depend on it will not run. We haven't looked deeply into how we can communicate this failure condition all the way up to the UI, so we would definitely welcome any suggestions you may have!
Thank you for taking the time to read the design doc and give feedback!
We explicitly do not want it to be a generic "K8s Volume", because a generic volume may mean a lot of different things, which behave in completely different ways. We explicitly target Persistent Volumes, backed by PersistentVolumeClaims. We limit what a dsl.PipelineVolume can represent, exactly so that it keeps well-defined semantics for mounting, snapshotting, and cloning. Also note that although a dsl.PipelineVolume is backed by K8s objects, the user never has to manipulate them directly.

A dsl.PipelineVolume also carries dependency information, similarly to a PipelineParam. We treat volumes and snapshots as first-class pipeline resources, created and consumed by distinct steps of the pipeline.
I think @StefanoFioravanzo's comment summarizes it nicely:
Taking a snapshot can be a very efficient operation with modern storage, even when multi-GB data is involved. So, we want to enable Pipelines to orchestrate the external storage to take snapshots, at user-defined steps. The user specifies this explicitly, as part of defining the pipeline; see the code example above. So, for example, they can snapshot every individual step's output, before it is passed to the next step, very efficiently. This way, using snapshots allows you to reconstruct the exact input and output that a step had, by mounting these snapshots directly, without having to copy multiple GBs of data in and out of the container, from/to an object store, as you would have to do with the current artifact stores.
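As a hypothetical example of the debugging workflow this enables (the snapshot name is illustrative):

```python
import kfp.dsl as dsl

# Reconstruct the exact data a past step saw, by cloning its snapshot into a
# fresh volume; no multi-GB copy through an object store is needed.
debug_vol = dsl.PipelineVolume(data_source="snap-of-step3-output")

inspect = dsl.ContainerOp(
    name="inspect",
    image="library/bash:4.4.23",
    command=["sh", "-c", "ls -lR /data"],
    pvolumes={"/data": debug_vol},
)
```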
@vkoukis Great design for supporting PipelineVolume easily! One question here: did you consider the case where the PVC name is specified from the Pipelines UI while creating a run, after the pipeline is defined? That seems not to be supported the old way (using k8s_client.V1VolumeMount) due to #521. From the design we allow the user to specify an existing PVC, but the user may want to change that while creating the run on the Pipelines UI. Thanks. /cc @gyliu513 @hougangliu
Thanks for reading the design doc and giving feedback! That's a good question. Similarly to how you can specify the data source to use as a pipeline parameter, the name of an existing PVC could also be given as a pipeline parameter, which the user can then override when creating a run from the UI. I'm not sure #521 is related; the discussion there is whether a pipeline parameter can be used inside raw K8s client objects. So, to summarize, I think it makes sense to be able to specify an existing PVC as a pipeline parameter.
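A sketch of what this could look like, assuming the PVC name is a plain pipeline parameter (hypothetical, following the syntax proposed in this doc):

```python
import kfp.dsl as dsl

@dsl.pipeline(name="existing-pvc-example")
def my_pipeline(existing_pvc: str = "my-existing-data"):
    # The value can be overridden from the Pipelines UI when creating a run.
    vol = dsl.PipelineVolume(pvc=existing_pvc)
    step = dsl.ContainerOp(
        name="step",
        image="library/bash:4.4.23",
        command=["sh", "-c", "ls /data"],
        pvolumes={"/data": vol},
    )
```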
Sounds great. Yes, basing this on a pipeline parameter should cover that case. Thanks.
Update: the KFP team is looking into this closely. We realize the importance of good support for volumes, and of a good/intuitive abstraction in the DSL.
@vicaire Great, thanks for the update! We've proposed a demo at the upcoming community meeting of Tue, Feb 26, so we can also show the latest iteration of our code live, and solicit feedback.
@hongye-sun @vkoukis How would things change if the fundamental primitive in the DSL were a K8s Job and not a container, e.g., if the basic operator created a Job rather than a bare container?
A container is not a top-level K8s resource; Deployments, Jobs, Pods, and custom resources are top-level resources. Are our efforts to simplify K8s resources by eliminating certain fields just creating more problems? Are we just going to end up exposing all the fields of the underlying K8s resource, just with a different API? What if we broke this into separate problems?
The second problem seems to tap into a larger problem within the K8s community of using programming languages to define resources.
Regarding #3, it seems like one of the problems we are facing is that in Python it's easy to define a lambda as a Python function and have well-defined inputs and outputs. Right now we are trying to shoehorn that into what Argo provides. Maybe there's a better primitive out there, e.g., from OpenFaaS or Knative? And if not, should we consider creating a suitable custom resource that the DSL could compile to?
Jeremy,

We can make the DSL focus just on orchestration by generalizing the K8s resource spec, like: dsl.ResourceOp(k8s_resource_spec)

The problem I see here is how to pass data between steps, which is the core problem for the orchestration part. The data I am describing here has two types: small parameter data and large artifact data. For small ones, we solved it in ContainerOp by chaining output files between steps. This logic is very specific to the container resource in K8s. I am not confident that we can come up with a generic solution for all K8s resources. Argo's way of using a JSON path query to return output from a K8s resource is not enough, as there is no way to let the user control the output in their code; e.g., a user cannot easily output some data from a TFJob to the pipeline. The same problem applies to large data. I think the proposal here is trying to solve the large-data-passing problem by using volumes. I have a few ideas here:
Does it make sense?
Hey Antons
A wrapper around the k8s python client for using volumes already exists (see: PR #783)
Adding an EmptyDir is still left, though.
… On Mar 2, 2019, at 17:49, Antons Kranga wrote:
Let me put in my 5 cents. I believe the existing functionality of ContainerOp is good enough. There are a number of variations in what volumes are used for. So instead of trying to replicate PodSpec, it is better (IMHO) to stick with the Python [client](https://github.com/kubernetes-client/python).
The current implementation of ContainerOp allows an easy escape to the Python client. Also (IMHO) KFP should be generic enough and avoid putting in code that the user (possibly a data scientist) doesn't want to see.
I created a gist showing how to use it: https://gist.github.com/akranga/bd92b48da913582e82b91ca13a4733ab
Please comment on it!
Following the review in #926 [thanks @hongye-sun, @vicaire!], I am amending this design doc to include a list of implementation steps. Overall our goals are:
More specifically, here is the proposed implementation, in distinct steps:

Commit 1: Minor changes in the DSL and the Compiler

To introduce the groundwork needed by the commits that follow.

Commit 2: ResourceOp

Define ResourceOp.

Input: It accepts any object that describes a K8s resource in the K8s Python client.
Output: The operator queries the object it created during the run, and can expose selected attributes of it as task outputs.

This would be an easy way to create tasks which create a certain kind of K8s resource and output some of its attributes as task outputs. It also covers @jlewi's proposal for making K8s resources first class. If I understand correctly, it also closes #415, #429. Similarly, it can be used to orchestrate TFJob instances or secrets; see #973, #1027.

Example:
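A sketch of such an example, assuming the ResourceOp interface outlined above; the attribute_outputs name and the JSON-path form are assumptions borrowed from Argo's resource template:

```python
from kubernetes import client as k8s_client
import kfp.dsl as dsl

@dsl.pipeline(name="resourceop-example")
def resourceop_pipeline():
    cm = k8s_client.V1ConfigMap(
        api_version="v1",
        kind="ConfigMap",
        metadata=k8s_client.V1ObjectMeta(generate_name="my-cm-"),
        data={"key": "value"},
    )
    rop = dsl.ResourceOp(
        name="create-cm",
        k8s_resource=cm,            # any K8s Python client object
        action="create",
        attribute_outputs={"name": "{.metadata.name}"},  # exposed as task output
    )
```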
Commit 3: Enable the simplified creation and consumption of volumes and snapshots [VolumeOp, VolumeSnapshotOp, PipelineVolume], building on ResourceOp.
Hi everyone! I am really glad #879 is merged; it is a great job. Kudos to @eterna2! Since we will be implementing ResourceOp, we will introduce a BaseOp class as the base class for any Argo template type. We will move anything common for all the leaf template types from ContainerOp into BaseOp, and ContainerOp will derive from BaseOp. Finally, ResourceOp will also derive from BaseOp.
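A rough sketch of the proposed hierarchy (signatures are illustrative, based on the commit descriptions below):

```python
class BaseOp:
    """Base class for any Argo template type (container, resource, ...)."""
    def __init__(self, name: str):
        self.name = name
        self.dependent_names = []  # explicit dependencies

    def after(self, *ops):
        """Declare that this op runs after the given ops."""
        self.dependent_names.extend(op.name for op in ops)
        return self


class ContainerOp(BaseOp):  # existing: container template
    ...


class ResourceOp(BaseOp):   # new: Argo resource template
    ...
```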
vkoukis@, elikatsis@, thanks for the fantastic proposal and detailed implementation plan. This is great! We are looking forward to the PRs!
@vicaire Thank you for the kind words! We have pushed a new version of the PR following the revised implementation plan; please see here: #926 (comment). Looking forward to your review!
* SDK: Create BaseOp class
  * BaseOp class is the base class for any Argo Template type
  * ContainerOp derives from BaseOp
  * Rename dependent_names to deps

* SDK: In preparation for the new feature ResourceOps (#801)
  * Add cops attribute to Pipeline. This is a dict having all the ContainerOps of the pipeline.
  * Set some processing in _op_to_template as ContainerOp specific

* SDK: Simplify the consumption of Volumes by ContainerOps
  * Add `pvolumes` argument and attribute to ContainerOp. It is a dict having mount paths as keys and V1Volumes as values. These are added to the pipeline and mounted by the container of the ContainerOp.

* SDK: Add ResourceOp
  * ResourceOp is the SDK's equivalent for Argo's resource template
  * Add rops attribute to Pipeline: Dictionary containing ResourceOps
  * Extend _op_to_template to produce the template for ResourceOps
  * Use processed_op instead of op everywhere in _op_to_template()
  * Add samples/resourceop/resourceop_basic.py
  * Add tests/dsl/resource_op_tests.py
  * Extend tests/compiler/compiler_tests.py

* SDK: Simplify the creation of PersistentVolumeClaim instances
  * Add VolumeOp: A specified ResourceOp for PVC creation
  * Add samples/resourceops/volumeop_basic.py
  * Add tests/dsl/volume_op_tests.py
  * Extend tests/compiler/compiler_tests.py

* SDK: Emit a V1Volume as `.volume` from dsl.VolumeOp
  * Extend VolumeOp so it outputs a `.volume` attribute ready to be consumed by the `pvolumes` argument to ContainerOp's constructor
  * Update samples/resourceop/volumeop_basic.py
  * Extend tests/dsl/volume_op_tests.py
  * Update tests/compiler/compiler_tests.py

* SDK: Add PipelineVolume
  * PipelineVolume inherits from V1Volume and it comes with its own set of KFP-specific dependencies. It is aligned with how PipelineParam instances are used, i.e., consuming a PipelineVolume leads to implicit dependencies without the user having to call the `.after()` method on a ContainerOp.
  * PipelineVolume comes with its own `.after()` method, which can be used to append extra dependencies to the instance.
  * Extend ContainerOp to handle PipelineVolume deps
  * Set `.volume` attribute of VolumeOp to be a PipelineVolume instead
  * Add samples/resourceops/volumeop_{parallel,dag,sequential}.py
  * Fix tests/dsl/volume_op_tests.py
  * Add tests/dsl/pipeline_volume_tests.py
  * Extend tests/compiler/compiler_tests.py

* SDK: Simplify the creation of VolumeSnapshot instances
  * VolumeSnapshotOp: A specified ResourceOp for VolumeSnapshot creation
  * Add samples/resourceops/volume_snapshotop_{sequential,rokurl}.py
  * Add tests/dsl/volume_snapshotop_tests.py
  * Extend tests/compiler/compiler_tests.py
  * NOTE: VolumeSnapshots is an Alpha feature at the time of this commit.

* Extend UI for the ResourceOp and Volumes feature of the Compiler
  * Add VolumeMounts tab/entry (Run/Pipeline view)
  * Add Manifest tab/entry (Run/Pipeline view)
  * Add & Extend tests
  * Update tests snapshot files

* Cleaning up the diff (before moving things back)
  * Renamed op.deps back to op.dependent_names
  * Moved Container, Sidecar and BaseOp classes back to _container_op.py. This way the diff is much smaller and more understandable. We can always split or refactor the file later. Refactorings should not be mixed with genuine changes.

Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
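For reference, a minimal usage sketch of the merged API, assuming the names from the commit messages above:

```python
import kfp.dsl as dsl

vop = dsl.VolumeOp(
    name="create-volume",
    resource_name="my-pvc",   # name of the PVC to create
    size="1Gi",
)

step = dsl.ContainerOp(
    name="step",
    image="library/bash:4.4.23",
    command=["sh", "-c", "echo data > /mnt/file"],
    pvolumes={"/mnt": vop.volume},  # a PipelineVolume; implies the dependency
)
```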
Extend the DSL with support for Persistent Volumes and Snapshots
Overview - Rationale
This document describes proposed additions to the DSL of Kubeflow Pipelines to seamlessly support the use of Persistent Volumes and Volume Snapshots as distinct resources in pipelines.
This means steps can exchange multi-GB data doing standard file I/O on mounted Persistent Volumes, without having to upload/download this data to/from external object stores. To manipulate storage volumes, we use standard, vendor-neutral Kubernetes primitives [namely PersistentVolumeClaim and VolumeSnapshot API objects], but without the user having to manipulate these K8s objects manually. Instead, the goal is for the user to declare how steps use volumes as pipeline resources for data exchange, with an intuitive DSL syntax, similarly to what they do for the rest of their pipeline resources.
Extending the DSL to support the use of PVs for data exchange will make the use of Kubeflow Pipelines on-prem much easier, and will also enable the use of advanced functionality offered by modern storage, i.e., snapshots, as a way to gain insight into how a multi-GB pipeline executes at every step. A few issues related to this, which this design also targets, are #783, #721, #275.
This is work done jointly with @elikatsis, @ioandr, @klolos, and @iliastsi.
@elikatsis has already completed the changes to the compiler necessary to support the functionality being described in this design doc, and will be submitting a PR with the proposed changes incorporating community feedback from this discussion.
We have already completed related work, to introduce support for mounting arbitrary PVs in notebooks, for kubeflow/kubeflow#34, kubeflow/kubeflow#1918, being replaced by kubeflow/kubeflow#1995.
Looking forward to your comments!
Design
Design goals
In the following, we describe the design considerations behind our proposed approach, and discuss alternatives.
Working examples
Here are a few use cases that we used as representative examples when iterating on the proposed DSL extensions:
- Use a PersistentVolumeClaim (PVC) object that they have already created.
- Snapshot the contents of a volume [creating VolumeSnapshot resources] in a vendor-neutral way.

Basic Primitives
A Pipeline Volume: Instances of class dsl.PipelineVolume represent individual Persistent Volumes. They can be mounted by ContainerOp instances as volumes, at specific mount points, and used for data exchange among them. Users may ask Pipelines to create PVCs dynamically, or they can refer to data they had created in the past by mentioning PersistentVolumeClaims that already exist on the cluster, or by specifying an existing VolumeSnapshot K8s object as the data source for a new PVC to be created dynamically.

Examples:

```python
vol_new = dsl.PipelineVolume(size="150G")
vol_existing = dsl.PipelineVolume(pvc="my-existing-data")
vol_from_snap = dsl.PipelineVolume(data_source="snapshot1")
```

Attributes:

- pvc: An existing PVC already filled with data, to use as a Persistent Volume for this pipeline.
- size: The size of a new PVC to be created dynamically, as part of the pipeline.
- storage_class: The storage class to use for the dynamically created PVC.
- data_source: The name of an existing VolumeSnapshot K8s object from which to clone data for a new dynamically created PVC, or a reference to a dsl.PipelineVolumeSnapshot instance.

A Pipeline Volume Snapshot: Instances of class dsl.PipelineVolumeSnapshot represent individual Volume Snapshots, created by snapshotting instances of dsl.PipelineVolume. They can be used to create new instances of dsl.PipelineVolume (a clone operation), or directly as input volumes for ContainerOp instances, at which point an implicit clone operation takes place.

Examples:

```python
snap1 = dsl.PipelineVolumeSnapshot(vol1)
snap2 = vol2.snapshot()
```
Code - Iterations on proposed syntax
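A representative example of the proposed syntax, combining the primitives above (images and names are hypothetical; the pvolumes mount argument is an assumption):

```python
import kfp.dsl as dsl

@dsl.pipeline(name="volumes-and-snapshots",
              description="Exchange data over a PipelineVolume and snapshot it")
def volume_pipeline():
    vol = dsl.PipelineVolume(size="10Gi")

    train = dsl.ContainerOp(
        name="train",
        image="library/bash:4.4.23",
        command=["sh", "-c", "echo model > /data/model.bin"],
        pvolumes={"/data": vol},
    )

    # Snapshot the volume after `train`; the dependency is implicit.
    snap = vol.snapshot()

    # Clone the snapshot into a new volume and serve from it.
    serve = dsl.ContainerOp(
        name="serve",
        image="library/bash:4.4.23",
        command=["sh", "-c", "cat /models/model.bin"],
        pvolumes={"/models": dsl.PipelineVolume(data_source=snap)},
    )
```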
Looking forward to your feedback, and we will be following up with a PR shortly.