
Support a container flow inside one pod #1313

Closed
bassel-z opened this issue May 12, 2019 · 2 comments

Comments


bassel-z commented May 12, 2019

I have noticed that each ContainerOp creates a pod, not a container. It would be great if we could define a flow as a series of steps that are launched as containers within the same pod. Would that be possible?

The idea behind it is that containers in the same pod can share the same disk storage volume. For big data use cases, where multiple steps in the flow require the same data, it would make sense to load the data onto the node once rather than loading it again for each step.

Is there currently a way to overcome this multiple-loading issue across different steps (pods)?

Member

elikatsis commented May 12, 2019

Hello @bassel-z,
As far as I know, such pipelining inside a pod is not supported by Kubernetes. There are initContainers, which run before the main container, but those are not yet supported by the DSL. There are also sidecars, which are supported by the DSL, but they run in parallel with the main container (a minimal sketch of attaching one is shown below).
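For illustration only (not from this thread), here is a rough sketch of attaching a sidecar with dsl.Sidecar and ContainerOp.add_sidecar; the images, commands, and names are made up, and the sidecar runs alongside the main container rather than before or after it:

```python
from kfp import dsl

@dsl.pipeline(name='sidecar-example')
def sidecar_pipeline():
    # Main step of the pipeline (runs as the pod's main container).
    op = dsl.ContainerOp(
        name='main-step',
        image='busybox',
        command=['sh', '-c', 'echo "main work"'],
    )
    # Hypothetical sidecar; it runs in parallel with the main container
    # inside the same pod, not as a sequential step.
    op.add_sidecar(dsl.Sidecar(
        name='helper',
        image='busybox',
        command=['sh', '-c', 'echo "running alongside the main container"'],
    ))
```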

However, what you could use is persistent storage that is local to your cluster. You can mount PVCs under some path, download data, write the results there, read from a subpath, etc.
There are a couple of ways to do this. An easy one, if you are using KFP SDK v0.1.18 or later, is the pvolumes argument in ContainerOp's constructor: check this sample for an example, or these lines as mini documentation. A rough sketch is shown below.
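As a minimal sketch (assuming an already-existing PVC named my-shared-pvc and a hypothetical mount path), the pvolumes argument can be used like this; the second step reuses the volume of the first, which also orders the steps:

```python
from kfp import dsl

@dsl.pipeline(name='shared-volume-example')
def shared_volume_pipeline():
    # Reference a PVC that already exists in the cluster (name is made up).
    vol = dsl.PipelineVolume(pvc='my-shared-pvc')

    # Step 1: load the data onto the shared volume once.
    download = dsl.ContainerOp(
        name='download-data',
        image='busybox',
        command=['sh', '-c', 'echo "dataset" > /mnt/data/dataset.txt'],
        pvolumes={'/mnt/data': vol},
    )

    # Step 2: read the same data from the shared volume.
    process = dsl.ContainerOp(
        name='process-data',
        image='busybox',
        command=['sh', '-c', 'cat /mnt/data/dataset.txt'],
        pvolumes={'/mnt/data': download.pvolume},
    )
```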

Note: VolumeOp is an op (i.e. a task of the pipeline) that makes creating PVCs easy; however, it is not yet supported, since it requires Argo v2.3, which is soon to be released.
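For reference only, once VolumeOp becomes supported, creating the PVC inside the pipeline and mounting it would look roughly like this (size, names, and path are hypothetical):

```python
from kfp import dsl

@dsl.pipeline(name='volumeop-example')
def volumeop_pipeline():
    # Create a PVC as a pipeline task (requires Argo v2.3+ on the backend).
    vop = dsl.VolumeOp(
        name='create-volume',
        resource_name='shared-data',
        size='10Gi',
        modes=dsl.VOLUME_MODE_RWM,
    )

    # Mount the newly created volume in a downstream step.
    step = dsl.ContainerOp(
        name='use-volume',
        image='busybox',
        command=['sh', '-c', 'ls /mnt/data'],
        pvolumes={'/mnt/data': vop.volume},
    )
```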

Contributor

Ark-kun commented May 13, 2019

Generally, this goes against the parallelism aspect of pipelines.

> The idea behind it is that containers share the same disk storage volume.

This is already possible to do in multiple ways (a sketch of the first two options follows the list):

  • ContainerOp.add_volume + ContainerOp.add_volume_mount
  • kfp.onprem.mount_pvc
  • Arrikto's VolumeOp
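A minimal sketch of the first two options, assuming an existing PVC named my-shared-pvc and a made-up mount path:

```python
from kfp import dsl, onprem
from kubernetes import client as k8s_client

@dsl.pipeline(name='mount-existing-pvc')
def mount_pvc_pipeline():
    # Option 1: add the volume and the volume mount explicitly.
    step1 = dsl.ContainerOp(
        name='step1',
        image='busybox',
        command=['sh', '-c', 'ls /mnt/data'],
    )
    step1.add_volume(k8s_client.V1Volume(
        name='shared-data',
        persistent_volume_claim=k8s_client.V1PersistentVolumeClaimVolumeSource(
            claim_name='my-shared-pvc'),
    ))
    step1.add_volume_mount(k8s_client.V1VolumeMount(
        name='shared-data', mount_path='/mnt/data'))

    # Option 2: apply the kfp.onprem.mount_pvc helper to an op.
    step2 = dsl.ContainerOp(
        name='step2',
        image='busybox',
        command=['sh', '-c', 'ls /mnt/data'],
    ).apply(onprem.mount_pvc('my-shared-pvc', 'shared-data', '/mnt/data'))
```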

> Is there a way to overcome the multiple loading issue currently across different steps (pods)?

We're leaning toward standardizing on a system-managed, volume-based approach in the future.

@Ark-kun Ark-kun closed this as completed May 13, 2019