Late binding aspects on Kubeflow Pipeline #1187

Closed
animeshsingh opened this issue Apr 18, 2019 · 8 comments
@animeshsingh
Contributor

animeshsingh commented Apr 18, 2019

So there are use cases emerging that require some late-binding aspects:

  1. Being able to choose the backing container for a component step at runtime: a generic component whose backing image is determined at runtime.
  2. Parallelism: in real-world use cases, for a pipeline going from A->B->C, one would like to run many parallel instances of B, with the number of instances determined at runtime.

Thoughts/comments?
@vicaire @Ark-kun

@paveldournov
Contributor

@animeshsingh would you be able to provide more specific use cases for #1?

/cc @gaoning777 to comment on the loop operator for #2; running the steps of a loop asynchronously can be powerful.

@vicaire
Contributor

vicaire commented Apr 23, 2019

For 1, the easiest option would be to have a container that launches another container, selected based on parameters. It's not ideal, as the first container needs to wait on the second one, but if the first container consumes very few resources while waiting, the implementation should be acceptable. Let me know what you think.
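A minimal sketch of the launcher idea, in plain Python with no Kubernetes client: the launcher turns a runtime parameter into a Pod manifest for the child container, then polls the child's phase, sleeping between polls so it stays cheap while it waits. The registry names and the `IMAGE_BY_FLAVOR` mapping are hypothetical, and actually submitting the manifest (for example through the Kubernetes API) is left out and modeled by a `get_phase` callable.

```python
import time
from typing import Callable, Dict, List

# Hypothetical mapping from a runtime parameter to an image reference.
# A launcher could equally accept the full image reference as a parameter.
IMAGE_BY_FLAVOR: Dict[str, str] = {
    "notebook": "registry.example.com/notebook-runner:latest",
    "trainer": "registry.example.com/trainer:latest",
}


def build_child_pod(name: str, flavor: str, command: List[str], args: List[str]) -> dict:
    """Build a Kubernetes v1 Pod manifest for the image selected at runtime."""
    image = IMAGE_BY_FLAVOR[flavor]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run the child once; do not restart it when it exits.
            "restartPolicy": "Never",
            "containers": [
                {"name": "main", "image": image, "command": command, "args": args}
            ],
        },
    }


def wait_for_completion(
    get_phase: Callable[[], str],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the child pod's phase until it reaches a terminal state.

    The launcher only sleeps between polls, so it consumes very little
    while the child container does the real work.
    """
    while True:
        phase = get_phase()
        if phase in ("Succeeded", "Failed"):
            return phase
        sleep(poll_seconds)


if __name__ == "__main__":
    pod = build_child_pod("child-step", "notebook", ["python"], ["run_notebook.py"])
    print(pod["spec"]["containers"][0]["image"])

    # Simulated phases standing in for real status lookups.
    phases = iter(["Pending", "Running", "Running", "Succeeded"])
    print(wait_for_completion(lambda: next(phases), sleep=lambda _: None))
```

The launcher image itself never changes; only its parameters do, which is what lets it be built once and reused.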

For 2, Argo supports using the output of one container to determine how many parallel branches must be created, so it should be possible to add support for this in the Python DSL (see https://github.com/argoproj/argo/blob/master/examples/loops-param-argument.yaml). I'll let @gaoning777 comment on whether the current loop already provides functionality similar to https://github.com/argoproj/argo/blob/master/examples/loops-param-argument.yaml, or whether more work is needed.
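To make the runtime fan-out concrete, here is a local simulation in plain Python (not Argo or KFP code): step A inspects its input and emits a JSON list, one instance of B runs per list element in parallel, and C aggregates the results. This mirrors the shape of Argo's `withParam`, which consumes a JSON list, but the step functions and the thread pool are assumptions for illustration only.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def step_a(records: List[int], shard_size: int) -> str:
    """Split the input into shards; the number of shards is only known at runtime.

    The result is a JSON list, the shape Argo's withParam consumes.
    """
    shards = [records[i:i + shard_size] for i in range(0, len(records), shard_size)]
    return json.dumps([{"shard_id": n, "values": s} for n, s in enumerate(shards)])


def step_b(item: dict) -> int:
    """Work done by one parallel branch: here, summing its shard."""
    return sum(item["values"])


def step_c(partials: List[int]) -> int:
    """Aggregate the outputs of all B branches."""
    return sum(partials)


def run_pipeline(records: List[int], shard_size: int, max_parallel: int = 4) -> Tuple[int, int]:
    """Run A, then one B per item from A's output, then C; return (branches, total)."""
    items = json.loads(step_a(records, shard_size))
    # One B instance per item, run concurrently (bounded by max_parallel).
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        partials = list(pool.map(step_b, items))
    return len(items), step_c(partials)


if __name__ == "__main__":
    branches, total = run_pipeline(list(range(10)), shard_size=3)
    print(f"{branches} parallel B branches, total = {total}")
    # prints: 4 parallel B branches, total = 45
```

Changing only the input data (or `shard_size`) changes how many B branches run, without touching the pipeline definition.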

@animeshsingh
Contributor Author

Thanks @paveldournov and @vicaire. The use case for 1 is that a lot of images are built at runtime; e.g. if you are running a notebook within a step, it's probably packaged into a container at runtime. Launching a container from a container can sort of get there, but it would be great to bring some level of abstraction into Pipelines.

@Ark-kun
Contributor

Ark-kun commented Apr 25, 2019

@animeshsingh The image attribute supports input parameter placeholders, so you can already use pipeline parameters or task outputs to set the image name. So, 1 is a non-issue.
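As a simplified model of what this means at runtime: Argo templates reference inputs as `{{inputs.parameters.<name>}}`, and the engine substitutes those values into the template, including the image field, before the pod is created. The function below imitates that substitution with a regular expression. It is an illustration under that assumption, not Argo's implementation, and the parameter names and registry are hypothetical.

```python
import re
from typing import Mapping

# Matches Argo-style input placeholders such as {{inputs.parameters.tag}}.
PLACEHOLDER = re.compile(r"\{\{inputs\.parameters\.([A-Za-z0-9_-]+)\}\}")


def resolve(template: str, parameters: Mapping[str, object]) -> str:
    """Replace each placeholder with its runtime value; fail on unbound names."""

    def substitute(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"unbound input parameter: {name}")
        return str(parameters[name])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    image_template = "registry.example.com/{{inputs.parameters.project}}/trainer:{{inputs.parameters.tag}}"
    # The values could come from a pipeline parameter or an upstream task's output.
    print(resolve(image_template, {"project": "demo", "tag": "v2"}))
    # prints: registry.example.com/demo/trainer:v2
```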

@animeshsingh
Contributor Author

@Ark-kun What about the case where you are using components.yaml, which is what we are adopting?

@vicaire
Contributor

vicaire commented Apr 25, 2019

@animeshsingh, in the case of 1, the container that launches another container would not need to be rebuilt by the DSL. It would just need to be built once and for all, and be reused whenever it is needed. It would take as a parameter a reference to another container.

@gaoning777
Contributor

The current recursion support (e.g. https://github.com/kubeflow/pipelines/blob/master/samples/basic/recursion.py) handles conditional recursion. It can certainly wait for a condition that depends on runtime information to be met.
Since it does not directly support a runtime-determined number of iterations, users need to write a component that increments the iteration index (as a PipelineParam) and reference that index in the dsl.Condition. However, it would not be hard to add DSL support for such use cases using Argo withItems.
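The workaround can be modeled in plain Python (this is not KFP DSL code): a recursive step runs its body, an increment step produces the next index, and a condition on that index decides whether to recurse again, so the iteration count is fixed only by a value known at runtime. The function names are hypothetical. Iterations in this pattern run one after another, whereas an Argo loop over items would start them in parallel.

```python
from typing import List


def increment(index: int) -> int:
    """Stands in for a tiny component that outputs index + 1."""
    return index + 1


def loop_body(index: int, log: List[str]) -> None:
    """Stands in for the real work done in each iteration."""
    log.append(f"iteration {index}")


def recursive_loop(index: int, limit: int, log: List[str]) -> List[str]:
    """Model of a recursive graph component guarded by a condition on the index.

    `limit` plays the role of a value produced at runtime (for example an
    upstream task's output), so the iteration count is not fixed at compile time.
    """
    if index < limit:  # the dsl.Condition check
        loop_body(index, log)
        recursive_loop(increment(index), limit, log)
    return log


if __name__ == "__main__":
    runtime_limit = 3  # e.g. produced by an upstream step
    print(recursive_loop(0, runtime_limit, []))
    # prints: ['iteration 0', 'iteration 1', 'iteration 2']
```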

5 participants