Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResourceOp to support pvolumes #1345

Closed
ryandawsonuk opened this issue May 16, 2019 · 8 comments
Closed

ResourceOp to support pvolumes #1345

ryandawsonuk opened this issue May 16, 2019 · 8 comments
Assignees

Comments

@ryandawsonuk
Copy link
Contributor

ryandawsonuk commented May 16, 2019

ResourceOp currently doesn't have a way to depend upon a volume. If I try to add pvolumes I get got an unexpected keyword argument 'pvolumes'

The ResourceOp supports creating a resource or even deploying a Custom Resource such as a TFJob (just using TFJob for illustration here - it's really other kinds of training jobs I'd like to use this for). But what I don't see a way to do right now is make sure the custom resource depends on a volume created or modified in a previous step.

I don't think the modification needed here would have to do anything to mount a volume. The resource definition itself could do that. It would just need to force the step to depend on the one that created or modified the volume. Another approach might be support the dependency through inputs but that isn't presently available either.

@elikatsis , @vkoukis - any thoughts on this?

@elikatsis
Copy link
Member

Hello Ryan,
actually I was just reading #1344 and was composing an answer (containing questions!).

I will first answer to the main question of this issue.
A ResourceOp is a step of the pipeline which manipulates a Kubernetes resource. It executes nothing but a kubectl <action> ....
Therefore, it shouldn't (and can't) mount volumes. It is totally agnostic about the resource.

I believe what you mean is to create a resource, and make that resource mount some volumes.

I don't have much experience with TFJobs and launching them. Reading it diagonally, though, the deployment of a TFJob you mention in #1344 seems very strict.

I guess I'll need more context, but what would you say if there was a TFJobOp, subclass of ResourceOp, which would initiate a TFJob. This would expose any useful mechanism to customize the TFJob's manifest.

These thoughts have also been expressed by @jlewi and @hongye-sun
#801 (comment)
#801 (comment)

A TFJobOp could then have a pvolumes argument and parse the values passed to append its dependencies, as well as use them in the manifest respectively.

This issue could actually be the design document of this subclass, and your feedback using it can help us a lot.

@elikatsis
Copy link
Member

/assign @elikatsis

@hongye-sun
Copy link
Contributor

ResourceOp is different from ContainerOp. It starts with a "launcher" pod which calls k8s API to create another custom resource like TFJob. When you add a pvolumes to ResourceOp, usually you mean adding a volume to the custom resource but not the "launcher". The PipelineVolume derives from k8s volume, so you can directly use it in your code when building k8s custom resource.

For example, you can build a pod spec for a TFJob:

pod_spec = V1PodSpec(..., 
  volumes=[vop.volume], 
  containers=[V1Container(..., 
    volume_mounts=[V1VolumeMount(name=vop.volume.name, mount_path='...')]
)
tf_job = TFJob(worker=V1PodTemplate(spec=pod_spec))
tf_job_op = ResourceOp(k8s_resource=tf_job, ...)

@ryandawsonuk
Copy link
Contributor Author

ryandawsonuk commented May 16, 2019

I probably shouldn't have used TFJob as my example in this issue. I do want to be able to mount volumes to a TFJob but I intended to cover that with #1344. A TFJobOp seems to me a good way to address that concern.

What I had in mind when raising particular issue was other types of resource, though the only concrete example I have immediately is a vanilla kubernetes Job. I am not sure whether there is a way to create Jobs other than ResourceOp (CreateJobOp looks related but I think that might be CMLE-specific).

I should also clarify that I am able to add a volume to the resource that is created by the ResourceOp. But that alone doesn't give me a way to say that the resource step should come after some other volume-related step.

@elikatsis
Copy link
Member

elikatsis commented May 16, 2019

[...] I think that might be CMLE-specific

Yes, that's my opinion as well. A JobOp would not serve much of purpose since a Job could be anything.

What you could do is to use the after() method on the ResourceOp and explicitly specify these dependencies. (Note that if PipelineParams exist in the k8s_object passed, the corresponding dependencies will be produced, but I get such parameters don't always exist).

ResourceOp is a subclass of BaseOp (I should move it to a separate module, but I've been postponing it).
While BaseOp is of no use to the user, it does have all the common attributes and methods ContainerOp and ResourceOp have (both derive from it).
So you may try

step1 = dsl.ContainerOp(...)
step2 = dsl.ContainerOp(...)
job = dsl.ResourceOp(...).after(step1).after(step2)

(TODO: I will modify BaseOp's after() to handle multiple arguments and be able to do op.after(step1, step2) just as PipelineVolume's after() works)

@ryandawsonuk
Copy link
Contributor Author

Thanks @elikatsis ! Adding .after does indeed do what I need for this case.

I agree about JobOp not being necessary. If I really need to create a plain Job rather than use a ContainerOp (and I'm not sure I do) then I can use ResourceOp and if the Job has dependencies I can use .after (thanks again!).

I share the sense that a TFJobOp would be useful, if for no other reason then because it would be more intuitive than creating TFJobs through ContainerOps or ResourceOps.

@elikatsis
Copy link
Member

That's nice, you are welcome!
Shall we close this issue and discuss TFJobOp in #1344?

@ryandawsonuk
Copy link
Contributor Author

Yep, cool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants