feat(v2): V2 create run api #6689
Conversation
/test kubeflow-pipeline-e2e-test
/test kubeflow-pipeline-upgrade-test
@Bobgy This is ready for early review. I need the SDK change before I enable v2 IR spec support in the backend.
/test kubeflow-pipeline-e2e-test
/test kubeflow-pipeline-upgrade-test
Looks great, left mostly nitpicking comments!
templateType := util.InferTemplateFormat(manifestBytes)

if templateType == util.Unknown {
    return nil, util.NewInternalServerError(fmt.Errorf("failed to infer template type from manifest bytes"), "")
nit: why is there an empty string here?
util.NewInternalServerError accepts an error and a string as arguments.
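For illustration only (a sketch, not the PR's code; the helper's signature is assumed from the answer above to take a wrapped error plus an external message string), the call shape looks roughly like this:

return nil, util.NewInternalServerError(
    fmt.Errorf("failed to infer template type from manifest bytes"),
    "Failed to validate the pipeline manifest") // a non-empty external message instead of ""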
if apiRun.GetPipelineSpec().GetRuntimeConfig().GetPipelineRoot() != "" {
    job.RuntimeConfig.GcsOutputDirectory = apiRun.GetPipelineSpec().GetRuntimeConfig().GetPipelineRoot()
}
wf, err := compiler.Compile(job, nil)
Hi, we have a quick question: how can other pipeline runtimes fit in here? Right now the compiler.Compile function converts to Argo. Will there be a common interface we can use to plug in other pipeline runtimes like Tekton?
There isn't an existing interface for compiler.Compile, but considering the return value is different, do we need an interface?
JFYI, you can reuse the Visitor interface for the implementation of a Tekton v2 compiler.
pipelines/v2/compiler/visitor.go
Line 40 in 3a2ef14
func Accept(job *pipelinespec.PipelineJob, v Visitor) error {
@Bobgy thanks for the info, and yes, I have currently implemented a compiler for Tekton by implementing the Visitor interface. However, that only covers the compiler's implementation. To integrate different compilers with the pipeline run API here and pass the returned artifact to the underlying engine, I think a generic data structure is still needed. For example, JSON data representing Argo's Workflow or Tekton's PipelineRun could be returned and then passed to the pipeline engine. Any other approach to supporting different engines would be great too.
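A hypothetical sketch of such an integration point is below (the interface name is invented for illustration and the import path is assumed; this is not existing KFP code): each engine-specific compiler returns a serialized manifest, for example an Argo Workflow or a Tekton PipelineRun, which the run API then hands to the matching engine.

package compiler

import "github.com/kubeflow/pipelines/api/v2alpha1/go/pipelinespec"

// EngineCompiler is a hypothetical interface used only for this sketch.
type EngineCompiler interface {
    // Compile turns a v2 PipelineJob into an engine-specific manifest,
    // e.g. JSON/YAML for an Argo Workflow or a Tekton PipelineRun.
    Compile(job *pipelinespec.PipelineJob) (manifest []byte, err error)
}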
Good point, the part you mentioned is not designed yet. We want to start by still special-casing Argo. Any concerns?
Apologies for the late response. No concerns, as long as we agree to have an interface in the compiler and the pipeline run to integrate different pipeline engines. Should I create an issue to make sure we address this later on?
Yes, please create an issue. I cannot make a promise though; it will depend on the effort and cost.
}

// Value is the value of the field.
message Value {
Should we use protobuf.Value instead?
The v2 launcher does not support protobuf.Value yet. Will change it to protobuf.Value once the v2 launcher supports it.
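For context, here is a minimal, self-contained sketch of using google.protobuf.Value from Go via the structpb package; it only illustrates the suggested replacement and is not KFP code (the values are made up):

package main

import (
    "fmt"

    "google.golang.org/protobuf/types/known/structpb"
)

func main() {
    // structpb.NewValue wraps strings, numbers, bools, lists, and maps in
    // the single well-known google.protobuf.Value type.
    v, err := structpb.NewValue(map[string]interface{}{
        "pipeline_root": "gs://example-bucket/outputs", // illustrative value
        "retries":       3,
    })
    if err != nil {
        panic(err)
    }
    fmt.Println(v.GetStructValue().Fields["retries"].GetNumberValue()) // prints 3
}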
@@ -578,7 +522,7 @@ func (r *ResourceManager) RetryRun(ctx context.Context, runId string) error {
 }

 if runDetail.WorkflowRuntimeManifest == "" {
-    return util.NewBadRequestError(errors.New("workflow cannot be retried"), "Workflow must be Failed/Error to retry")
+    return util.NewBadRequestError(errors.New("workflow cannot be retried"), "Workflow must be Failed/Error to retry or run is with v2 mode")
nit: I think we know when it's v2 mode. Only returning this error message when in v2 mode would make it clearer to understand.
Looks like this error message can also happen in v1 mode
I meant that the root cause is different for v1 and v2, so it will be easier for users to understand when they hit the error if we return accurate error messages (instead of "it can be reason A or reason B").
For v1, we can say the workflow must be in some state.
For v2, we can say the feature is basically not implemented.
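A minimal sketch of that suggestion (not the actual change in this PR; isV2Run is a hypothetical flag used only for illustration):

if runDetail.WorkflowRuntimeManifest == "" {
    if isV2Run { // hypothetical flag for illustration
        return util.NewBadRequestError(errors.New("workflow cannot be retried"),
            "Retrying runs is not yet supported in v2 mode")
    }
    return util.NewBadRequestError(errors.New("workflow cannot be retried"),
        "Workflow must be Failed/Error to retry")
}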
done
workflow.SetAnnotationsToAllTemplatesIfKeyNotExist(util.AnnotationKeyIstioSidecarInject, util.AnnotationValueIstioSidecarInjectDisabled)

swfGeneratedName, err := toSWFCRDResourceGeneratedName(apiJob.Name)
scheduledWorkflow, err := tmpl.ScheduledWorkflow(apiJob)
I was thinking about waiting for #6207, so that the scheduled workflow can be unaware of the v1/v2 difference.
What do you think?
Good catch! I think that's not necessary; I just commented on #6207 (comment).
Also note the swf controller implementation -- when a pipeline version ID or pipeline ID is available, we never go to the workflow spec branch.
pipelines/backend/src/crd/controller/scheduledworkflow/controller.go
Lines 528 to 565 in f129bc7
if swf.Spec.PipelineVersionID != nil {
    if _, err := c.runServiceClient.CreateRun(ctx,
        // These are not created under the correct experiment atm
        // That is the issue that make the integration tests fail.
        // This should not be to hard to fix, but needs to be handled.
        &api.CreateRunRequest{Run: &api.Run{
            Name: workflowName,
            ResourceReferences: append(refs, &api.ResourceReference{
                Key: &api.ResourceKey{
                    Type: api.ResourceType_PIPELINE_VERSION,
                    Id:   *swf.Spec.PipelineVersionID},
                Relationship: api.Relationship_CREATOR}),
            PipelineSpec: &api.PipelineSpec{
                Parameters: swf.GetParameters()},
            Owner: &api.Owner{
                Name: swf.GetName(),
                Id:   string(swf.GetUID())}}}); err != nil {
        return false, "", err
    }
    return true, workflowName, nil
}
if swf.Spec.PipelineID != nil {
    if _, err := c.runServiceClient.CreateRun(ctx,
        &api.CreateRunRequest{Run: &api.Run{
            Name: workflowName,
            PipelineSpec: &api.PipelineSpec{
                Parameters: swf.GetParameters(),
                PipelineId: *swf.Spec.PipelineID},
            Owner: &api.Owner{
                Name: swf.GetName(),
                Id:   string(swf.GetUID())},
            ResourceReferences: refs,
        }}); err != nil {
        return false, "", err
    }
    return true, workflowName, nil
}
GetTemplateType() TemplateType

//Get workflow
RunWorkflow(apiRun *api.Run, options RunWorkflowOptions) (*Workflow, error)
With these new methods, this file immediately gets too large. Do you think it would be helpful, in this PR, to move the interface to the package kubeflow/pipelines/backend/src/template and move the implementations to kubeflow/pipelines/backend/src/template/argo and kubeflow/pipelines/backend/src/template/v2?
(Paths are just examples.)
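A rough sketch of that layout, using the example paths above (the annotations are illustrative only):

backend/src/template/        # Template interface and shared helpers
backend/src/template/argo/   # Argo (v1) implementation of the interface
backend/src/template/v2/     # v2 IR implementation of the interface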
done
LGTM
Only some nitpickings; still reading the full PR.
ParametersJSON() (string, error)
// Get bytes content.
Bytes() []byte
GetTemplateType() TemplateType
nit: I would probably prefer using language features to achieve this,
tmpl := template.New(bytes)
argo, ok := tmpl.(*template.Argo)
if ok {
    // tmpl is Argo template
}
v2, ok := tmpl.(*template.V2)
if ok {
    // tmpl is V2 template
}
refer to https://stackoverflow.com/a/50940347
Is there any advantage to using a language feature here? I think using an interface method is clearer.
Because adding this method to the interface implies users of the interface need to know which types exist in the first place -- it leaks implementation details to the interface's users.
If someone wants to introduce a new type of template, they cannot just implement this interface; they also need to check all usages of the type and make sure they are updated as well.
But I don't think it's urgent to resolve this issue now; let's get this in.
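For reference, the two assertions in the snippet above can also be written as a single Go type switch (a sketch only, using the same example type names):

switch t := tmpl.(type) {
case *template.Argo:
    // tmpl is an Argo template; use t as *template.Argo
    _ = t
case *template.V2:
    // tmpl is a v2 template; use t as *template.V2
    _ = t
default:
    // unknown template type
}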
I found that refactoring the create run UT takes a long time. Will add a to-do item to refactor it later.
Because there were many file moves, let's get this in quickly and continue to improve.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Bobgy. The full list of commands accepted by this bot can be found here. The pull request process is described here.
* added draft of create v2 pipeline run
* fixed broken UT and added UT for parsing template
* modified run apis to support v2 IR spec
* remove temporary patch
* fixed dependency
* fixed build failure
* finished draft
* finished create job and run
* refactor template and fixed broken UT
* updated go license
* fixed build failure
* fixed build
* added UT
* modified UT
* fixed build failure
* fixed license
Description of your changes:
Modified the run API to support v2.
Checklist: