Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend): workflow validation. Fixes #3526. #3965

Merged
merged 16 commits into from
Jun 14, 2021

Conversation

NikeNano
Copy link
Member

Currently pipelines will not be validated until run, this result in that it takes longer time to test and upload pipelines. This PR(current draft) aims to fix this and resolved #3526.

The goal is to validate pipelines when uploaded and then return the errors.

@kubeflow-bot
Copy link

This change is Reviewable

@Ark-kun
Copy link
Contributor

Ark-kun commented Jun 12, 2020

Thanks for this work!

BTW, the SDK performs the validation if working argo is in path. But it's still better for backend to validate.

Th only issue I see is that we need to be able to update the backend Argo module when updating the manifests.

@NikeNano
Copy link
Member Author

NikeNano commented Jun 12, 2020

BTW, the SDK performs the validation if working argo is in path. But it's still better for backend to validate.

Ohh is it this work you relate to @Ark-kun ?

@NikeNano NikeNano marked this pull request as ready for review June 14, 2020 12:16
@NikeNano NikeNano marked this pull request as draft June 15, 2020 06:50
@NikeNano NikeNano marked this pull request as ready for review June 15, 2020 11:03
@NikeNano
Copy link
Member Author

I am rather new to Golang (learning through opensource) and dont now how to fix this issue with currently breaks the tests.

Known dependencies are:
	github.com/kubeflow/pipelines/backend/src/crd/pkg/apis/scheduledworkflow/v1beta1
	github.com/argoproj/argo/pkg/apis/workflow/v1alpha1
	github.com/ghodss/yaml
	github.com/stretchr/testify/assert
	k8s.io/apimachinery/pkg/api/errors
	k8s.io/apimachinery/pkg/apis/meta/v1
	k8s.io/apimachinery/pkg/runtime/schema
	k8s.io/apimachinery/pkg/types
	k8s.io/kubernetes/pkg/apis/core
	google.golang.org/grpc/codes
	github.com/kubeflow/pipelines/backend/api/go_client
	github.com/kubeflow/pipelines/backend/src/crd/pkg/apis/scheduledworkflow
	github.com/cenkalti/backoff
	github.com/go-openapi/runtime
	github.com/go-openapi/strfmt
	github.com/golang/glog
	github.com/google/uuid
	github.com/pkg/errors
	github.com/golang/protobuf/ptypes/timestamp
	k8s.io/apimachinery/pkg/util/json
	k8s.io/client-go/kubernetes
	k8s.io/client-go/rest
	k8s.io/client-go/tools/clientcmd
	google.golang.org/grpc
	google.golang.org/grpc/status
Check that imports in Go sources match importpath attributes in deps.

I have added the dependency to the go.mod but it dont seem to help ... :( Is there anyone that could give some advice on how to solve this/where I can read up to solve it. @IronPan, @mgogogo, @Ark-kun Thanks for the help!

@NikeNano
Copy link
Member Author

friendly ping @rmgogogo

@Bobgy
Copy link
Contributor

Bobgy commented Jun 29, 2020

You may ask @jingzhang36 or @IronPan.

@rmgogogo
Copy link
Contributor

I'm general OK with this PR.

@jingzhang36 leave this to your LGTM

@rmgogogo
Copy link
Contributor

I am rather new to Golang (learning through opensource) and dont now how to fix this issue with currently breaks the tests.

Known dependencies are:
	github.com/kubeflow/pipelines/backend/src/crd/pkg/apis/scheduledworkflow/v1beta1
	github.com/argoproj/argo/pkg/apis/workflow/v1alpha1
	github.com/ghodss/yaml
	github.com/stretchr/testify/assert
	k8s.io/apimachinery/pkg/api/errors
	k8s.io/apimachinery/pkg/apis/meta/v1
	k8s.io/apimachinery/pkg/runtime/schema
	k8s.io/apimachinery/pkg/types
	k8s.io/kubernetes/pkg/apis/core
	google.golang.org/grpc/codes
	github.com/kubeflow/pipelines/backend/api/go_client
	github.com/kubeflow/pipelines/backend/src/crd/pkg/apis/scheduledworkflow
	github.com/cenkalti/backoff
	github.com/go-openapi/runtime
	github.com/go-openapi/strfmt
	github.com/golang/glog
	github.com/google/uuid
	github.com/pkg/errors
	github.com/golang/protobuf/ptypes/timestamp
	k8s.io/apimachinery/pkg/util/json
	k8s.io/client-go/kubernetes
	k8s.io/client-go/rest
	k8s.io/client-go/tools/clientcmd
	google.golang.org/grpc
	google.golang.org/grpc/status
Check that imports in Go sources match importpath attributes in deps.

I have added the dependency to the go.mod but it dont seem to help ... :( Is there anyone that could give some advice on how to solve this/where I can read up to solve it. @IronPan, @mgogogo, @Ark-kun Thanks for the help!

How you update go.mod? Is it via manual?

Here is the suggested steps:

https://github.com/kubeflow/pipelines/tree/master/backend#updating-build-files

It's true that we didn't update it for a long time.

@jingzhang36
Copy link
Contributor

@NikeNano could you add the new dependency on github.com/argoproj/argo/workflow/validate to BUILD.bazel in this directory?

})

func setMetadata(wf *v1alpha1.Workflow) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: addRuntimeWorkflowMetadata? to make tests easier to understand

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we can omit workflow, addRuntimeMetadata? WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can set it directly on the test workflow and drop the method to clean up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @NikeNano, note the current change is against the goal we want to test. In real world, the workflows passed from KFP compiler do not have these metadata annotations. Therefore, always adding them beforehand makes tests deviate from the real world.

I understand we all want tests to be simpler, so they can be more readable. I think a better way to do this is to stop comparing all fields of a workflow in tests. Instead, we can reset fields we do not care about in some tests to empty before comparing with expected workflow. e.g. for this case, we can simply set annotations and labels to empty, and do not include them in the expected workflow.

Note that, for tests that specifically verify labels & annotations are correct, we still keep them to test what we want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, always adding them beforehand makes tests deviate from the real world.

Yes good point, will have to find another solution.

I see your point, the tricky part here is that the WorkflowRuntimeManifest is handled as a string. Which means we need to unmarshal it to a *v1alpha1.Workflow again, remove the metadata and convert to a string again and then set it on the runDetails. I think this adds more complexity than just adding it atm.

@@ -208,6 +219,19 @@ func createPipeline(name string) *model.Pipeline {
}}
}

func createRunExpectedWorkflow(params []v1alpha1.Parameter, labels map[string]string, annotations map[string]string, serviceAccount string) *v1alpha1.Workflow {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this @Bobgy should make tests a bit cleaner? We still do the comparison with the metadata but as I mentioned here I think it might be easier for the reader if it is included in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is better than the original addMetadata version? I feel like that's simpler, because it sticks to workflow struct in interface.

I only had concerns regarding naming. Current implementation is good in a few minor points too, so overall I think the following are good:

  1. The method returns a new workflow via deepcopy (so that the method does not mutate its inputs)
  2. The method name makes tests readable -- we can tell it adds some metadata that is present on runtime workflows.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel sorry you had to change the entire test file multiple times, I think we can save some intermediate efforts by commenting ideas and discuss in comments, then finish the code when we agree with each other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote to use createRunExpectedWorkflow since all the tests related to CreateRun have the same similar set up required where the metadata is a subset only. I don't have any strong feelings for either of the solutions, but maybe bringing back the addMetadata is easiest and then have a separate look on the tests to improve the readability?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem with the current createRunExpectedWorkflow interface is that, there are a fixed number of arguments.
Whenever any test calls it, it has to supply all the arguments. However, if we will need to add any new arguments in the future, we'll have to update every test that calls createRunExpectedWorkflow.

Also, it creates another layer of abstraction over directly creating workflow structs, but the new abstraction doesn't really provide much value, it only adds metadata by default.

Compared to this, the original addMetadata is a lot simpler. I was only asking for a more descriptive name.

Therefore, my proposal would be:

func withCreateRunMetadata(*v1alpha1.Workflow workflow) *v1alpha1.Workflow {
workflow.DeepCopy()
...

(you can adjust the name if you have better ideas)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NikeNano see above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, see your point that it is a simpler interface, will update.

Copy link
Contributor

@Bobgy Bobgy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update!
/lgtm

There's only one minor comment left.

@@ -208,6 +219,14 @@ func createPipeline(name string) *model.Pipeline {
}}
}

func withCreateRunMetadata(wf *v1alpha1.Workflow) *v1alpha1.Workflow {
template := wf.Spec.Templates[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: deepCopy the workflow before changing it?
You are now editing the input wf.

Another option is to not add the return value, and make it explicit this method changes input.

It's up-to-you to decide.

Copy link
Member Author

@NikeNano NikeNano Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went back to the addRuntimeMetadata and add the metadata to the input workflow, the circle is complete :)

Comment on lines 204 to 205
template.Metadata.Annotations = map[string]string{"sidecar.istio.io/inject": "false"}
template.Metadata.Labels = map[string]string{"pipelines.kubeflow.org/cache_enabled": "true"}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asking before I change too much again. In order to reuse the addRuntimeMetadata it needs to live outside of the test file(or be duplicated). What is best practice in this case? Should we just add test_util.go in order to make it exported?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, we can add a test_util.go

@Bobgy
Copy link
Contributor

Bobgy commented Jun 14, 2021

/lgtm
/approve
Thank you so much for this great addition to KFP and also your perseverance while implementing and adapting the PR!
Really appreciate it

@google-oss-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobgy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-robot google-oss-robot merged commit a44a225 into kubeflow:master Jun 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backend - Reject invalid pipelines
9 participants