Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add necessary data types to api and database to support pipeline version. #1873

Merged
merged 43 commits into from
Sep 26, 2019

Conversation

jingzhang36
Copy link
Contributor

@jingzhang36 jingzhang36 commented Aug 17, 2019

Mostly based on Yang's branch at https://github.com/IronPan/pipelines/tree/kfpci/.

  1. Backward compatible.
  2. Major intention is to add data types/fields and some util methods to access those new types/fields. API method setting/using those fields will be in another PR, in order to avoid huge PRs.

In addition to the unit tests and integration tests, I tested on my kfp deployment,
(1) all runs and pipelines from before the upgrade survive the upgrade and function as usual (e.g., view, delete)
(2) new pipelines can be added properly after upgrade and runs (one time and recurring) are created on these pipelines. Then deletion of these pipelines behave as usual too.
(3) all deletion of pipelines lead to the removal of versions as well.
(4) create new experiment, and create run under this experiment


This change is Reviewable

@jingzhang36 jingzhang36 changed the title Add necessary data types and proto messages for pipeline version. Add necessary data types to api and database to support pipeline version. Aug 19, 2019
@jingzhang36
Copy link
Contributor Author

/assign @IronPan
/assign @neuromage

@jingzhang36
Copy link
Contributor Author

/retest

@jingzhang36
Copy link
Contributor Author

/assign @IronPan
/assing @neuromage

string version_id = 2;
}

message ListPipelineVersionsRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider adding pagination for list version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

repeated Parameter parameters = 4;

// Input.
oneof code_source {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should they be modeled as oneof?
the code source is where the source code is located. the url is where the compiled package stored. they are not mutually exclusive.

Copy link
Contributor Author

@jingzhang36 jingzhang36 Aug 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move url out of code source then. Another note: if we have both source and package as optional fields, there is a possibility that the two are inconsistent with each other (and we won't validate the provided source and package when those get set, since the validation work is not so trivial and probably not worthwhile). In case of that, we probably need to make one having priority over the other...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we allow both code source and package source to be present at the same time. Should we also allow multiple code sources? So we don't need oneof in CodeSource type.

type PipelineVersion struct {
UUID string `gorm:"column:VersionUUID; not null; primary_key"`
CreatedAtInSec int64 `gorm:"column:VersionCreatedAtInSec; not null"`
Name string `gorm:"column:VersionName; not null"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we've discussed about this. the name of the version should be unique for a given pipeline for caching purpose. can we enforce it through db schema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we did. I missed it in the change..... added "unique"

RepoName string `gorm:"column:RepoName"`
CommitSHA string `gorm:"column:CommitSHA"`

URL string `gorm:"column:URL`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as api definition. url doesn't belong to code source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

type CodeSource struct {
// PipelineVersion can be based on a git commit or a package stored at some
// url.
// All fields below are optional.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for storing oneof fields in DB should we instead use just two columns? code source type and the streamed json of the codesource?
This will make DB schema much cleaner and extensible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get rid of the oneof from api type definition. So kept 3 cols here for now. Please let me if we still want 2 columns instead: one is the combination of repo name and commit; the other is url.

// Set size to 65535 so it will be stored as longtext.
// https://dev.mysql.com/doc/refman/8.0/en/column-count-limit.html
Parameters string `gorm:"column:VersionParameters; not null; size:65535"`
PipelineId string `gorm:"column:PipelineId; not null"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider add foreign key for cascade delete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jingzhang36
Copy link
Contributor Author

/assign @IronPan
/assign @neuromage

@@ -136,6 +137,32 @@ message GetTemplateResponse {
string template = 1;
}

message GetPipelineVersionTemplateRequest{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended to add API definition in a separate change?

We should use resource reference for flexible pipeline <-> version mapping
https://github.com/kubeflow/pipelines/blob/master/backend/api/resource_reference.proto

Please check experiment <-> job and experiment <-> run as examples to see how this is being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Api PipelineVersion definition will use resource reference to point to Pipeline. Model/Storage PipelineVersion will use foreign key of pipeline id.

@jingzhang36
Copy link
Contributor Author

/assign @IronPan

@@ -136,6 +138,32 @@ message GetTemplateResponse {
string template = 1;
}

message GetPipelineVersionTemplateRequest{
string pipeline_id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline id is not necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

message CreatePipelineVersionRequest {
string pipeline_id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ResourceReference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

message GetPipelineVersionRequest {
string pipeline_id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline ID is not needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

message ListPipelineVersionsRequest {
string pipeline_id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ResourceKey instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

repeated Parameter parameters = 4;

// Input. Optional. Pipeline version code sources.
repeated string code_source_urls = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be repeated?

Copy link
Contributor Author

@jingzhang36 jingzhang36 Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We allow multiple sources. In api, I use repeated field and in db, I used ";" separated string for multiple sources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should code always be single source of truth?

Copy link
Contributor Author

@jingzhang36 jingzhang36 Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to summarize: (1) in storage/model, we have a single code source url (2) in api, we'll have to have a package url and a code source url in PipelineVersion message, because the package url we need to read file from for CreatePipelineVersionRequest. But the package url is not stored in db, so not in storage/model definition...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I changed to a single code source url and a single package url in API, and a single code source in model.

_, err = s.db.Exec(sql, args...)
sqlPipelineVersions, argsPipelineVersions, err := sq.
Insert("pipeline_versions").
SetMap(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

newPipeline := *p
now := s.time.Now().Unix()
newPipeline.CreatedAtInSec = now
newPipeline.DefaultVersion.CreatedAtInSec = now
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the defaultversion is not guarantee to exist. this might cause nil pointer exception.

Copy link
Contributor Author

@jingzhang36 jingzhang36 Sep 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check and if it's nil, create an implicit one.
Note:
(1) in the final release, CreatePipeline in pipline_store shouldn't create version. But in the intermediate status, CreatePipeline has to create an implicit version with pipeline
(2) in the intermediate status, if CreatePipeline is called from resource_manager, this field shall not be nil, since I the pipeline struct constructed there contains default version.

ToSql()
if err != nil {
return nil, util.NewInternalServerError(err, "Failed to create query to insert pipeline to pipeline table: %v",
err.Error())
}
_, err = s.db.Exec(sql, args...)
sqlPipelineVersions, argsPipelineVersions, err := sq.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if pipeline doesnt have default version?

Copy link
Contributor Author

@jingzhang36 jingzhang36 Sep 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the final release, the default version in CreatePipeline shall be nil. But here, CreatePipeline will create an implicit version without the request specifying it.

id, err := s.uuid.NewRandom()
if err != nil {
return nil, util.NewInternalServerError(err, "Failed to create a pipeline id.")
}
newPipeline.UUID = id.String()
newPipeline.DefaultVersion.PipelineId = id.String()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you split the code to store a new pipeline and a new version? not mingle them together

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need to be in a single transaction anyway? A further note: in the final release, we wouldn't insert/create versions here at all. CreatePipeline will only handle pipeline (as a container/bag) and has no insertion/creation of versions when everything is in its final status.

@@ -17,6 +17,7 @@ package storage
import (
"database/sql"
"fmt"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

Copy link
Contributor Author

@jingzhang36 jingzhang36 Sep 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty line is not added manually. Auto formatting inserted it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jingzhang36
Copy link
Contributor Author

/cc @james-jwu FYI

@k8s-ci-robot
Copy link
Contributor

@jingzhang36: GitHub didn't allow me to request PR reviews from the following users: FYI.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @james-jwu FYI

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jingzhang36
Copy link
Contributor Author

/assign @IronPan

@jingzhang36
Copy link
Contributor Author

/retest

@jingzhang36
Copy link
Contributor Author

/test kubeflow-pipeline-sample-test

@jingzhang36
Copy link
Contributor Author

/assign @IronPan

// Input. Optional. Pipeline version code source.
string code_source_url = 5;

// Input. Required. Pipeline version package url.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IronPan
Copy link
Member

IronPan commented Sep 26, 2019

/approve
Just a small comment

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: IronPan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@IronPan
Copy link
Member

IronPan commented Sep 26, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit 7aaecb1 into kubeflow:master Sep 26, 2019
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants