-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add necessary data types to api and database to support pipeline version. #1873
Conversation
on Yang's branch at https://github.com/IronPan/pipelines/tree/kfpci/. Backward compatible.
/assign @IronPan |
/retest |
/assign @IronPan |
string version_id = 2; | ||
} | ||
|
||
message ListPipelineVersionsRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should consider adding pagination for list version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
backend/api/pipeline.proto
Outdated
repeated Parameter parameters = 4; | ||
|
||
// Input. | ||
oneof code_source { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should they be modeled as oneof?
the code source is where the source code is located. the url is where the compiled package stored. they are not mutually exclusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move url out of code source then. Another note: if we have both source and package as optional fields, there is a possibility that the two are inconsistent with each other (and we won't validate the provided source and package when those get set, since the validation work is not so trivial and probably not worthwhile). In case of that, we probably need to make one having priority over the other...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we allow both code source and package source to be present at the same time. Should we also allow multiple code sources? So we don't need oneof in CodeSource type.
type PipelineVersion struct { | ||
UUID string `gorm:"column:VersionUUID; not null; primary_key"` | ||
CreatedAtInSec int64 `gorm:"column:VersionCreatedAtInSec; not null"` | ||
Name string `gorm:"column:VersionName; not null"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we've discussed about this. the name of the version should be unique for a given pipeline for caching purpose. can we enforce it through db schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we did. I missed it in the change..... added "unique"
RepoName string `gorm:"column:RepoName"` | ||
CommitSHA string `gorm:"column:CommitSHA"` | ||
|
||
URL string `gorm:"column:URL` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as api definition. url doesn't belong to code source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
type CodeSource struct { | ||
// PipelineVersion can be based on a git commit or a package stored at some | ||
// url. | ||
// All fields below are optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for storing oneof fields in DB should we instead use just two columns? code source type and the streamed json of the codesource?
This will make DB schema much cleaner and extensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get rid of the oneof from api type definition. So kept 3 cols here for now. Please let me if we still want 2 columns instead: one is the combination of repo name and commit; the other is url.
// Set size to 65535 so it will be stored as longtext. | ||
// https://dev.mysql.com/doc/refman/8.0/en/column-count-limit.html | ||
Parameters string `gorm:"column:VersionParameters; not null; size:65535"` | ||
PipelineId string `gorm:"column:PipelineId; not null"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider add foreign key for cascade delete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/assign @IronPan |
@@ -136,6 +137,32 @@ message GetTemplateResponse { | |||
string template = 1; | |||
} | |||
|
|||
message GetPipelineVersionTemplateRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended to add API definition in a separate change?
We should use resource reference for flexible pipeline <-> version mapping
https://github.com/kubeflow/pipelines/blob/master/backend/api/resource_reference.proto
Please check experiment <-> job and experiment <-> run as examples to see how this is being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Api PipelineVersion definition will use resource reference to point to Pipeline. Model/Storage PipelineVersion will use foreign key of pipeline id.
/assign @IronPan |
backend/api/pipeline.proto
Outdated
@@ -136,6 +138,32 @@ message GetTemplateResponse { | |||
string template = 1; | |||
} | |||
|
|||
message GetPipelineVersionTemplateRequest{ | |||
string pipeline_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pipeline id is not necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
backend/api/pipeline.proto
Outdated
} | ||
|
||
message CreatePipelineVersionRequest { | ||
string pipeline_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ResourceReference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
backend/api/pipeline.proto
Outdated
} | ||
|
||
message GetPipelineVersionRequest { | ||
string pipeline_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pipeline ID is not needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
backend/api/pipeline.proto
Outdated
} | ||
|
||
message ListPipelineVersionsRequest { | ||
string pipeline_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ResourceKey instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
backend/api/pipeline.proto
Outdated
repeated Parameter parameters = 4; | ||
|
||
// Input. Optional. Pipeline version code sources. | ||
repeated string code_source_urls = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be repeated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We allow multiple sources. In api, I use repeated field and in db, I used ";" separated string for multiple sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should code always be single source of truth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to summarize: (1) in storage/model, we have a single code source url (2) in api, we'll have to have a package url and a code source url in PipelineVersion message, because the package url we need to read file from for CreatePipelineVersionRequest. But the package url is not stored in db, so not in storage/model definition...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I changed to a single code source url and a single package url in API, and a single code source in model.
_, err = s.db.Exec(sql, args...) | ||
sqlPipelineVersions, argsPipelineVersions, err := sq. | ||
Insert("pipeline_versions"). | ||
SetMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
newPipeline := *p | ||
now := s.time.Now().Unix() | ||
newPipeline.CreatedAtInSec = now | ||
newPipeline.DefaultVersion.CreatedAtInSec = now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the defaultversion is not guarantee to exist. this might cause nil pointer exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check and if it's nil, create an implicit one.
Note:
(1) in the final release, CreatePipeline in pipline_store shouldn't create version. But in the intermediate status, CreatePipeline has to create an implicit version with pipeline
(2) in the intermediate status, if CreatePipeline is called from resource_manager, this field shall not be nil, since I the pipeline struct constructed there contains default version.
ToSql() | ||
if err != nil { | ||
return nil, util.NewInternalServerError(err, "Failed to create query to insert pipeline to pipeline table: %v", | ||
err.Error()) | ||
} | ||
_, err = s.db.Exec(sql, args...) | ||
sqlPipelineVersions, argsPipelineVersions, err := sq. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if pipeline doesnt have default version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the final release, the default version in CreatePipeline shall be nil. But here, CreatePipeline will create an implicit version without the request specifying it.
id, err := s.uuid.NewRandom() | ||
if err != nil { | ||
return nil, util.NewInternalServerError(err, "Failed to create a pipeline id.") | ||
} | ||
newPipeline.UUID = id.String() | ||
newPipeline.DefaultVersion.PipelineId = id.String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you split the code to store a new pipeline and a new version? not mingle them together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They need to be in a single transaction anyway? A further note: in the final release, we wouldn't insert/create versions here at all. CreatePipeline will only handle pipeline (as a container/bag) and has no insertion/creation of versions when everything is in its final status.
@@ -17,6 +17,7 @@ package storage | |||
import ( | |||
"database/sql" | |||
"fmt" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line is not added manually. Auto formatting inserted it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/cc @james-jwu FYI |
@jingzhang36: GitHub didn't allow me to request PR reviews from the following users: FYI. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @IronPan |
…o make them more separate.
…lineVersionRequest
/retest |
/test kubeflow-pipeline-sample-test |
/assign @IronPan |
// Input. Optional. Pipeline version code source. | ||
string code_source_url = 5; | ||
|
||
// Input. Required. Pipeline version package url. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be optional. Same as the Url in here
https://github.com/kubeflow/pipelines/pull/1873/files#diff-84b54249a5b54df88b95c2eed5c3f7c3R187
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: IronPan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
Mostly based on Yang's branch at https://github.com/IronPan/pipelines/tree/kfpci/.
In addition to the unit tests and integration tests, I tested on my kfp deployment,
(1) all runs and pipelines from before the upgrade survive the upgrade and function as usual (e.g., view, delete)
(2) new pipelines can be added properly after upgrade and runs (one time and recurring) are created on these pipelines. Then deletion of these pipelines behave as usual too.
(3) all deletion of pipelines lead to the removal of versions as well.
(4) create new experiment, and create run under this experiment
This change is