
Fix pipeline cannot run bug when using marketplace managed storage #2341

Merged
merged 2 commits into from
Oct 16, 2019

Conversation

Bobgy
Contributor

@Bobgy Bobgy commented Oct 9, 2019

The pipeline couldn't run because the workflow-controller isn't configured to write output artifacts to the correct GCS bucket.

I also made the GCS bucket a directly configurable parameter for the marketplace deployment, because:

  • When users need to back up or delete GCP resources, they should know which bucket KFP is using.

Pros:

  • Clarity on which GCP resources KFP is using

Cons:

  • One extra field to configure when using marketplace

I think pros outweigh cons. WDYT?

/assign @numerology
/cc @jessiezcc
/cc @rmgogogo
/kind bug

Reminder note: when we bring back managed storage configs for marketplace, we need to add an extra field for entering the GCS bucket name.



@Bobgy
Contributor Author

Bobgy commented Oct 9, 2019

@numerology Can you help me review?

@numerology

@numerology Can you help me review?

Sure. Can you link the triage issue for this bug? I'm lacking a bit of context here. Is this related to the managed storage issue?

@numerology

Shall we also update schema.yaml to reflect this change on our deploy page? Or do you just want to enable such a deployment through mpdev?

@jessiezcc
Contributor

/lgtm

@numerology

/lgtm
Otherwise

@Bobgy
Contributor Author

Bobgy commented Oct 10, 2019

Shall we also update schema.yaml to reflect this change in our deploy page? Or you just want to enable such deployment through mpdev?

Just mpdev for this PR.
I'll do a separate PR for schema.yaml if we actually decide to bring back GCS + CloudSQL options.

@Bobgy
Contributor Author

Bobgy commented Oct 10, 2019

Hi @IronPan @neuromage, can you review/approve?

value: '{{ printf "%s-%s" .Values.managedstorage.cloudsqlInstanceConnectionName .Values.managedstorage.databaseNamePrefix | replace ":" "-" | trunc 50 }}'
{{ else }}
value: '{{ printf "%s-%s" .Values.managedstorage.cloudsqlInstanceConnectionName .Release.Name | replace ":" "-" | trunc 50 }}'
{{ end }}
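For reference, the Sprig pipeline above builds the name as `"<connectionName>-<suffix>"`, replaces colons (Cloud SQL connection names look like `project:region:instance`) with dashes, and truncates to 50 characters. A minimal Python sketch of that transformation (the function name is mine, not from the chart):

```python
def derive_bucket_name(connection_name: str, suffix: str) -> str:
    """Emulates the Helm/Sprig pipeline:
    printf "%s-%s" ... | replace ":" "-" | trunc 50
    """
    name = f"{connection_name}-{suffix}"  # printf "%s-%s"
    name = name.replace(":", "-")         # replace ":" "-"
    return name[:50]                      # trunc 50

# Cloud SQL connection names look like "project:region:instance".
print(derive_bucket_name("my-project:us-central1:kfp", "mlpipeline"))
# -> my-project-us-central1-kfp-mlpipeline
```

The `trunc 50` matters because GCS bucket names have length limits; very long connection names get clipped.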
Contributor

I would prefer to split this change into two PRs; you have two changes here:
(1) fix the issue (the Argo bucket doesn't respect managed storage)
(2) make the GCS bucket name customizable
For (2), I remember Yang mentioned the reason is that he wants to keep the DB name and object storage name in sync.
We need his LGTM in case (2) introduces other issues we aren't aware of.

If Yang is OK with (2), then this PR also needs to touch schema.yaml to let the UI feed in the gcsBucketName.

Contributor Author

Thanks! These are valid concerns.

I can also do just (1), but I didn't find a cleaner way than duplicating the same {{if}} {{else}} in both places and making sure the complex bucket-name logic stays identical.

It would be very helpful if you know another way to put the bucket-name logic in one place in the Helm chart and then reference it from the subcharts that use it.

For (2), I wonder why we want to keep the DB name and object storage name in sync?

Contributor

For (2), Yang mentioned some reasons to me before. One I remember is that when a user reinstalls KFP on top of existing storage, it saves them from having to remember the (DB name, storage bucket name) pair, considering the DB contains indexes into the bucket. If the user misremembers or mismatches the pair, the resulting error is very hard to catch and understand.

For (1), I think a better solution is to not handle this logic inside Helm but in the application layer instead, which makes it easier to share code logic while keeping the YAML simple. It might require some corresponding code changes I haven't looked into yet.

Contributor Author

Thanks for the explanation.

The arguments sound reasonable. For clarity, I think we should introduce a step during marketplace install that tells users which DB and buckets were created for them.

@Bobgy
Contributor Author

Bobgy commented Oct 10, 2019

/hold
because of @rmgogogo's concerns

@@ -93,7 +93,7 @@ data:
    artifactRepository:
    {
        s3: {
-            bucket: mlpipeline,
+            bucket: '{{ if .Values.managedstorage.enabled }}{{ .Values.managedstorage.gcsBucketName }}{{ else }}mlpipeline{{ end }}',
Member

Good catch. Thanks!

@@ -624,11 +624,7 @@ spec:
# Following environment variables are only needed when using Cloud SQL and GCS.
{{ if .Values.managedstorage.enabled }}
- name: OBJECTSTORECONFIG_BUCKETNAME
{{ if .Values.managedstorage.databaseNamePrefix }}
Member

I think you're already aware: before this change, we didn't allow users to specify the GCS bucket name.
The GCS bucket name is auto-inferred from the Cloud SQL instance name, so the Cloud SQL and GCS data can always stay in sync. When a user reinstalls, the data can be recovered correctly.

Allowing users to specify the GCS bucket name might break that assumption and cause unexpected behavior.

Contributor Author

Thanks for the explanation. I agree with the concern and will change it so the bucket name isn't configurable.

Member

Thanks. A fast fix is to copy the bucket-name generation code into the Argo configmap here (although that would lead to code duplication; any way to dedupe it would be sweet).

Contributor Author

What do you think about my approach to avoiding the code duplication?

Contributor

I'm OK with it if it works.

@Bobgy
Contributor Author

Bobgy commented Oct 12, 2019

I found a way to avoid the code duplication.
Documentation on the tpl feature in Helm charts: https://helm.sh/docs/developing_charts/#using-the-tpl-function
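Roughly, the `tpl` approach lets a chart define the bucket-name template string once in values and render it wherever it is needed, including in subcharts. A sketch of the idea (the key name `gcsBucketNameTemplate` and exact layout are illustrative, not necessarily what the PR uses):

```yaml
# values.yaml: define the bucket-name template string once (illustrative key name)
gcsBucketNameTemplate: >-
  {{- if .Values.managedstorage.databaseNamePrefix -}}
  {{ printf "%s-%s" .Values.managedstorage.cloudsqlInstanceConnectionName .Values.managedstorage.databaseNamePrefix | replace ":" "-" | trunc 50 }}
  {{- else -}}
  {{ printf "%s-%s" .Values.managedstorage.cloudsqlInstanceConnectionName .Release.Name | replace ":" "-" | trunc 50 }}
  {{- end -}}
```

Any template that needs the bucket name can then render the string through `tpl`, e.g. `bucket: '{{ tpl .Values.gcsBucketNameTemplate . }}'`, so the derivation logic lives in exactly one place.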

Tests on my new change

pipelines/manifests/gcp_marketplace$ helm template chart/kubeflow-pipelines/ --set-string managedstorage.enabled=true,managedstorage.cloudsqlInstanceConnectionName=cloudsql-connection-name | grep -C 3 -i bucket
    artifactRepository:
    {
        s3: {
            bucket: 'cloudsql-connection-name-release-name',
            keyPrefix: artifacts,
            endpoint: minio-service.default:9000,
            insecure: true,
--
              value: 'pipeline-runner'
            # Following environment variables are only needed when using Cloud SQL and GCS.
            
            - name: OBJECTSTORECONFIG_BUCKETNAME
              value: 'cloudsql-connection-name-release-name'
            - name: DBCONFIG_DBNAME
              
pipelines/manifests/gcp_marketplace$ helm template chart/kubeflow-pipelines/ --set-string managedstorage.enabled=true,managedstorage.cloudsqlInstanceConnectionName=cloudsql-connection-name,managedstorage.databaseNamePrefix=db-prefix | grep -C 3 -i bucket
    artifactRepository:
    {
        s3: {
            bucket: 'cloudsql-connection-name-db-prefix',
            keyPrefix: artifacts,
            endpoint: minio-service.default:9000,
            insecure: true,
--
              value: 'pipeline-runner'
            # Following environment variables are only needed when using Cloud SQL and GCS.
            
            - name: OBJECTSTORECONFIG_BUCKETNAME
              value: 'cloudsql-connection-name-db-prefix'
            - name: DBCONFIG_DBNAME

Contributor

@rmgogogo rmgogogo left a comment


/lgtm

@IronPan
Member

IronPan commented Oct 15, 2019

/hold cancel
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: IronPan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Bobgy
Contributor Author

Bobgy commented Oct 16, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit 49a50df into kubeflow:master Oct 16, 2019
@Bobgy Bobgy deleted the fix_gcp_managed_storage branch October 16, 2019 03:49
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023