SDK/Compiler - Fix s3 artifact key names #1451

Merged: 2 commits merged into kubeflow:master on Jun 7, 2019
Conversation

@kvalev (Contributor) commented on Jun 6, 2019

The current implementation generates an incorrect s3 output artifact snippet:

        s3:
          accessKeySecret:
            key: ''
            optional: true
          bucket: mlpipeline
          endpoint: minio-service.kubeflow:9000
          insecure: true
          key: runs/{{workflow.uid}}/{{pod.name}}/mlpipeline-metrics.tgz
          secretKeySecret:
            key: ''
            optional: true

Instead it should be:

        s3:
          accessKeySecret:
            key: accesskey
            name: mlpipeline-minio-artifact
          bucket: mlpipeline
          endpoint: minio-service.kubeflow:9000
          insecure: true
          key: runs/{{workflow.uid}}/{{pod.name}}/mlpipeline-metrics.tgz
          secretKeySecret:
            key: secretkey
            name: mlpipeline-minio-artifact

Related PR: #1064
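
For context, here is a rough sketch of the normalization this fix is about; this is illustrative only, not the actual kfp compiler code, and the helper name is hypothetical. The point is that a secret selector may be provided either as a kubernetes V1SecretKeySelector object or as a plain dict, and either way the emitted Argo s3 spec must carry the real secret name and key rather than key: '' / optional: true.

    # Illustrative sketch only -- not the actual kfp compiler code.
    # Whichever form the secret selector arrives in, the emitted s3
    # artifact spec must contain the real secret name and key
    # (e.g. mlpipeline-minio-artifact / accesskey).
    from kubernetes.client.models import V1SecretKeySelector


    def to_secret_ref(selector):
        """Normalize a secret selector into the {"name": ..., "key": ...} form."""
        if isinstance(selector, V1SecretKeySelector):
            return {"name": selector.name, "key": selector.key}
        # Otherwise assume a plain dict, e.g. {"name": "...", "key": "..."}.
        return {"name": selector["name"], "key": selector["key"]}


    # Both calls should yield the same shape of result:
    to_secret_ref(V1SecretKeySelector(name="mlpipeline-minio-artifact", key="accesskey"))
    to_secret_ref({"name": "mlpipeline-minio-artifact", "key": "secretkey"})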



@googlebot (Collaborator)

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@k8s-ci-robot (Contributor)

Hi @kvalev. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kvalev changed the title from "WIP Fix s3 artifact keys" to "Fix s3 artifact keys" on Jun 6, 2019
@k8s-ci-robot added the size/M label and removed the size/S label on Jun 6, 2019
@kvalev changed the title from "Fix s3 artifact keys" to "Fix s3 artifact key names" on Jun 6, 2019
@kvalev marked this pull request as ready for review on Jun 6, 2019 at 11:40
@kvalev (Contributor, Author) commented on Jun 6, 2019

I signed it!

@googlebot (Collaborator)

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@Ark-kun (Contributor) commented on Jun 6, 2019

/cc @eterna2

@k8s-ci-robot (Contributor)

@Ark-kun: GitHub didn't allow me to request PR reviews from the following users: eterna2.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @eterna2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Ark-kun (Contributor) commented on Jun 6, 2019

/ok-to-test

@kvalev changed the title from "Fix s3 artifact key names" to "SDK/Compiler - Fix s3 artifact key names" on Jun 6, 2019
@Ark-kun (Contributor) commented on Jun 6, 2019

/lgtm

@eterna2 (Contributor) commented on Jun 7, 2019

Hey @kvalev

Can you do me a favor and run one last check?

Can you update the compiler test data for artifact_location so that one of the keys uses a dict instead of the k8s SecretKeySelector object?

Just to be sure. I got lazy the last time and missed this bug.
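
For reference, a sketch of the kind of test-data variation being requested, assuming the ArtifactLocation.s3 helper introduced in #1064; the exact argument names here are an assumption, not copied from the repository.

    # Sketch of the requested test-data variation (argument names assumed
    # from #1064): exercise both input forms so the compiler test covers
    # the dict code path as well as the object code path.
    from kubernetes.client.models import V1SecretKeySelector
    from kfp.dsl import ArtifactLocation

    artifact_location = ArtifactLocation.s3(
        bucket="mlpipeline",
        endpoint="minio-service.kubeflow:9000",
        insecure=True,
        # one selector as a plain dict ...
        access_key_secret={"name": "mlpipeline-minio-artifact", "key": "accesskey"},
        # ... and one as a k8s V1SecretKeySelector object
        secret_key_secret=V1SecretKeySelector(
            name="mlpipeline-minio-artifact", key="secretkey"),
    )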

@eterna2 (Contributor) commented on Jun 7, 2019

I am trying to figure out why this was not caught by the compiler unit tests.

@k8s-ci-robot removed the lgtm label on Jun 7, 2019
@kvalev (Contributor, Author) commented on Jun 7, 2019

Done @eterna2. The bug was not caught because the YAML template for the compiler test was wrong.
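
For anyone wondering how golden-file style compiler tests can hide (or catch) a bug like this, here is a generic sketch of the comparison pattern; the helper and file handling are hypothetical, not the actual kfp test harness.

    # Generic golden-file comparison sketch (hypothetical names, not the
    # actual kfp test harness). If the checked-in golden YAML itself encodes
    # the buggy output, the test passes and the bug goes unnoticed -- which
    # is what happened here until the template was corrected.
    import yaml


    def assert_matches_golden(compiled_yaml_str, golden_path):
        actual = yaml.safe_load(compiled_yaml_str)
        with open(golden_path) as f:
            expected = yaml.safe_load(f)
        # Structural comparison, so formatting differences do not matter.
        assert actual == expected, "compiled workflow differs from golden file"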

@eterna2 (Contributor) commented on Jun 7, 2019

@kvalev
Ah, I see. Thanks for the catch.

/Lgtm

@Ark-kun (Contributor) commented on Jun 7, 2019

I am trying to figure out why this was not caught by the compiler unit test?

The golden test output was wrong in the same way.

I remember that this part looked strange to me, but at that time I was busy and had no time to test that code path manually. I kind of assumed that the author would use their feature and thus test it.

@Ark-kun (Contributor) commented on Jun 7, 2019

Nit: it's better to keep refactorings and actual changes in separate PRs. Minimizing the PR diff makes reviewing easier.

For example, artifact_location_tests.py contains 7 change blocks but only 1 real change, and _artifact_location.py has no real changes at all.

@Ark-kun (Contributor) commented on Jun 7, 2019

/lgtm

@Ark-kun (Contributor) commented on Jun 7, 2019

@kvalev Can you please tell us the scenario where you need to use this feature?
Currently the best way to specify the artifact storage details is to modify the configmap in the cluster; see https://github.com/argoproj/argo/blob/master/ARTIFACT_REPO.md#configure-the-default-artifact-repository. Then you do not need to put any storage-specific details in your pipeline package.

We're planning to deprecate this feature in Pipelines once argoproj/argo-workflows#1350 is released in Argo 2.4, so we'd like to learn your use cases and try to steer you toward better-supported solutions.

@kvalev (Contributor, Author) commented on Jun 7, 2019

Well, I just started working with kubeflow/argo, so I was not aware that it can be configured at the cluster level. That would probably not work in my case anyway: my kubeflow cluster is managed by a third party, so it would not be possible for me to reconfigure it. Having the option to specify the artifact storage myself would be highly desirable, especially considering that it is not always possible to run the latest and greatest kubeflow version. If this functionality were deprecated or even removed from the SDK, it might not be possible to work with an older kubeflow installation, and people might be at the mercy of their providers.

@Ark-kun (Contributor) commented on Jun 7, 2019

my kubeflow cluster is managed by a third party,

Hmm, interesting. Does your org have a centralized kubeflow (pipelines?) cluster, managed by other people in your org? Do you want to use something like a personal S3 bucket for data storage? BTW, another area where you'll have problems is the secrets needed to access GCP services; they're also taken from standard Kubernetes secret volumes stored in the cluster.

The reason we want to deprecate this is so that a compiled pipeline can be shared between clusters that might have different storage configurations. Also, hard-coding service endpoints and secret names seems like a bad idea.

@Ark-kun (Contributor) commented on Jun 7, 2019

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot merged commit 8583465 into kubeflow:master on Jun 7, 2019
@kvalev (Contributor, Author) commented on Jun 7, 2019

Does your org have a centralized kubeflow (pipelines?) cluster, managed by other people in your org?

Yeah something like that.

Do you want to use something like personal S3 bucket for data storage?

A personal S3 bucket, or perhaps Artifactory, which seems to be supported by the underlying model.

BTW, another area where you'll have problems is the secrets needed to access GCP services; they're also taken from standard Kubernetes secret volumes stored in the cluster.

This is not an issue for on-premise solutions.

@kvalev deleted the kvalev/fix-artifact-keys branch on Jun 7, 2019 at 06:20
@eterna2 (Contributor) commented on Jun 7, 2019

The golden test output was wrong in the same way.

I remember that this part looked strange to me, but at that time I was busy and had no time to test that code path manually. I kind of assumed that the author would use their feature and thus test it.

Yeah, I should have spotted it earlier. My cluster was running a hacked version (which works), and I hadn't had time to swap over to the supposedly more properly coded one.

@metrofun commented on Jun 11, 2019

Can you please bump the latest release version to include this fix? cc @Ark-kun
