Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add third party license + source code to argo and minio images to comply with their license #2201

Merged
merged 12 commits into from
Oct 1, 2019

Conversation

Bobgy
Copy link
Contributor

@Bobgy Bobgy commented Sep 23, 2019

Prepare required OSS licenses in the image.

/assign @IronPan
/cc @dushyanthsc
/cc @james-jwu


This change is Reviewable

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 23, 2019

Notes

  • I will clean up and include scripts that generated the license files in a separate PR.
  • I'm not familiar with our release process, does my change looks good?

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 24, 2019

@IronPan is OOO. @james-jwu @dushyanthsc can you help me review this?

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 25, 2019

TODO:
Add source code of MPL dependencies to the image

  • argoexec
  • workflow-controller
  • minio

@Bobgy Bobgy changed the title Add concatenated third party license to argo and minio images Add third party license + source code to argo and minio images to comply with their license Sep 25, 2019
@Ark-kun
Copy link
Contributor

Ark-kun commented Sep 25, 2019

I think it might not be the best practice to re-release the external images every time someone commits to master. This happens many times per day.
I'm not sure there is a need to re-release the images with every KFP release which happens weekly.

It might be best to have a manual scrupt/cloudbuild file to copy+license a particular tagged image and this script can be run manually any time we want to upgrade to new versions.

We should probably use the same tags as the original images, so that our images just have an extra prefix: argo/argoexec:2.3.0 -> gcr.io/kubeflow-pipelines/argo/argoexec:2.3.0.

@rmgogogo
Copy link
Contributor

/lgtm

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 26, 2019

I think it might not be the best practice to re-release the external images every time someone commits to master. This happens many times per day.
I'm not sure there is a need to re-release the images with every KFP release which happens weekly.

It might be best to have a manual scrupt/cloudbuild file to copy+license a particular tagged image and this script can be run manually any time we want to upgrade to new versions.

We should probably use the same tags as the original images, so that our images just have an extra prefix: argo/argoexec:2.3.0 -> gcr.io/kubeflow-pipelines/argo/argoexec:2.3.0.

I totally agree, let me try to refactor like this.

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 26, 2019

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 26, 2019
@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 26, 2019

/hold
Until I refactor

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 26, 2019

Plan

  1. Build a script that deploys argo and minio images with license + source code
  2. Run the script once against ml-pipeline-staging to build the images and then copy it to ml-pipeline gcr.
  3. Submit a PR that changes our manifests to point to gcr.io/ml-pipeline/minio:some-tag ... to use my newly built images
    ...
    Some time later, when we want to use a different version of argo/minio, we need to release the image once again manually.

@@ -9,8 +9,6 @@ bases:
- metadata

images:
- name: argoproj/workflow-controller
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no usage of argoproj/workflow-controller. So I simply removed this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty certain that the workflow controller is being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are using gcr.io/ml-pipeline/workflow-controller, but not argoproj/workflow-controller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty certain it's used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you let me know where it is? I couldn't find it when I searched for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's already referencing the gcr image:

image: gcr.io/ml-pipeline/workflow-controller:v2.3.0

argoexec image is already there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I found too. Do you have any further concerns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that image have the required licenses? Is there some script or cloudbuild entry that can create that image when new Argo version comes out?

Copy link
Contributor Author

@Bobgy Bobgy Sep 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcr.io/ml-pipeline/workflow-controller:v2.3.0 doesn't have license

gcr.io/ml-pipeline/workflow-controller:v2.3.0-license-compliance has license

Is there some script or cloudbuild entry that can create that image when new Argo version comes out?

That's a good point.

I have some dirty scripts for generating the licenses. I made this image from these scripts manually this time.

I think cleaning up these scripts is not a blocker for MKP launch. I will put another PR for it later.

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 26, 2019

/hold cancel
I've addressed review comments. Now I'm confident with my approach. More reviews welcomed.

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 26, 2019

Note that I've manually released these '-license-compliance' images on ml-pipeline, so tests in this PR are already running with my new images.

Explaining a little about why I named new tags with suffix with the suffix '-license-compliance' because I don't want to replace existing images. There are already people depending on it.

Next time, when we upgrade these 3rd party libraries to a new version, we can just get rid of this suffix.

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 26, 2019

Asking to review third_party/README.md, Dockerfiles and manifests. I am least confident with these.

@dushyanthsc
Copy link
Contributor

My two cents on the image tagging. I think we should still have some notion of version, because not all images are just vanilla copy of open source image. metadata-envoy for e.g. changes the envoy.yaml and has things like port numbers, etc. which can probably change. So I am completely on-board with not building these images per commit and handle publishing manually, but feel that we should retain some versioning in tagging the image.

@Ark-kun
Copy link
Contributor

Ark-kun commented Sep 26, 2019

but feel that we should retain some versioning in tagging the image.

We can add some suffixes. E.g. envoy:1.2.3-kfp

python:3.7.4
python:3.7.4-alpine
python:3.7.4-slim

etc

@Bobgy
Copy link
Contributor Author

Bobgy commented Sep 26, 2019

I agree with having suffix and keeping original tag name to easily let people know it is customized and also which underlying image it was built from.

For envoy, since we are building sth on top of it. Sounds like we can go with our own versioning too if you prefer. Images don't need all follow the same rule

@@ -141,10 +141,10 @@ steps:
waitFor: ['PullMetadataEnvoy']

- name: 'gcr.io/cloud-builders/docker'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move all these entries to a separate cloudbuild.yaml file in the /third-party directory so that it's not triggered with every release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand why we have these entries. They are copying already built images from gcr.io/ml-pipeline/... to gcr.io/ml-pipeline/google/pipelines/...:$TAG_NAME (released tag name).

But I couldn't find any usage of gcr.io/ml-pipeline/google/pipelines, our lite deployment only references gcr.io/ml-pipeline/image-name.

Can you explain what these are for? I didn't understand so I merely changed the tags to my new tags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also checked https://github.com/kubeflow/manifests, they are also not using these. @IronPan do you know what these are for? Just meant as taking a snapshot of the release artifacts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bobgy no need to add license-compliance part in the tag here. As long as we're building these default ones with the source and license included, we're good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neuromage My reasoning of the extra license-compliance part: #2201 (comment)

It's for purpose of safe gradual release, because images like gcr.io/ml-pipeline/argoexec:v2.3.0 are already in use today. If I push a new image to the same tag, everyone will switch to that new image immediately.

Technically, there's no difference between previous and new images. New images also just passed our presubmit tests. So risk pushing new images under the same tag is small too. I think we can also choose to do that.

WDYT? I'd like to double check if you want me to push new images under the same tag? or would you want to suggest a simpler naming?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we're distributing the images, we should just version them (since we technically rebuild these every release). Ideally, we would name them something like argoexec-v2.3.0-kfp:0.1.30 which can indicate that this is based of argo v2.3.0, but released under kfp with tag 0.1.30. I don't want to block this PR, but can we consider using this convention? This can be done in a future PR.

/cc @IronPan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argoexec-v2.3.0-kfp:0.1.30
Should we do that even if we only release a modified version of the image once per Argo release (not once per KFP release)?

Using different tags for the same image might be a bit confusing.

@Ark-kun
Copy link
Contributor

Ark-kun commented Sep 30, 2019

/lgtm
Let's make another PR where we extract the CloudBuild entries into separate script or scripts which are only run when the new upstream image is released.

/cc @neuromage Do you have any remaining concerns about this PR?

@neuromage
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neuromage

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neuromage
Copy link
Contributor

@Bobgy please also see my comment on the naming scheme.

@k8s-ci-robot k8s-ci-robot merged commit 51f5a5a into kubeflow:master Oct 1, 2019
@Bobgy Bobgy deleted the 3rd_party_licenses branch October 1, 2019 09:57
@Bobgy
Copy link
Contributor Author

Bobgy commented Oct 1, 2019

Hi @Ark-kun and @neuromage, thanks for the review and approval.

I think there are some common misunderstandings I didn't clarify well.

  • The cloud build entries here pull images from gcr.io/ml-pipeline/xxx-image and retag it to gcr.io/ml-pipeline/google/xxx-image:TAG_NAME. They don't rebuild 3rd party images per release.
  • Image building for 3rd party images happen in scripts like this. I ran this script manually a few days ago to release images like gcr.io/ml-pipeline/argoexec:v2.3.0-license-compliance. It's an one time effort until we need to use a new version of an image.

Also, summarize about how these images were built:

  1. Collect dependencies + transitive dependencies in these images. (Each repo has a different way, needs to be done manually.)
  2. Run a script to crawl github repos of golang imports (Not all imports can be figured out by my script, needs manual help for <2% of libraries)
  3. Run a script to crawl github license info of these libraries. (Not all repos have github recognizable license, needs manual help for <2% of libraries)
  4. Crawl full text license files for all dependencies + source code for MPL dependencies.
  5. Build the above into an image and push it to gcr.io/ml-pipeline/xxx-image

Because everything before step 4 cannot be automated completely. I commited concatenated license file and a list of MPL dependencies + their source code download url into KFP repo to build 3rd party images just for the version we use.

Action item left: (I will work on after vacation) tidy up and document my scripts that did step 1~4, these steps need to be done each time we upgrade image version.

So... to answer some of the concerns:

Let's make another PR where we extract the CloudBuild entries into separate script.

I don't think these entries need to be extracted, because they don't rebuild images + they were like this before this change too. Although I don't fully understand why we need them, because manifests never reference them. Please help me understand more historic reasons for these entries if I missed anything here.

scripts which are only run when the new upstream image is released.

I've committed scripts I used to build current 3rd party images into the repo. [Mentioned above as Action Item] I need to include scripts that can be used when we upgrade libraries.

I think if we're distributing the images, we should just version them (since we technically rebuild these every release). Ideally, we would name them something like argoexec-v2.3.0-kfp:0.1.30 which can indicate that this is based of argo v2.3.0, but released under kfp with tag 0.1.30. I don't want to block this PR, but can we consider using this convention? This can be done in a future PR.

As explained above, we don't rebuild these images every release. We only need to build one image every time we use a new version of a 3rd party image.

magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023
Added a specific version for protobuf in requirements.txt

Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants