-
Notifications
You must be signed in to change notification settings - Fork 756
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Conversion of OpenVaccine, Bulldozers, Titanic and Facial keypoint de…
…tection from kaggle to kfp (#995)
- Loading branch information
1 parent
b98a4bc
commit afab3b8
Showing
53 changed files
with
5,320 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,348 @@ | ||
# Objective | ||
|
||
This example is based on the Bluebook for bulldozers competition (https://www.kaggle.com/competitions/bluebook-for-bulldozers/overview). The objective of this exercise is to predict the sale price of bulldozers sold at auctions. | ||
|
||
## Environment | ||
|
||
This pipeline was tested using Kubeflow 1.4 and kfp 1.1.2 and x86-64 and ARM based system which includes all Intel and AMD based CPU's and M1/M2 series Macbooks. | ||
|
||
## Step 1: Setup Kubeflow as a Service | ||
|
||
- If you haven’t already, sign up (https://www.arrikto.com/kubeflow-as-a-service/) | ||
- Deploy Kubeflow | ||
|
||
## Step 2: Launch a Notebook Server | ||
|
||
- Bump memory to 2GB and vCPUs to 2 | ||
|
||
|
||
## Step 3: Clone the Project Repo to Your Notebook | ||
|
||
- (Kubeflow as a Service) Open up a terminal in the Notebook Server and git clone the kubeflow/examples repository | ||
``` | ||
git clone https://github.com/kubeflow/examples | ||
``` | ||
|
||
## Step 4: Setup DockerHub and Docker | ||
|
||
- If you haven’t already, sign up (https://hub.docker.com/) for DockerHub | ||
- If you haven’t already, install Docker Desktop (https://www.docker.com/products/docker-desktop/) locally OR install the Docker command line utility (https://docs.docker.com/get-docker/) and enter `sudo docker login` command in your terminal and log into Docker with your your DockerHub username and password | ||
|
||
|
||
## Step 5: Setup Kaggle | ||
|
||
- If you haven’t already done so, sign up (https://www.kaggle.com/) for Kaggle | ||
- (On Kaggle) Generate an API token (https://www.kaggle.com/docs/api) | ||
- (Kubeflow as a Service) Create a Kubernetes secret | ||
``` | ||
kubectl create secret generic kaggle-secret --from-literal=KAGGLE_USERNAME=<username> --from-literal=KAGGLE_KEY=<api_token> | ||
``` | ||
|
||
## Step 6: Install Git | ||
|
||
- (Locally) If you don’t have it already, install Git (https://github.com/git-guides/install-git) | ||
|
||
## Step 7: Clone the Project Repo Locally | ||
|
||
- (Locally) Git clone the `kubeflow/examples` repository | ||
``` | ||
git clone https://github.com/kubeflow/examples | ||
``` | ||
|
||
## Step 8: Create a PodDefault Resource | ||
|
||
- (Kubeflow as a Service) Navigate to the `bluebook-for-bulldozers-kaggle-competition directory` | ||
- Create a resource.yaml file | ||
|
||
resource.yaml: | ||
``` | ||
apiVersion: "kubeflow.org/v1alpha1" | ||
kind: PodDefault | ||
metadata: | ||
name: kaggle-access | ||
spec: | ||
selector: | ||
matchLabels: | ||
kaggle-secret: "true" | ||
desc: "kaggle-access" | ||
volumeMounts: | ||
- name: secret-volume | ||
mountPath: /secret/kaggle | ||
volumes: | ||
- name: secret-volume | ||
secret: | ||
secretName: kaggle-secret | ||
``` | ||
|
||
<img width="660" alt="image3" src="https://user-images.githubusercontent.com/17012391/177049116-b4186ec3-becb-40ea-973a-27bc52f90619.png"> | ||
|
||
- Apply resource.yaml using `kubectl apply -f resource.yaml` | ||
|
||
## Step 9: Explore the load-data directory | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/load-data` directory | ||
- Open up the `load.py` file | ||
- Note the code in this file that will perform the actions required in the “load-data” pipeline step | ||
|
||
<img width="209" alt="image7" src="https://user-images.githubusercontent.com/17012391/177049222-dab4c362-f06d-42ca-a07c-e61a0b301670.png"> | ||
|
||
## Step 10: Build the load Docker Image | ||
|
||
- (Locally) Navigate to the bluebook-for-bulldozers-kaggle-competition/pipeline-components/load-data directory | ||
- Build the Docker image if locally you are using arm64 (Apple M1) | ||
``` | ||
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 . | ||
``` | ||
- OR build the Docker image if locally you are using amd64 | ||
``` | ||
docker build -t <docker_username>/<docker_imagename>:<tag> . | ||
``` | ||
|
||
## Step 11: Push the load Docker Image to DockerHub | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/load-data` directory | ||
- Push the Docker image if locally you are using arm64 (Apple M1) | ||
``` | ||
docker push <docker_username>/<docker_imagename>:<tag>-amd64 | ||
``` | ||
- OR build the Docker image if locally you are using amd64 | ||
``` | ||
docker push <docker_username>/<docker_imagename>:<tag> | ||
``` | ||
|
||
## Step 12: Explore the preprocess directory | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/preprocess` directory | ||
- Open up the `preprocess.py` file | ||
- Note the code in this file that will perform the actions required in the “preprocess” pipeline step | ||
|
||
<img width="209" alt="image5" src="https://user-images.githubusercontent.com/17012391/177049651-623fb24a-69c0-4a28-bbe1-3a360719b9ce.png"> | ||
|
||
## Step 13: Build the preprocess Docker Image | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/preprocess` directory | ||
- Build the Docker image if locally you are using arm64 (Apple M1) | ||
``` | ||
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 . | ||
``` | ||
- OR build the Docker image if locally you are using amd64 | ||
``` | ||
docker build -t <docker_username>/<docker_imagename>:<tag> . | ||
``` | ||
## Step 14: Push the preprocess Docker Image to DockerHub | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/preprocess` directory | ||
- Push the Docker image if locally you are using arm64 (Apple M1) | ||
``` | ||
docker push <docker_username>/<docker_imagename>:<tag>-amd64 | ||
``` | ||
- OR build the Docker image if locally you are using amd64 | ||
``` | ||
docker push <docker_username>/<docker_imagename>:<tag> | ||
``` | ||
|
||
## Step 15: Explore the train directory | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/train` directory | ||
- Open up the train.py file | ||
- Note the code in this file that will perform the actions required in the “train” pipeline step | ||
|
||
|
||
![image2](https://user-images.githubusercontent.com/17012391/177051233-a32e87db-7771-4b5f-9afe-141063733262.png) | ||
|
||
## Step 16: Build the train Docker Image | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/train` directory | ||
- Build the Docker image if locally you are using arm64 (Apple M1) | ||
``` | ||
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 . | ||
``` | ||
- OR build the Docker image if locally you are using amd64 | ||
``` | ||
docker build -t <docker_username>/<docker_imagename>:<tag> . | ||
``` | ||
|
||
## Step 17: Push the train Docker Image to DockerHub | ||
|
||
- (Locally) Navigate to the bluebook-for-bulldozers-kaggle-competition/pipeline-components/train directory | ||
- Push the Docker image if locally you are using arm64 (Apple M1) | ||
``` | ||
docker push <docker_username>/<docker_imagename>:<tag>-amd64 | ||
``` | ||
- OR build the Docker image if locally you are using amd64 | ||
``` | ||
docker push <docker_username>/<docker_imagename>:<tag> | ||
``` | ||
|
||
## Step 18: Explore the test directory | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/test` directory | ||
- Open up the `test.py` file | ||
- Note the code in this file that will perform the actions required in the “test” pipeline step | ||
|
||
<img width="237" alt="image6" src="https://user-images.githubusercontent.com/17012391/177051308-31b27ad7-71f3-47af-a068-de381fe64b5b.png"> | ||
|
||
## Step 19: Build the test Docker Image | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/test` directory | ||
- Build the Docker image if locally you are using arm64 (Apple M1) | ||
``` | ||
docker build --platform=linux/amd64 -t <docker_username>/<docker_imagename>:<tag>-amd64 . | ||
``` | ||
- OR build the Docker image if locally you are using amd64 | ||
``` | ||
docker build -t <docker_username>/<docker_imagename>:<tag> . | ||
``` | ||
## Step 20: Push the test Docker Image to DockerHub | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition/pipeline-components/test` directory | ||
- Push the Docker image if locally you are using arm64 (Apple M1) | ||
``` | ||
docker push <docker_username>/<docker_imagename>:<tag>-amd64 | ||
``` | ||
- OR build the Docker image if locally you are using amd64 | ||
``` | ||
docker push <docker_username>/<docker_imagename>:<tag> | ||
``` | ||
|
||
## Step 21: Modify the blue-book-for-bulldozers-kfp.py file | ||
|
||
(Kubeflow as a Service) Navigate to the `bluebook-for-bulldozers-kaggle-competition` directory | ||
Update the `bluebook-for-bulldozers-kaggle-competition-kfp.py` with accurate Docker Image inputs | ||
|
||
``` | ||
return dsl.ContainerOp( | ||
name = 'load-data', | ||
image = '<dockerhub username>/<image name>:<tag>', | ||
—----- | ||
def PreProcess(comp1): | ||
return dsl.ContainerOp( | ||
name = 'preprocess', | ||
image = '<dockerhub username>/<image name>:<tag>', | ||
—----- | ||
def Train(comp2): | ||
return dsl.ContainerOp( | ||
name = 'train', | ||
image = '<dockerhub username>/<image name>:<tag>', | ||
—----- | ||
def Test(comp3): | ||
return dsl.ContainerOp( | ||
name = 'test', | ||
image = '<dockerhub username>/<image name>:<tag>', | ||
``` | ||
|
||
## Step 22: Generate a KFP Pipeline yaml File | ||
|
||
- (Locally) Navigate to the `bluebook-for-bulldozers-kaggle-competition` directory and delete the existing `blue-book-for-bulldozers-kaggle-competition-kfp.yaml` file | ||
- (Kubeflow as a Service) Navigate to the `bluebook-for-bulldozers-kaggle-competition` directory | ||
|
||
Build a python virtual environment: | ||
|
||
Step a) Update pip | ||
``` | ||
python3 -m pip install --upgrade pip | ||
``` | ||
|
||
Step b) Install virtualenv | ||
``` | ||
sudo pip3 install virtualenv | ||
``` | ||
|
||
Step c) Check the installed version of venv | ||
``` | ||
virtualenv --version | ||
``` | ||
|
||
Step d) Name your virtual enviornment as kfp | ||
``` | ||
virtualenv kfp | ||
``` | ||
|
||
Step e) Activate your venv. | ||
``` | ||
source kfp/bin/activate | ||
``` | ||
|
||
After this virtual environment will get activated. Now in our activated venv we need to install following packages: | ||
``` | ||
sudo apt-get update | ||
sudo apt-get upgrade | ||
sudo apt-get install -y git python3-pip | ||
python3 -m pip install kfp==1.1.2 | ||
``` | ||
|
||
After installing packages create the yaml file | ||
|
||
Inside venv point your terminal to a path which contains our kfp file to build pipeline (blue-book-for-bulldozers-kaggle-competition-kfp.py) and run these commands to generate a `yaml` file for the Pipeline: | ||
|
||
``` | ||
blue-book-for-bulldozers-kaggle-competition-kfp.py | ||
``` | ||
|
||
<img width="496" alt="Screenshot 2022-07-04 at 12 01 51 AM" src="https://user-images.githubusercontent.com/17012391/177052742-4dae4647-b51b-4f36-94b0-3114130c756b.png"> | ||
|
||
- Download the `bluebook-for-bulldozers-kaggle-competition.yaml` file that was created to your local `bluebook-for-bulldozers-kaggle-competition` directory | ||
|
||
## Step 23: Create an Experiment | ||
|
||
- (Kubeflow as a Service) Within the Kubeflow Central Dashboard, navigate to the Experiments (KFP) > Create Experiment view | ||
- Name the experiment and click Next | ||
- Click on Experiments (KFP) to view the experiment you just created | ||
|
||
## Step 24: Create a Pipeline | ||
|
||
- (Kubeflow as a Service) Within the Kubeflow Central Dashboard, navigate to the Pipelines > +Upload Pipeline view | ||
- Name the pipeline | ||
- Click on Upload a file | ||
- Upload the local bluebook-for-bulldozers-kaggle-competition.py.yaml file | ||
- Click Create | ||
|
||
Step 25: Create a Run | ||
|
||
- (Kubeflow as a Service) Click on Create Run in the view from the previous step | ||
- Choose the experiment we created in Step 23 | ||
- Click Start | ||
- Click on the run name to view the runtime execution graph | ||
|
||
<img width="285" alt="Screenshot 2022-07-04 at 12 04 43 AM" src="https://user-images.githubusercontent.com/17012391/177052835-d2cde097-7354-4cac-8876-7d89244864d3.png"> | ||
|
||
|
||
## Troubleshooting Tips: | ||
While running the pipeline as mentioned above you may come across this error: | ||
![kaggle-secret-error-01](https://user-images.githubusercontent.com/17012391/175290593-aac58d80-0d9f-47bd-bd20-46e6f5207210.PNG) | ||
|
||
errorlog: | ||
|
||
``` | ||
kaggle.rest.ApiException: (403) | ||
Reason: Forbidden | ||
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Thu, 23 Jun 2022 11:31:18 GMT', 'Access-Control-Allow-Credentials': 'true', 'Set-Cookie': 'ka_sessionid=6817a347c75399a531148e19cad0aaeb; max-age=2626560; path=/, GCLB=CIGths3--ebbUg; path=/; HttpOnly', 'Transfer-Encoding': 'chunked', 'Vary': | ||
HTTP response body: b'{"code":403,"message":"You must accept this competition\\u0027s rules before you\\u0027ll be able to download files."}' | ||
``` | ||
This error occours for two reasons: | ||
- Your Kaggle account is not verified with your phone number. | ||
- Rules for this specific competitions are not accepted. | ||
|
||
Lets accept Rules of Bulldozers competition | ||
![kaggle-secret-error-02](https://user-images.githubusercontent.com/17012391/175291406-7a30e06d-fc05-44c3-b33c-bccd31b381bd.PNG) | ||
|
||
Click on "I Understand and Accept". After this you will be prompted to verify your account using your phone number: | ||
![kaggle-secret-error-03](https://user-images.githubusercontent.com/17012391/175291608-daad1a47-119a-4e47-b48b-4f878d65ddd7.PNG) | ||
|
||
Add your phone number and Kaggle will send the code to your number, enter this code and verify your account. ( Note: pipeline wont run if your Kaggle account is not verified ) | ||
|
||
## Success | ||
After the kaggle account is verified pipeline run is successful we will get the following: | ||
|
||
<img width="1380" alt="Screenshot 2022-06-10 at 12 04 48 AM" src="https://user-images.githubusercontent.com/17012391/172920115-ccefd333-c500-4577-afcb-8487c2208371.png"> | ||
|
||
|
Oops, something went wrong.