# Custom Machine Learning (ML) Pipeline using Kubeflow Pipelines (KFP) and Vertex AI Pipelines

## Introduction
The following is a custom Machine Learning (ML) pipeline that trains and deploys a model using Google's Vertex AI platform. The pipeline consists of two major components:

## Training Job Component
This component trains a model using custom-defined parameters and data, running the training job inside a Docker container. It also records job details, such as the job ID, job name, and training status, in a Firestore collection, updating the status to "Failed" if the training job fails and to "Completed" if it succeeds.
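
A minimal sketch of what this component could look like with the KFP v2 SDK and the `google-cloud-aiplatform` client; the collection name `jobs`, the parameter names, and the machine configuration are illustrative assumptions rather than the actual code:

```python
from kfp import dsl

@dsl.component(packages_to_install=["google-cloud-aiplatform", "google-cloud-firestore"])
def training_job(project: str, job_name: str, container_uri: str, serving_container_uri: str) -> str:
    from google.cloud import aiplatform, firestore

    aiplatform.init(project=project)

    # track the job in a Firestore collection (collection/field names are assumptions)
    doc = firestore.Client(project=project).collection("jobs").document(job_name)
    doc.set({"jobId": job_name, "jobName": job_name, "status": "Running"})

    job = aiplatform.CustomContainerTrainingJob(
        display_name=job_name,
        container_uri=container_uri,  # custom training container image
        model_serving_container_image_uri=serving_container_uri,
    )
    try:
        model = job.run(
            replica_count=1,
            machine_type="n1-standard-8",
            accelerator_type="NVIDIA_TESLA_T4",
            accelerator_count=1,
        )
    except Exception:
        doc.update({"status": "Failed"})
        raise
    doc.update({"status": "Completed"})
    return model.resource_name
```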

## Model Deployment Job Component
This component deploys the trained model to an endpoint for serving predictions: it creates an endpoint, uploads the model, and deploys it to the endpoint. It also records deployment details, such as the model ID, endpoint name, and deployment status, in the Firestore collection, updating the status to "Failed" if the deployment fails and to "Completed" if it succeeds.
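
Again as a hedged sketch; the endpoint name, collection name, and machine type are assumptions:

```python
from kfp import dsl

@dsl.component(packages_to_install=["google-cloud-aiplatform", "google-cloud-firestore"])
def deploy_model(project: str, model_resource_name: str, endpoint_name: str):
    from google.cloud import aiplatform, firestore

    aiplatform.init(project=project)
    doc = firestore.Client(project=project).collection("deployments").document(endpoint_name)
    try:
        # create an endpoint, then deploy the trained model to it
        model = aiplatform.Model(model_resource_name)
        endpoint = aiplatform.Endpoint.create(display_name=endpoint_name)
        model.deploy(endpoint=endpoint, machine_type="n1-standard-4")
    except Exception:
        doc.set({"status": "Failed"})
        raise
    doc.set({
        "modelId": model.resource_name,
        "endpointName": endpoint.resource_name,
        "status": "Completed",
    })
```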

## Pipeline Compilation and Execution
The pipeline is compiled into a YAML file, which is used to create a pipeline job in Vertex AI. The pipeline job runs asynchronously.
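
A rough sketch of this step; the template file name, project ID, and region are placeholders:

```python
from kfp import compiler
from google.cloud import aiplatform

# compile the pipeline function into a YAML template
compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.yaml")

# create and submit a pipeline job in Vertex AI from the template
aiplatform.init(project="my-project", location="us-central1")  # placeholders
job = aiplatform.PipelineJob(
    display_name="train-and-deploy",
    template_path="pipeline.yaml",
)
job.submit()  # returns immediately; the pipeline job runs asynchronously
```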

## Benefits
Overall, this pipeline encapsulates the end-to-end process of training and deploying a machine learning model, making it easier to manage and monitor the entire process. It also ensures that all the job details are stored in a Firestore collection, providing an easy way to track and manage the jobs.

## Pipeline Execution
In the main pipeline function, these two components are chained together, meaning the output of the training job (the trained model) is used as the input for the deployment job.
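
Continuing the hypothetical component sketches above, the chaining could look like this, where the training component's output (the trained model's resource name) feeds the deployment component; the pipeline, job, and endpoint names are illustrative:

```python
from kfp import dsl

@dsl.pipeline(name="train-and-deploy")
def pipeline(project: str, container_uri: str, serving_container_uri: str):
    # the training component returns the trained model's resource name...
    train_task = training_job(
        project=project,
        job_name="custom-training-job",  # illustrative name
        container_uri=container_uri,
        serving_container_uri=serving_container_uri,
    )
    # ...which becomes the input of the deployment component
    deploy_model(
        project=project,
        model_resource_name=train_task.output,
        endpoint_name="prediction-endpoint",  # illustrative name
    )
```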

After defining the pipeline, the code compiles it into a YAML file. This file can then be used to create and run the pipeline on Vertex AI Pipelines.

## Summary
In summary, this code automates the process of training a model and deploying it as a service on Vertex AI. It also keeps track of the status of these processes in a Firestore collection, which can be useful for monitoring and debugging.
---

# simplified-infrastructure.md
## Our Initial Approach

### On Backend

The backend remotely starts a virtual machine with a GPU that executes the training code.

```javascript
// API hosted on https://our-backend-endpoint.dev/start-fine-tune
async function finetuneRequestHandler(userRequest) {
    const subjectImage = userRequest.subjectImageLink

    // create a VM with a GPU allocated to run the training code
    const node = await computeEngineAPI.createMachine({
        acceleratorType: "gpu",
        customContainer: "/link/to/custom-training-container-image",
        parameters: {
            subjectImage: subjectImage // e.g. "/gcs/bucket-name/image-folder"
        }
    })

    // when the job is done, free up the GPU node
    await node.waitUntilDone() // pseudocode: wait for the training job to finish
    await node.delete()

    // call the database here to keep track of the ML job status
    await Database.update({ status: "completed" })
}
```

### Client Side
The client side is supposed to call the fine-tune API after it has uploaded the subject images.

It can be a web app or a mobile app:
#### Clients
- **Web App:** JavaScript
- **iOS App:** Swift
- **Android App:** Java

#### iOS App (Swift)
```swift
// Upload images to the bucket
let bucketLink = uploadImagesToBucket(images)
// Submit a backend request to the fine-tuning API
submitFineTuningRequest(
    url: "https://our-backend-endpoint.dev/start-fine-tune",
    imagesLink: bucketLink
)
```
or
#### Web App (JavaScript)
```javascript
import axios from "axios"

// Upload images to the bucket
const bucketLink = await uploadImagesToBucket(images)
// Submit a backend request to the fine-tuning API
await axios.post(
    "https://our-backend-endpoint.dev/start-fine-tune",
    { imagesLink: bucketLink }
)
```

At the end of the day, it does not matter which client-side language we use: all the client side has to do is upload the images somewhere and make an HTTP call to the BFF (Backend For Frontend). The BFF is a cloud function, which may be implemented in JavaScript or Python.

This model allows us to provide a simple interface to the client-side application developer and abstract away the backend functionality. It also provides modularity, as the backend logic is implemented separately from the client-side application.

By separating the frontend (client-side) and backend (BFF) code, we can develop and maintain them independently. This allows for easier maintenance and scalability as changes in one component do not affect the other.

## Our Current Approach With Vertex AI

We develop an ML pipeline using `kubeflow` in Python and deploy it on Vertex AI.

It looks something like this:
```python
from kfp import dsl, compiler

# pseudocode component: launch a custom training job and record its status
@dsl.component
def training_job():
    gcloud.ai.createJob({
        "jobID": 123,
        "gpu": "T4",
        "customContainer": "/link/to/custom-training-container-image",
    })
    updateFireStoreDatabase({"status": "completed"})

@dsl.pipeline(name="dreambooth-training-ml-pipeline")
def pipeline():
    training_job()

# using the kubeflow dsl (domain-specific language), we compile the pipeline
# into a template in YAML format
compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.yaml")

# then we take the yaml template file and deploy it to Vertex AI as a pipeline
gcloud.ai.deployPipeline("pipeline.yaml")
```

### BFF (Backend For Frontend) Code

We receive requests from our client-side application. On fine-tune requests from the client, we trigger our pipeline on Vertex AI using its API.

```javascript
// API hosted on https://our-backend-endpoint.dev/start-fine-tune
async function finetuneRequestHandler(userRequest) {
    const subjectImage = userRequest.subjectImageLink

    // trigger the ML pipeline, which runs on Vertex AI
    const pipelineRun = await vertexAiAPI.runPipeline({
        pipelineID: "dreambooth-training-ml-pipeline",
        parameters: {
            subjectImage: subjectImage // e.g. "/gcs/bucket-name/image-folder"
        }
    })

    // notify the client once the images are generated
    await pipelineRun.waitUntilDone() // pseudocode: wait for the pipeline run to finish
    notifyClient()
}
```

In this case, we are not dealing with database records or things like hyperparameter tuning in the BFF, because the training pipeline on Vertex AI takes care of that.

### Client Side Implementation

The client-side implementation stays the same as in our initial approach. The basic idea is that even if we change or extend our backend logic, the client-side code should not have to be modified: the interface exposed by the BFF APIs to the client side stays the same, and only the underlying logic/code changes.
