AI Example model serving tensorflow #563
Merged
14 commits
c3739c6 Create AI Example model serving tensorflow (jayeshmahajan)
3a7b971 ai/model-serving-tensorflow service.yaml (jayeshmahajan)
0a3ef11 ai/model-serving-tensorflow ingress.yaml (jayeshmahajan)
38f5cbf ai/model-serving-tensorflow pv.yaml (jayeshmahajan)
5ef3406 ai/model-serving-tensorflow pvc.yaml (jayeshmahajan)
7dcfcdc Create Readme.md (jayeshmahajan)
ea6121e Rename Readme.md to README.md (jayeshmahajan)
01cf1c4 Update with structure format for README.md (jayeshmahajan)
ef286fb Correct link for serving in ai/model-serving-tensorflow/README.md (jayeshmahajan)
774d5a8 Fix kubectl README.md (jayeshmahajan)
e6f8abd Update README.md (jayeshmahajan)
61dd508 Update as per comments README.md (jayeshmahajan)
267d2cd Update tensorflow/serving:2.19.0 deployment.yaml (jayeshmahajan)
e4acc78 remove hostname ai/model-serving-tensorflow/ingress.yaml (jayeshmahajan)
ai/model-serving-tensorflow/README.md
@@ -0,0 +1,132 @@
# TensorFlow Model Serving on Kubernetes

## 1 Purpose / What You'll Learn

This example demonstrates how to deploy a TensorFlow model for inference using [TensorFlow Serving](https://www.tensorflow.org/serving) on Kubernetes. You’ll learn how to:

- Set up TensorFlow Serving with a pre-trained model
- Use a PersistentVolume to mount your model directory
- Expose the inference endpoint using a Kubernetes `Service` and `Ingress`
- Send a sample prediction request to the model

---

## 📚 Table of Contents

- [Prerequisites](#prerequisites)
- [Quick Start / TL;DR](#quick-start--tldr)
- [Detailed Steps & Explanation](#detailed-steps--explanation)
- [Verification / Seeing it Work](#verification--seeing-it-work)
- [Configuration Customization](#configuration-customization)
- [Cleanup](#cleanup)
- [Further Reading / Next Steps](#further-reading--next-steps)

---

## ⚙️ Prerequisites

- Kubernetes cluster (tested with v1.29+)
- `kubectl` configured
- Optional: `ingress-nginx` for external access
- x86-based machine (for running TensorFlow Serving image)
- Local hostPath support (for demo) or a cloud-based PVC

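The prerequisites above can be checked from a shell before applying anything. This is a minimal sketch, not part of the example's manifests: the helper names (`have_kubectl`, `node_architectures`, `has_amd64_node`) and the `KUBECTL` override variable are assumptions made for illustration. It only defines functions; the last comment shows one way to call them against a reachable cluster.

```bash
# Hypothetical preflight helpers; names are illustrative only.
KUBECTL="${KUBECTL:-kubectl}"   # assumption: override to use a different client binary

# Succeeds if the kubectl client binary is on PATH.
have_kubectl() {
  command -v "$KUBECTL" >/dev/null 2>&1
}

# Prints the CPU architecture reported by every node, space-separated.
node_architectures() {
  "$KUBECTL" get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
}

# Succeeds if the space-separated architecture list contains amd64.
has_amd64_node() {
  case " $1 " in
    *" amd64 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires a reachable cluster):
#   have_kubectl && has_amd64_node "$(node_architectures)" && echo "prerequisites look fine"
```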
---

## ⚡ Quick Start / TL;DR

```bash
# Apply manifests
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pv.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pvc.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/deployment.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/service.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/ingress.yaml # Optional
```

---

## 2 Detailed Steps & Explanation

### 1. PersistentVolume & PVC Setup

> ⚠️ Note: For local testing, `hostPath` is used to mount `/mnt/models/my_model` from the node's filesystem. In production, replace this with a cloud-native storage backend (e.g., AWS EBS, GCP PD, or NFS).

Model folder structure (TensorFlow Serving expects numeric version subdirectories such as `1/` under the model base path):
```
/mnt/models/my_model/
└── 1/
    ├── saved_model.pb
    └── variables/
```

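Because the demo PersistentVolume is a `hostPath`, the layout above has to exist on the node that runs the pod. The following is a minimal sketch of staging an exported SavedModel into a versioned directory and checking the result. The function names `stage_model` and `has_servable_version` are hypothetical helpers, not part of this example, and the source path in the usage comment is a placeholder.

```bash
# Hypothetical helper: copy an exported SavedModel directory into <base>/<version>/.
stage_model() {
  local src="$1" base="$2" version="${3:-1}"
  if [ ! -f "$src/saved_model.pb" ]; then
    echo "error: $src/saved_model.pb not found" >&2
    return 1
  fi
  mkdir -p "$base/$version"
  cp -R "$src"/. "$base/$version"/
}

# Succeeds if <base> holds at least one numeric version directory containing saved_model.pb.
has_servable_version() {
  local base="$1" dir
  for dir in "$base"/*/; do
    case "$(basename "$dir")" in
      ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
    esac
    [ -f "$dir/saved_model.pb" ] && return 0
  done
  return 1
}

# Usage on the node (the source path is a placeholder):
#   stage_model /path/to/exported_model /mnt/models/my_model 1
#   has_servable_version /mnt/models/my_model && echo "model layout looks servable"
```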
---

### 2. Expose the Service

- A `ClusterIP` service exposes gRPC (8500) and REST (8501).
- An optional `Ingress` routes requests under `/tf` to the Service's REST port (8501) and strips the `/tf` prefix, so external clients call `/tf/v1/models/my_model:predict`.

The provided `ingress.yaml` sets no `host`, so the rule matches requests for any hostname. To restrict it to your domain, add a `host` field to the rule.

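To see how the Ingress path `/tf(/|$)(.*)` and the `rewrite-target: /$2` annotation fit together, the mapping can be imitated locally with `sed`. This is an illustration only: `rewrite_path` is a hypothetical helper, and the real rewrite is performed by the ingress controller, whose matching details (for example case sensitivity) may differ.

```bash
# Illustrative only: mimics the Ingress regex and rewrite-target with sed.
# The second capture group (everything after /tf/) becomes the backend path.
rewrite_path() {
  printf '%s\n' "$1" | sed -E 's#^/tf(/|$)(.*)#/\2#'
}

rewrite_path "/tf/v1/models/my_model:predict"   # prints /v1/models/my_model:predict
rewrite_path "/tf"                              # prints /
```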
---

## 3 Verification / Seeing it Work

If using ingress:

```bash
curl -X POST http://<ingress-host>/tf/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{ "instances": [[1.0, 2.0, 5.0]] }'
```

Expected output:

```json
{
  "predictions": [...]
}
```

To verify the pod is running:

```bash
kubectl get pods
kubectl wait --for=condition=Available deployment/tf-serving --timeout=300s
kubectl logs deployment/tf-serving
```

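Without an Ingress, the `ClusterIP` Service is only reachable inside the cluster. One option is `kubectl port-forward`, sketched below. The helper names `tfserving_url` and `check_via_port_forward` and the `KUBECTL` variable are assumptions for illustration, the two-second sleep is a crude wait, and the request body is the same sample payload used above, so its shape must match your model. The block only defines functions; call `check_via_port_forward` against a running deployment.

```bash
KUBECTL="${KUBECTL:-kubectl}"   # assumption: override to use a different client binary

# Builds a TensorFlow Serving REST URL; with no verb it yields the model status URL.
tfserving_url() {
  local base="$1" model="$2" verb="${3:-}"
  if [ -n "$verb" ]; then
    printf '%s/v1/models/%s:%s\n' "$base" "$model" "$verb"
  else
    printf '%s/v1/models/%s\n' "$base" "$model"
  fi
}

# Forwards local port 8501 to the Service, queries status and predict, then stops the forward.
check_via_port_forward() {
  "$KUBECTL" port-forward service/tf-serving 8501:8501 >/dev/null 2>&1 &
  local pf_pid=$!
  sleep 2   # crude wait for the tunnel to come up; adjust as needed
  curl -s "$(tfserving_url http://localhost:8501 my_model)"
  curl -s -X POST "$(tfserving_url http://localhost:8501 my_model predict)" \
    -H "Content-Type: application/json" \
    -d '{ "instances": [[1.0, 2.0, 5.0]] }'
  kill "$pf_pid"
}
```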
---

## 🛠️ Configuration Customization

- Update `--model_name` and `--model_base_path` in the deployment's container `args`
- Replace the `hostPath` PersistentVolume with one backed by cloud storage (the deployment already mounts the model through the `my-model-pvc` claim)
- Modify resource requests/limits for the TensorFlow Serving container (the sample deployment sets none)

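As one way to apply the last item without editing the manifest, `kubectl set resources` can patch the running deployment. This is a sketch under assumptions: `set_tf_serving_resources` is a hypothetical wrapper, and the default CPU and memory values are placeholders, not recommendations from this example. Size them for your model.

```bash
KUBECTL="${KUBECTL:-kubectl}"   # assumption: override to use a different client binary

# Sets requests and limits on the tensorflow-serving container.
# The default values are placeholders chosen only for illustration.
set_tf_serving_resources() {
  local requests="${1:-cpu=500m,memory=1Gi}" limits="${2:-cpu=2,memory=4Gi}"
  "$KUBECTL" set resources deployment/tf-serving \
    --containers=tensorflow-serving \
    --requests="$requests" \
    --limits="$limits"
}

# Usage:
#   set_tf_serving_resources                         # placeholder defaults
#   set_tf_serving_resources cpu=1,memory=2Gi cpu=4,memory=8Gi
```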
---

## 🧹 Cleanup

```bash
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/ingress.yaml # Optional
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/service.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/deployment.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pvc.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pv.yaml
```

---

## 4 Further Reading / Next Steps

- [TensorFlow Serving](https://www.tensorflow.org/tfx/serving)
- [TF Serving REST API Reference](https://www.tensorflow.org/tfx/serving/api_rest)
- [Kubernetes Ingress Controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/)
- [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
ai/model-serving-tensorflow/deployment.yaml
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving
  labels:
    app: tf-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-serving
  template:
    metadata:
      labels:
        app: tf-serving
    spec:
      containers:
        - name: tensorflow-serving
          image: tensorflow/serving:2.19.0
          args:
            - "--model_name=my_model"
            - "--port=8500"
            - "--rest_api_port=8501"
            - "--model_base_path=/models/my_model"
          ports:
            - containerPort: 8500 # gRPC
            - containerPort: 8501 # REST
          volumeMounts:
            - name: model-volume
              mountPath: /models/my_model
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: my-model-pvc
ai/model-serving-tensorflow/ingress.yaml
@@ -0,0 +1,17 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tf-serving-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
    - http:
        paths:
          - path: /tf(/|$)(.*)
            pathType: ImplementationSpecific # was Prefix; regex paths with ingress-nginx generally need ImplementationSpecific
            backend:
              service:
                name: tf-serving
                port:
                  number: 8501
ai/model-serving-tensorflow/pv.yaml
@@ -0,0 +1,12 @@
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-model-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/models/my_model
ai/model-serving-tensorflow/pvc.yaml
@@ -0,0 +1,11 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-model-pvc
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
  volumeName: my-model-pv
ai/model-serving-tensorflow/service.yaml
@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  name: tf-serving
spec:
  selector:
    app: tf-serving
  ports:
    - name: grpc
      port: 8500
      targetPort: 8500
    - name: rest
      port: 8501
      targetPort: 8501
  type: ClusterIP