AI Example model serving tensorflow #563

Merged · 14 commits · Jun 4, 2025
132 changes: 132 additions & 0 deletions ai/model-serving-tensorflow/README.md
@@ -0,0 +1,132 @@
# TensorFlow Model Serving on Kubernetes

## 1 Purpose / What You'll Learn

This example demonstrates how to deploy a TensorFlow model for inference using [TensorFlow Serving](https://www.tensorflow.org/serving) on Kubernetes. You’ll learn how to:

- Set up TensorFlow Serving with a pre-trained model
- Use a PersistentVolume to mount your model directory
- Expose the inference endpoint using a Kubernetes `Service` and `Ingress`
- Send a sample prediction request to the model

---

## 📚 Table of Contents

- [Prerequisites](#prerequisites)
- [Quick Start / TL;DR](#quick-start--tldr)
- [Detailed Steps & Explanation](#detailed-steps--explanation)
- [Verification / Seeing it Work](#verification--seeing-it-work)
- [Configuration Customization](#configuration-customization)
- [Cleanup](#cleanup)
- [Further Reading / Next Steps](#further-reading--next-steps)

---

## ⚙️ Prerequisites

- Kubernetes cluster (tested with v1.29+)
- `kubectl` configured
- Optional: `ingress-nginx` for external access
- x86-based machine (for running TensorFlow Serving image)
- Local hostPath support (for demo) or a cloud-based PVC

---

## ⚡ Quick Start / TL;DR

```bash
# Apply manifests
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pv.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pvc.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/deployment.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/service.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/ingress.yaml # Optional
```

---

## 2 Detailed Steps & Explanation

### 1. PersistentVolume & PVC Setup

> ⚠️ Note: For local testing, `hostPath` is used to mount `/mnt/models/my_model`. In production, replace this with a cloud-native storage backend (e.g., AWS EBS, GCP PD, or NFS).


Model folder structure:
```
/mnt/models/my_model/
└── 1/
    ├── saved_model.pb
    └── variables/
```
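
If you need to stage a model there first, here is a minimal sketch for a local, single-node cluster (assuming `./my_exported_model` is a placeholder for a SavedModel export you already have, containing `saved_model.pb` and `variables/`):

```bash
# Create the versioned directory TensorFlow Serving expects ("1" is the model version).
sudo mkdir -p /mnt/models/my_model/1

# Copy your SavedModel export into it (placeholder path).
sudo cp -r ./my_exported_model/* /mnt/models/my_model/1/
```

TensorFlow Serving watches the numbered subdirectories under the model base path and serves the highest version it finds.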

---

### 2. Expose the Service

- A `ClusterIP` service exposes gRPC (8500) and REST (8501).
- An optional `Ingress` exposes `/tf/v1/models/my_model:predict` to external clients.

The bundled `ingress.yaml` matches on path only; to route by domain, add a `host` value under `rules` in `ingress.yaml` set to your domain.
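
A quick sanity check that both objects exist after applying the manifests:

```bash
kubectl get service tf-serving
kubectl get ingress tf-serving-ingress
```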

---

## 3 Verification / Seeing it Work

If using ingress:

```bash
curl -X POST http://<ingress-host>/tf/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{ "instances": [[1.0, 2.0, 5.0]] }'
```
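
If you are not using ingress, you can send the same request through a port-forward to the service's REST port. Note the path has no `/tf` prefix here, since that prefix only exists to be stripped by the ingress rewrite:

```bash
# Forward local port 8501 to the tf-serving service, then query it directly.
kubectl port-forward service/tf-serving 8501:8501 &

curl -X POST http://localhost:8501/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{ "instances": [[1.0, 2.0, 5.0]] }'
```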

Expected output:

```json
{
  "predictions": [...]
}
```

To verify the pod is running:

```bash
kubectl get pods
kubectl wait --for=condition=Available deployment/tf-serving --timeout=300s
kubectl logs deployment/tf-serving
```
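
TensorFlow Serving also exposes a model status endpoint on the REST port, which is handy for confirming the model loaded successfully (shown here via the same port-forward used above; through the ingress, prepend `/tf` to the path):

```bash
curl http://localhost:8501/v1/models/my_model
```

A healthy model reports `"state": "AVAILABLE"` for the loaded version.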

---

## 🛠️ Configuration Customization

- Update `model_name` and `model_base_path` in the deployment args
- Replace `hostPath` with a `PersistentVolumeClaim` bound to cloud storage
- Modify resource requests/limits for the TensorFlow Serving container (see the sketch below)
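
One quick way to adjust requests/limits without editing the manifest is `kubectl set resources`; the values below are illustrative placeholders, not tuned recommendations:

```bash
# Set CPU/memory requests and limits on the serving container.
kubectl set resources deployment/tf-serving \
  --requests=cpu=500m,memory=1Gi \
  --limits=cpu=2,memory=4Gi
```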

---

## 🧹 Cleanup

```bash
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/ingress.yaml # Optional
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/service.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/deployment.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pvc.yaml
kubectl delete -f https://raw.githubusercontent.com/kubernetes/examples/refs/heads/master/ai/model-serving-tensorflow/pv.yaml
```
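
To confirm everything is gone (each command should return nothing or report `NotFound`):

```bash
kubectl get all -l app=tf-serving
kubectl get pvc my-model-pvc
kubectl get pv my-model-pv
```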

---

## 4 Further Reading / Next Steps

- [TensorFlow Serving](https://www.tensorflow.org/tfx/serving)
- [TF Serving REST API Reference](https://www.tensorflow.org/tfx/serving/api_rest)
- [Kubernetes Ingress Controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/)
- [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)


34 changes: 34 additions & 0 deletions ai/model-serving-tensorflow/deployment.yaml
@@ -0,0 +1,34 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving
  labels:
    app: tf-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-serving
  template:
    metadata:
      labels:
        app: tf-serving
    spec:
      containers:
      - name: tensorflow-serving
        image: tensorflow/serving:2.19.0
        args:
        - "--model_name=my_model"
        - "--port=8500"
        - "--rest_api_port=8501"
        - "--model_base_path=/models/my_model"
        ports:
        - containerPort: 8500 # gRPC
        - containerPort: 8501 # REST
        volumeMounts:
        - name: model-volume
          mountPath: /models/my_model
      volumes:
      - name: model-volume
        persistentVolumeClaim:
          claimName: my-model-pvc
17 changes: 17 additions & 0 deletions ai/model-serving-tensorflow/ingress.yaml
@@ -0,0 +1,17 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tf-serving-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx # assumes your ingress-nginx IngressClass is named "nginx"
  rules:
  - http:
      paths:
      - path: /tf(/|$)(.*)
        pathType: ImplementationSpecific # regex paths require this with ingress-nginx
        backend:
          service:
            name: tf-serving
            port:
              number: 8501
12 changes: 12 additions & 0 deletions ai/model-serving-tensorflow/pv.yaml
@@ -0,0 +1,12 @@
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-model-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/models/my_model
11 changes: 11 additions & 0 deletions ai/model-serving-tensorflow/pvc.yaml
@@ -0,0 +1,11 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-model-pvc
spec:
  storageClassName: "" # disable dynamic provisioning so the claim binds to the pre-created my-model-pv
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
  volumeName: my-model-pv
15 changes: 15 additions & 0 deletions ai/model-serving-tensorflow/service.yaml
@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  name: tf-serving
spec:
  selector:
    app: tf-serving
  ports:
  - name: grpc
    port: 8500
    targetPort: 8500
  - name: rest
    port: 8501
    targetPort: 8501
  type: ClusterIP