Feat: adds huggingface bloom example to torchserve (kubeflow#2466)
* Feat: add huggingface bloom example to torchserve

Signed-off-by: Jagadeesh J <jagadeeshj@ideas2it.com>

* fix: add pv and helper pod yaml

 - update readme doc
 - add storageclass yaml
 - fix config.properties

Signed-off-by: Jagadeesh J <jagadeeshj@ideas2it.com>

* fix: add steps for model sharding
Jagadeesh J authored Mar 4, 2023
1 parent aed4d5b commit 38465ea
Showing 6 changed files with 255 additions and 0 deletions.
187 changes: 187 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/bloom/README.md
# TorchServe example with Huggingface BLOOM model
In this example we will show how to serve [Large Huggingface models with TorchServe](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Largemodels)
on KServe.

## Model archive file creation

Clone [pytorch/serve](https://github.com/pytorch/serve) repository,
navigate to `examples/Huggingface_Largemodels` and follow the steps for creating the MAR file including serialized model and other dependent files.

The TorchServe example above works with sharded versions of Hugging Face models.

To shard a Hugging Face model, you can use the following script, and then [compress the model](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Largemodels#step-2-compress-downloaded-model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name="bigscience/bloomz-7b1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.save_pretrained("model/"+model_name, max_shard_size="5GB")
tokenizer.save_pretrained("model/"+model_name)
```
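`save_pretrained` writes the shards alongside an index file, `pytorch_model.bin.index.json`, that maps each weight to the shard holding it. As a minimal sketch, the checkpoint can be verified as complete before archiving (the helper names here are illustrative, not part of any library):

```python
import json
import os

def shard_files(index_path):
    """List the shard filenames referenced by a Hugging Face
    sharded-checkpoint index file."""
    with open(index_path) as f:
        index = json.load(f)
    # weight_map maps each parameter name to the shard file that holds it
    return sorted(set(index["weight_map"].values()))

def check_shards(model_dir):
    """Return the shards listed in the index that are missing on disk."""
    index_path = os.path.join(model_dir, "pytorch_model.bin.index.json")
    return [f for f in shard_files(index_path)
            if not os.path.exists(os.path.join(model_dir, f))]
```

An empty list from `check_shards` means every shard the index references is present and the directory is safe to compress.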

## Create NVMe Persistent Volume

Use SSH to connect to each worker node and prepare its NVMe drives for Kubernetes as follows.

Run the `lsblk` command on each worker node to list the available disks:

```bash
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0 0 116.4G 0 disk
nvme0n1 259:1 0 80G 0 disk
├─nvme0n1p1 259:2 0 80G 0 part /
└─nvme0n1p128 259:3 0 1M 0 part
```
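When preparing many nodes, picking out the spare disk can be scripted. A minimal sketch that parses `lsblk`'s default tabular output and returns whole disks with no partitions and no mountpoint (the function name is illustrative):

```python
import re

def unformatted_disks(lsblk_output):
    """Return whole disks that have no partitions and no mountpoint,
    parsed from the default tabular output of `lsblk`."""
    lines = lsblk_output.strip().splitlines()[1:]  # skip the header row
    disks, partitioned = [], set()
    for line in lines:
        fields = line.split()
        # strip the tree-drawing prefix lsblk adds to child devices
        name, dev_type = fields[0].lstrip("├└─"), fields[5]
        if dev_type == "disk" and len(fields) == 6:  # no MOUNTPOINT column
            disks.append(name)
        elif dev_type == "part":
            # strip the NVMe partition suffix (e.g. p1) to get the parent disk
            partitioned.add(re.sub(r"p\d+$", "", name))
    return [d for d in disks if d not in partitioned]
```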

In this example, `nvme1n1` is the spare disk. Create an XFS filesystem on it:

```bash
$ sudo mkfs.xfs /dev/nvme1n1
```

Create a mount point and mount the disk:

```bash
$ sudo mkdir -p /mnt/data/vol1
$ sudo chmod -R 777 /mnt
$ sudo mount /dev/nvme1n1 /mnt/data/vol1
```

To mount the device permanently, first find its UUID:

```bash
$ sudo blkid /dev/nvme1n1
```

Then add the following line to `/etc/fstab`, substituting the UUID reported by `blkid`:

```bash
UUID=nvme_UUID /mnt/data/vol1 xfs defaults,nofail 0 2
```
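The fstab entry can be assembled directly from the `blkid` output. A minimal sketch (the function name and sample line are illustrative):

```python
import re

def fstab_entry(blkid_line, mountpoint):
    """Build an /etc/fstab entry from one line of `blkid` output,
    e.g. '/dev/nvme1n1: UUID="1234-abcd" TYPE="xfs"'."""
    uuid = re.search(r'UUID="([^"]+)"', blkid_line).group(1)
    fstype = re.search(r'TYPE="([^"]+)"', blkid_line).group(1)
    # nofail lets the node finish booting even if the disk is absent
    return f"UUID={uuid} {mountpoint} {fstype} defaults,nofail 0 2"
```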

Clone the local provisioner repository:

```bash
$ git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git
```

Create a StorageClass manifest, `storageclass.yaml`:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-disks
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```
```bash
$ kubectl apply -f storageclass.yaml
```

Create local persistent volumes for Kubernetes:

- Change `hostDir` in the provisioner's Helm values to the mount path used above

```bash
cd sig-storage-local-static-provisioner
helm template ./helm/provisioner > ./deployment/kubernetes/provisioner_generated.yaml

kubectl apply -f ./deployment/kubernetes/provisioner_generated.yaml
```

Verify that the provisioner pod is running with `kubectl get pods`:

```bash
NAME READY STATUS RESTARTS AGE
kserve-controller-manager-5c5c4d8c89-lrzbd 2/2 Running 0 4d2h
local-nvme-pv-provisioner-vwxgt 1/1 Running 0 16m
```

```bash
$ kubectl get pv

NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv-2a85b8ac 116Gi RWO Delete Bound kserve-test/model-cache fast-disks 4d3h
```

## Create PVC and Mount the Model

```bash
kubectl apply -f pvc-pod.yaml
```

Refer: [Bloom Model Example](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Largemodels)

Copy `config.properties` and the MAR file to the PVC using the following directory structure:

```bash
|_model-store
|_bloom-560m.mar
|_config
|_config.properties
```

```bash
kubectl exec -it pv-pod -- mkdir /pv/config
kubectl exec -it pv-pod -- mkdir /pv/model-store

kubectl cp config.properties pv-pod:/pv/config
kubectl cp bloom-560m.mar pv-pod:/pv/model-store
```
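It can help to sanity-check that the staged files match the layout above before copying them into the PVC. A minimal sketch that runs against a local staging directory (the function name is illustrative):

```python
import os

def check_layout(root):
    """Return a list of problems with a staged model directory;
    an empty list means it matches the expected layout."""
    problems = []
    store = os.path.join(root, "model-store")
    config = os.path.join(root, "config", "config.properties")
    if not os.path.isdir(store):
        problems.append("missing model-store/ directory")
    elif not any(f.endswith(".mar") for f in os.listdir(store)):
        problems.append("no .mar file in model-store/")
    if not os.path.isfile(config):
        problems.append("missing config/config.properties")
    return problems
```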

## Create the InferenceService

Apply the `InferenceService` manifest:

```bash
kubectl apply -f bloom-560m.yaml
```

Expected Output

```bash
inferenceservice.serving.kserve.io/torchserve-bloom-560m created
```

## Run a prediction

The first step is to [determine the ingress IP and ports](https://kserve.github.io/website/0.10/get_started/first_isvc/#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

```bash
MODEL_NAME=BLOOMSeqClassification
ISVC_NAME=torchserve-bloom-560m
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${ISVC_NAME} -n <namespace> -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./sample_text.txt
```

Expected Output

```bash
* Trying 44.239.20.204...
* Connected to a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com (44.239.20.204) port 80 (#0)
> PUT /v1/models/BLOOMSeqClassification:predict HTTP/1.1
> Host: torchserve-bloom-560m.kserve-test.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 79
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< cache-control: no-cache; no-store, must-revalidate, private
< content-length: 8
< date: Wed, 04 Nov 2020 10:54:49 GMT
< expires: Thu, 01 Jan 1970 00:00:00 UTC
< pragma: no-cache
< x-request-id: 4b54d3ac-185f-444c-b344-b8a785fdeb50
< x-envoy-upstream-service-time: 2085
< server: istio-envoy
<
* Connection #0 to host torchserve-bloom-560m.kserve-test.example.com left intact
Accepted
```
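The same prediction can also be issued from Python. A minimal sketch that assembles the request the curl command sends, suitable for passing to `requests.post(url, headers=headers, data=body)` (the helper name is illustrative):

```python
import json

def build_request(model_name, service_hostname, ingress_host, ingress_port, text):
    """Assemble the KServe v1 :predict call made by the curl command above."""
    url = f"http://{ingress_host}:{ingress_port}/v1/models/{model_name}:predict"
    # the Host header routes the request to the right InferenceService
    headers = {"Host": service_hostname}
    body = json.dumps({"instances": [{"data": text}]})
    return url, headers, body
```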
**Note:** For larger models, use `A100 80GB` GPU instances.
14 changes: 14 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/bloom/bloom-560m.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "torchserve-bloom-560m"
spec:
  predictor:
    pytorch:
      storageUri: pvc://model-cache
      resources:
        limits:
          cpu: 4
          memory: 16Gi
          nvidia.com/gpu: 1
    nodeName: ip-xxx-xxx-xxx-xxx.us-west-2.compute.internal
13 changes: 13 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/bloom/config.properties
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"bloom":{"1.0":{"defaultVersion":true,"marName":"bloom-560m.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
28 changes: 28 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/bloom/pvc-pod.yml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: model-local-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 700Gi
  storageClassName: fast-disks
---
apiVersion: v1
kind: Pod
metadata:
  name: pv-pod
spec:
  volumes:
    - name: pv-storage
      persistentVolumeClaim:
        claimName: model-local-claim
  containers:
    - name: pv-container
      image: alpine
      volumeMounts:
        - mountPath: "/pv"
          name: pv-storage
7 changes: 7 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/bloom/sample_text.txt
{
  "instances": [
    {
      "data": "My dog is cute"
    }
  ]
}
6 changes: 6 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/bloom/storageclass.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-disks
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
