forked from kubeflow/pipelines
Commit
Feat: adds huggingface bloom example to torchserve (kubeflow#2466)
* Feat: add huggingface bloom example to torchserve
  Signed-off-by: Jagadeesh J <jagadeeshj@ideas2it.com>
* fix: add pv and helper pod yaml
  - update readme doc
  - add storageclass yaml
  - fix config.properties
  Signed-off-by: Jagadeesh J <jagadeeshj@ideas2it.com>
* fix: add steps for model sharding
Jagadeesh J authored Mar 4, 2023
1 parent aed4d5b, commit 38465ea
Showing 6 changed files with 255 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,187 @@ | ||
# TorchServe example with Huggingface BLOOM model | ||
In this example we will show how to serve [Large Huggingface models with TorchServe](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Largemodels) | ||
on KServe. | ||
|
||
## Model archive file creation | ||
|
||
Clone the [pytorch/serve](https://github.com/pytorch/serve) repository, | |
navigate to `examples/Huggingface_Largemodels`, and follow the steps for creating the MAR file, including the serialized model and other dependent files. | |
|
||
The above TorchServe example works on sharded versions of Huggingface models. | |
|
||
To shard a Huggingface model, you can use the following script, and then [compress the model](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Largemodels#step-2-compress-downloaded-model). | |
|
||
```python | ||
from transformers import AutoModelForCausalLM, AutoTokenizer | |
# Download the model and tokenizer from the Hugging Face Hub | |
model_name = "bigscience/bloomz-7b1" | |
model = AutoModelForCausalLM.from_pretrained(model_name) | |
tokenizer = AutoTokenizer.from_pretrained(model_name) | |
# Save the weights as shards of at most 5GB each, alongside the tokenizer files | |
model.save_pretrained("model/" + model_name, max_shard_size="5GB") | |
tokenizer.save_pretrained("model/" + model_name) | |
``` | ||
|
||
## Create NVMe Persistent Volume | ||
|
||
Use SSH to connect to the worker nodes and prepare the NVMe drives for Kubernetes, as follows. | ||
|
||
Run the `lsblk` command on each worker node to list the available disks. | |
|
||
```bash | ||
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT | ||
nvme1n1 259:0 0 116.4G 0 disk | ||
nvme0n1 259:1 0 80G 0 disk | ||
├─nvme0n1p1 259:2 0 80G 0 part / | ||
└─nvme0n1p128 259:3 0 1M 0 part | ||
``` | ||
|
||
The output above shows the result of running the `lsblk` command. Here `nvme1n1` is the unused NVMe disk; format it with XFS and mount it: | |
|
||
|
||
```bash | ||
$ sudo mkfs.xfs /dev/nvme1n1 | ||
``` | ||
|
||
```bash | ||
$ sudo mkdir -p /mnt/data/vol1 | ||
$ sudo chmod -R 777 /mnt | ||
$ sudo mount /dev/nvme1n1 /mnt/data/vol1 | ||
``` | ||
|
||
To mount the device permanently, look up its UUID: | |
|
||
```bash | ||
$ sudo blkid /dev/nvme1n1 | ||
``` | ||
|
||
To mount it on every boot, add the following line to the `/etc/fstab` file, replacing `nvme_UUID` with the UUID reported by `blkid`: | |
|
||
```bash | ||
UUID=nvme_UUID /mnt/data/vol1 xfs defaults,nofail 0 2 | ||
``` | ||
|
||
Clone the local provisioner repository: | ||
|
||
```bash | ||
$ git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git | ||
``` | ||
|
||
Create a StorageClass YAML file named `storageclass.yaml`: | |
|
||
```yaml | ||
kind: StorageClass | ||
apiVersion: storage.k8s.io/v1 | ||
metadata: | ||
  name: fast-disks | |
provisioner: kubernetes.io/no-provisioner | ||
volumeBindingMode: WaitForFirstConsumer | ||
``` | ||
```bash | ||
$ kubectl apply -f storageclass.yaml | ||
``` | ||
|
||
Create Local Persistent Volumes for Kubernetes | ||
|
||
- Change `hostDir` to the mount path | ||
|
||
```bash | ||
cd sig-storage-local-static-provisioner | ||
helm template ./helm/provisioner > ./deployment/kubernetes/provisioner_generated.yaml | |
|
||
kubectl apply -f ./deployment/kubernetes/provisioner_generated.yaml | ||
``` | ||
|
||
Run `kubectl get pods` to confirm that the provisioner pod is running. The output should look similar to this: | |
|
||
```bash | ||
NAME READY STATUS RESTARTS AGE | ||
kserve-controller-manager-5c5c4d8c89-lrzbd 2/2 Running 0 4d2h | ||
local-nvme-pv-provisioner-vwxgt 1/1 Running 0 16m | ||
``` | ||
|
||
```bash | ||
$ kubectl get pv | ||
|
||
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE | ||
local-pv-2a85b8ac 116Gi RWO Delete Bound kserve-test/model-cache fast-disks 4d3h | ||
``` | ||
## Create PVC and Mount the Model | |
|
||
```bash | ||
kubectl apply -f pvc-pod.yaml | ||
``` | ||
|
||
Refer: [Bloom Model Example](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Largemodels) | ||
|
||
Copy `config.properties` and the MAR file to the PVC using the structure below. | |
|
||
```bash | ||
|_model-store | |
  |_bloom-560m.mar | |
|_config | |
  |_config.properties | |
``` | ||
|
||
```bash | ||
kubectl exec -it pv-pod -- mkdir /pv/config | ||
kubectl exec -it pv-pod -- mkdir /pv/model-store | ||
|
||
kubectl cp config.properties pv-pod:/pv/config | ||
kubectl cp bloom-560m.mar pv-pod:/pv/model-store | |
``` | ||
|
||
## Create the InferenceService | ||
|
||
Apply the InferenceService custom resource: | |
|
||
```bash | ||
kubectl apply -f bloom-560m.yaml | ||
``` | ||
|
||
Expected Output | ||
|
||
```bash | ||
inferenceservice.serving.kserve.io/torchserve-bloom-560m created | |
``` | ||
|
||
## Run a prediction | ||
|
||
The first step is to [determine the ingress IP and ports](https://kserve.github.io/website/0.10/get_started/first_isvc/#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`. | |
|
||
```bash | ||
MODEL_NAME=BLOOMSeqClassification | ||
ISVC_NAME=torchserve-bloom-560m | ||
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${ISVC_NAME} -n <namespace> -o jsonpath='{.status.url}' | cut -d "/" -f 3) | ||
|
||
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./sample_text.txt | |
``` | ||
|
||
Expected Output | ||
|
||
```bash | ||
* Trying 44.239.20.204... | ||
* Connected to a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com (44.239.20.204) port 80 (#0) | ||
> PUT /v1/models/BLOOMSeqClassification:predict HTTP/1.1 | ||
> Host: torchserve-bloom-560m.kserve-test.example.com | ||
> User-Agent: curl/7.47.0 | ||
> Accept: */* | ||
> Content-Length: 79 | ||
> Expect: 100-continue | ||
> | ||
< HTTP/1.1 100 Continue | ||
* We are completely uploaded and fine | ||
< HTTP/1.1 200 OK | ||
< cache-control: no-cache; no-store, must-revalidate, private | ||
< content-length: 8 | ||
< date: Wed, 04 Nov 2020 10:54:49 GMT | ||
< expires: Thu, 01 Jan 1970 00:00:00 UTC | ||
< pragma: no-cache | ||
< x-request-id: 4b54d3ac-185f-444c-b344-b8a785fdeb50 | ||
< x-envoy-upstream-service-time: 2085 | ||
< server: istio-envoy | ||
< | ||
* Connection #0 to host torchserve-bloom-560m.kserve-test.example.com left intact | ||
Accepted | ||
``` | ||
**__Note__** For larger models, use `A100 80GB` GPU instances. |
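
The `curl` call in the README's "Run a prediction" step can also be sketched in Python. This is a minimal illustration using only the standard library, not part of the original example: the helper name `build_predict_request` is hypothetical, the ingress address is a placeholder, and it assumes the KServe v1 predict path and `Host` header shown in the README. It builds the request without sending it.

```python
import json
import urllib.request


def build_predict_request(ingress_host, ingress_port, service_hostname, model_name, payload):
    """Build (without sending) a POST request for the KServe v1 predict endpoint."""
    url = f"http://{ingress_host}:{ingress_port}/v1/models/{model_name}:predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # The Host header routes the call through the ingress, as with `curl -H "Host: ..."`.
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration; substitute your own ingress and hostname.
req = build_predict_request(
    ingress_host="203.0.113.10",
    ingress_port=80,
    service_hostname="torchserve-bloom-560m.kserve-test.example.com",
    model_name="BLOOMSeqClassification",
    payload={"instances": [{"data": "My dog is cute"}]},
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before touching the cluster.
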
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
apiVersion: serving.kserve.io/v1beta1 | |
kind: InferenceService | |
metadata: | |
  name: "torchserve-bloom-560m" | |
spec: | |
  predictor: | |
    pytorch: | |
      storageUri: pvc://model-cache | |
      resources: | |
        limits: | |
          cpu: 4 | |
          memory: 16Gi | |
          nvidia.com/gpu: 1 | |
    nodeName: ip-xxx-xxx-xxx-xxx.us-west-2.compute.internal |
13 changes: 13 additions & 0 deletions
docs/samples/v1beta1/torchserve/v1/bloom/config.properties
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
inference_address=http://0.0.0.0:8085 | ||
management_address=http://0.0.0.0:8085 | ||
metrics_address=http://0.0.0.0:8082 | ||
grpc_inference_port=7070 | ||
grpc_management_port=7071 | ||
enable_metrics_api=true | ||
metrics_format=prometheus | ||
number_of_netty_threads=4 | ||
job_queue_size=10 | ||
enable_envvars_config=true | ||
install_py_dep_per_model=true | ||
model_store=/mnt/models/model-store | ||
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"bloom":{"1.0":{"defaultVersion":true,"marName":"bloom-560m.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}} |
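
The `model_snapshot` entry above is a JSON document embedded in a properties file, so a typo in it is easy to miss. As a rough sketch (the helper names are hypothetical, and the parser only handles simple `key=value` lines), the following checks which MAR file each default model version points to, so that it can be compared with the file copied into `model-store`:

```python
import json


def parse_properties(text):
    """Parse simple key=value lines, splitting on the first '=' only."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def snapshot_models(props):
    """Return {model_name: marName} for every default version in model_snapshot."""
    snapshot = json.loads(props["model_snapshot"])
    result = {}
    for name, versions in snapshot["models"].items():
        for cfg in versions.values():
            if cfg.get("defaultVersion"):
                result[name] = cfg["marName"]
    return result


# Two lines copied from the config.properties above.
SAMPLE = """\
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"bloom":{"1.0":{"defaultVersion":true,"marName":"bloom-560m.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
"""

props = parse_properties(SAMPLE)
print(props["model_store"])
print(snapshot_models(props))
```
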
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
|
||
kind: PersistentVolumeClaim | |
apiVersion: v1 | |
metadata: | |
  name: model-cache | |
spec: | |
  accessModes: | |
    - ReadWriteOnce | |
  resources: | |
    requests: | |
      storage: 700Gi | |
  storageClassName: fast-disks | |
--- | |
apiVersion: v1 | |
kind: Pod | |
metadata: | |
  name: pv-pod | |
spec: | |
  volumes: | |
    - name: pv-storage | |
      persistentVolumeClaim: | |
        claimName: model-cache | |
  containers: | |
    - name: pv-container | |
      image: alpine | |
      volumeMounts: | |
        - mountPath: "/pv" | |
          name: pv-storage |
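
A PersistentVolumeClaim binds only to a PersistentVolume whose capacity is at least the requested storage. The `kubectl get pv` output in the README shows a 116Gi volume, while the claim above requests 700Gi, so the request may need adjusting to match your disks. A minimal sketch of that comparison, assuming only binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) or plain integers; the helper names are hypothetical:

```python
# Binary suffixes used by Kubernetes resource quantities (subset, for illustration).
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity):
    """Convert a quantity such as '116Gi' to bytes (binary suffix or plain integer only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def claim_fits(pvc_request, pv_capacity):
    """Return True if a volume of pv_capacity can satisfy a claim for pvc_request."""
    return quantity_to_bytes(pvc_request) <= quantity_to_bytes(pv_capacity)


print(claim_fits("700Gi", "116Gi"))
print(claim_fits("100Gi", "116Gi"))
```
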
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | |
  "instances": [ | |
    { | |
      "data": "My dog is cute" | |
    } | |
  ] | |
} |
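
The request body above wraps each input text in an object with a `data` field under `instances`. A small sketch of generating the same structure for one or more texts; `build_payload` is a hypothetical helper, not part of the example:

```python
import json


def build_payload(texts):
    """Wrap each text in the {"data": ...} shape used by the sample request."""
    return {"instances": [{"data": text} for text in texts]}


body = json.dumps(build_payload(["My dog is cute"]), indent=2)
print(body)
```
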
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
kind: StorageClass | ||
apiVersion: storage.k8s.io/v1 | ||
metadata: | ||
  name: fast-disks | |
provisioner: kubernetes.io/no-provisioner | ||
volumeBindingMode: WaitForFirstConsumer |