Commit a5a434b: Feature: Add model archive and config generator for torchserve (kubeflow#1186)

* Feature: model archive and config generator for torchserve
* Fix: Update readme docs for model-archiver

Authored by jagadeeshi2i on Dec 3, 2020 (1 parent: 95da18a). 14 changed files with 469 additions and 0 deletions.

docs/samples/v1beta1/torchserve/model-archiver/README.md
# Generate model archiver files for torchserve

## Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).

## 1. Create PV and PVC

Create a PersistentVolume and a PersistentVolumeClaim. This document uses an Amazon EBS-backed PV.

### 1.1 Create PV

Edit the volume ID in the `pv.yaml` file, then apply it:

```bash
kubectl apply -f pv.yaml
```

Expected Output

```bash
persistentvolume/model-pv-volume created
```
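The `pv.yaml` manifest itself is not shown in this sample; a minimal sketch of an EBS-backed PV (the capacity and `fsType` here are assumptions, and the volume ID is a placeholder you must replace):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-pv-volume
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: <your-ebs-volume-id>   # e.g. vol-0abc... ; must be replaced
    fsType: ext4
```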

### 1.2 Create PVC

```bash
kubectl apply -f pvc.yaml
```

Expected Output

```bash
persistentvolumeclaim/model-pv-claim created
```
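The `pvc.yaml` manifest is not shown here; a minimal sketch (the requested storage size is an assumption and must fit within the PV's capacity):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```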

## 2. Create the model store file layout and copy it to the PV

We create a pod with the PV attached in order to copy the model files and `config.properties` needed to generate the model archive files.

### 2.1 Create pod for copying model store files to PV

```bash
kubectl apply -f pvpod.yaml
```

Expected Output

```bash
pod/model-store-pod created
```
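The `pvpod.yaml` manifest is not shown here; a minimal sketch consistent with the commands later in this doc (pod name `model-store-pod`, container name `model-store`, PV mounted at `/pv`; the `ubuntu` image and the namespace are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-store-pod
  namespace: kfserving-test
spec:
  volumes:
    - name: model-pv-claim
      persistentVolumeClaim:
        claimName: model-pv-claim
  containers:
    - name: model-store
      image: ubuntu
      command: ["sleep", "infinity"]   # keep the pod alive for kubectl cp/exec
      volumeMounts:
        - mountPath: /pv
          name: model-pv-claim
```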

### 2.2 Create model store file layout on PV

#### 2.2.1 Create properties.json file

This file lists, for each model, its model-name, version, model-file name, serialized-file name, extra-files, handler, worker counts, and other settings.

```json
[
{
"model-name": "mnist",
"version": "1.0",
"model-file": "",
"serialized-file": "mnist_cnn.pt",
"extra-files": "",
"handler": "mnist_handler.py",
"min-workers" : 1,
"max-workers": 3,
"batch-size": 1,
"max-batch-delay": 100,
"response-timeout": 120,
"requirements": ""
},
{
"model-name": "densenet_161",
"version": "1.0",
"model-file": "",
"serialized-file": "densenet161-8d451a50.pth",
"extra-files": "index_to_name.json",
"handler": "image_classifier",
"min-workers" : 1,
"max-workers": 3,
"batch-size": 1,
"max-batch-delay": 100,
"response-timeout": 120,
"requirements": ""
}
]
```
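As an illustration (not part of the sample files), the sketch below shows how the entrypoint script in this commit maps one `properties.json` entry to a `torch-model-archiver` invocation. The paths follow the PV layout used in this doc, and the entry values come from the mnist example above:

```python
import json

MODEL_STORE = "/home/model-server/model-store"

# one entry from properties.json (the mnist model above)
entry = json.loads("""
{"model-name": "mnist", "version": "1.0", "model-file": "",
 "serialized-file": "mnist_cnn.pt", "extra-files": "",
 "handler": "mnist_handler.py", "requirements": ""}
""")

name = entry["model-name"]
handler = entry["handler"]
# handlers ending in .py are custom handlers, resolved inside the model's folder;
# anything else (e.g. "image_classifier") is a built-in torchserve handler name
if handler.endswith(".py"):
    handler = f"{MODEL_STORE}/{name}/{handler}"

cmd = ["torch-model-archiver",
       "--model-name", name,
       "--version", entry["version"],
       "--serialized-file", f"{MODEL_STORE}/{name}/{entry['serialized-file']}",
       "--export-path", MODEL_STORE,
       "--handler", handler,
       "--force"]
print(" ".join(cmd))
```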

#### 2.2.2 Copy the model and its dependent files

Arrange the model and its dependent files in the structure given below before copying them to the PV:
an empty `config` folder, and a `model-store` folder containing one subfolder per model name. Each model folder holds the files required to build its marfile.

```bash
├── config
├── model-store
│   ├── densenet_161
│   │   ├── densenet161-8d451a50.pth
│   │   ├── index_to_name.json
│   │   └── model.py
│   ├── mnist
│   │   ├── mnist_cnn.pt
│   │   ├── mnist_handler.py
│   │   └── mnist.py
│   └── properties.json

```

#### 2.2.3 Create folders for model-store and config in PV

```bash
kubectl exec -it model-store-pod -c model-store -n kfserving-test -- mkdir /pv/model-store/

kubectl exec -it model-store-pod -c model-store -n kfserving-test -- mkdir /pv/config/
```

### 2.3 Copy model files and config.properties to the PV

Note that `kubectl cp` takes exactly one source and one destination and does not accept shell globs, so copy the `model-store` directory as a whole:

```bash
kubectl cp model-store model-store-pod:/pv/model-store -c model-store -n kfserving-test
kubectl cp config.properties model-store-pod:/pv/config/ -c model-store -n kfserving-test
```

### 2.4 Delete pv pod

Since Amazon EBS volumes support only the `ReadWriteOnce` access mode, we must delete this pod to release the PV before the model archiver pod can use it.

```bash
kubectl delete pod model-store-pod -n kfserving-test
```

## 3. Generate the model archive file and server configuration file

### 3.1 Create model archive pod and run model archive file generation script

```bash
kubectl apply -f model-archiver.yaml -n kfserving-test
```

### 3.2 Check the output

Verify the generated `.mar` files and `config.properties`:

```bash
kubectl exec -it margen-pod -n kfserving-test -- ls -lR /home/model-server/model-store
kubectl exec -it margen-pod -n kfserving-test -- cat /home/model-server/config/config.properties
```
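For reference, the generator script in this commit writes a `config.properties` of roughly this shape (values follow the two-model `properties.json` above; the `densenet_161` snapshot entry is elided here for brevity):

```properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=4
job_queue_size=100
model_store="/home/model-server/model-store"
model_snapshot={"name":"startup.cfg","modelCount":"2","models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":"1","maxWorkers":"3","batchSize":"1","maxBatchDelay":"100","responseTimeout":"120"}},...}}
```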

### 3.3 Delete the model archiver pod

```bash
kubectl delete -f model-archiver.yaml -n kfserving-test
```
Dockerfile
FROM ubuntu:18.04

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
        ca-certificates \
        g++ \
        python3-dev \
        python3-distutils \
        python3-venv \
        curl \
        jq \
    && rm -rf /var/lib/apt/lists/* \
    && cd /tmp \
    && curl -O https://bootstrap.pypa.io/get-pip.py \
    && python3 get-pip.py

RUN python3 -m venv /home/venv

ENV PATH="/home/venv/bin:$PATH"

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

RUN pip install --no-cache-dir torch-model-archiver

RUN useradd -m model-server

# ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache

COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh

RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
&& chown -R model-server /home/model-server

USER model-server
WORKDIR /home/model-server
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
# Model archiver for torchserve

**Steps:**

1. (Optional) Modify the default config in the entrypoint script
2. Build docker image
3. Push docker image to repo

```bash
docker build --file Dockerfile -t margen:latest .

docker tag margen:latest {username}/margen:latest

docker push {username}/margen:latest
```
dockerd-entrypoint.sh
#!/bin/bash
######################################################################
#### Script for model archiving and config.properties generation #####
######################################################################
set -e

BASE_PATH='/home/model-server'
MODEL_STORE=$BASE_PATH/model-store
CONFIG_PATH=$BASE_PATH/config

touch $CONFIG_PATH/config.properties

cat <<EOF > "$CONFIG_PATH"/config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=4
job_queue_size=100
model_store="$MODEL_STORE"
model_snapshot=
EOF
truncate -s -1 "$CONFIG_PATH"/config.properties

CONFIG_PROPERTIES=$CONFIG_PATH/config.properties
PROPERTIES_JSON=$MODEL_STORE/properties.json

count=$(jq -c '. | length' "$PROPERTIES_JSON")
echo "{\"name\":\"startup.cfg\",\"modelCount\":\"3\",\"models\":{}}" | jq -c --arg count "${count}" '.["modelCount"]=$count' >> $CONFIG_PROPERTIES
sed -i 's/{}}//' $CONFIG_PROPERTIES
truncate -s -1 $CONFIG_PROPERTIES
# shellcheck disable=SC1091
jq -c '.[]' "$PROPERTIES_JSON" | while read -r i; do
  modelName=$(echo "$i" | jq -r '."model-name"')
  modelFile=$(echo "$i" | jq -r '."model-file"')
  version=$(echo "$i" | jq -r '."version"')
  serializedFile=$(echo "$i" | jq -r '."serialized-file"')
  extraFiles=$(echo "$i" | jq -r '."extra-files"')
  handler=$(echo "$i" | jq -r '."handler"')
  minWorkers=$(echo "$i" | jq -r '."min-workers"')
  maxWorkers=$(echo "$i" | jq -r '."max-workers"')
  batchSize=$(echo "$i" | jq -r '."batch-size"')
  maxBatchDelay=$(echo "$i" | jq -r '."max-batch-delay"')
  responseTimeout=$(echo "$i" | jq -r '."response-timeout"')
  marName=${modelName}.mar
  requirements=$(echo "$i" | jq -r '."requirements"')
  updatedExtraFiles=$(echo "$extraFiles" | tr "," "\n" | awk -v modelName="$MODEL_STORE" -v modelFile="$modelName" '{ print modelName"/"modelFile"/"$1 }' | paste -sd "," -)
  ######################################
  #### Support for custom handlers #####
  ######################################
  pyfile="$( cut -d '.' -f 2 <<< "$handler" )"
  if [ "$pyfile" == "py" ]; then
    handler="$MODEL_STORE/$modelName/$handler"
  fi
  if [ -z "$modelFile" ]; then
    if [ -z "$extraFiles" ]; then
      torch-model-archiver --model-name "$modelName" --version "$version" --serialized-file "$MODEL_STORE/$modelName/$serializedFile" --export-path "$MODEL_STORE" --handler "$handler" -r "$requirements" --force
    else
      torch-model-archiver --model-name "$modelName" --version "$version" --serialized-file "$MODEL_STORE/$modelName/$serializedFile" --export-path "$MODEL_STORE" --extra-files "$updatedExtraFiles" --handler "$handler" -r "$requirements" --force
    fi
  else
    if [ -z "$extraFiles" ]; then
      torch-model-archiver --model-name "$modelName" --version "$version" --model-file "$MODEL_STORE/$modelName/$modelFile" --serialized-file "$MODEL_STORE/$modelName/$serializedFile" --export-path "$MODEL_STORE" --handler "$handler" -r "$requirements" --force
    else
      torch-model-archiver --model-name "$modelName" --version "$version" --model-file "$MODEL_STORE/$modelName/$modelFile" --serialized-file "$MODEL_STORE/$modelName/$serializedFile" --export-path "$MODEL_STORE" --extra-files "$updatedExtraFiles" --handler "$handler" -r "$requirements" --force
    fi
  fi
  echo "{\"modelName\":{\"version\":{\"defaultVersion\":true,\"marName\":\"sample.mar\",\"minWorkers\":\"sampleminWorkers\",\"maxWorkers\":\"samplemaxWorkers\",\"batchSize\":\"samplebatchSize\",\"maxBatchDelay\":\"samplemaxBatchDelay\",\"responseTimeout\":\"sampleresponseTimeout\"}}}" |
    jq -c --arg modelName "$modelName" --arg version "$version" --arg marName "$marName" --arg minWorkers "$minWorkers" --arg maxWorkers "$maxWorkers" --arg batchSize "$batchSize" --arg maxBatchDelay "$maxBatchDelay" --arg responseTimeout "$responseTimeout" '.[$modelName]=."modelName" | .[$modelName][$version]=.[$modelName]."version" | .[$modelName][$version]."marName"=$marName | .[$modelName][$version]."minWorkers"=$minWorkers | .[$modelName][$version]."maxWorkers"=$maxWorkers | .[$modelName][$version]."batchSize"=$batchSize | .[$modelName][$version]."maxBatchDelay"=$maxBatchDelay | .[$modelName][$version]."responseTimeout"=$responseTimeout | del(."modelName", .[$modelName]."version")' >> $CONFIG_PROPERTIES
  truncate -s -1 $CONFIG_PROPERTIES
done
sed -i 's/}{/,/g' $CONFIG_PROPERTIES
sed -i 's/}}}/}}}}/g' $CONFIG_PROPERTIES

# prevent docker exit
tail -f /dev/null


docs/samples/v1beta1/torchserve/model-archiver/model-archiver.yaml
apiVersion: v1
kind: Pod
metadata:
  name: margen-pod
spec:
  volumes:
    - name: model-pv-claim
      persistentVolumeClaim:
        claimName: model-pv-claim
  initContainers:
    - name: data-permission-fix
      image: busybox
      command: ["/bin/chmod", "-R", "777", "/data"]
      volumeMounts:
        - name: model-pv-claim
          mountPath: /data
  containers:
    - name: margen-container
      image: {username}/margen:v1.2
      volumeMounts:
        - mountPath: "/home/model-server"
          name: model-pv-claim
      resources:
        limits:
          memory: 400Mi
          cpu: 300m
# Model archiver for torchserve

## Place all the files required to generate the marfile in the model folder

model-store/densenet_161/model.py
@@ -0,0 +1,24 @@
from torchvision.models.densenet import DenseNet


class ImageClassifier(DenseNet):
    def __init__(self):
        super(ImageClassifier, self).__init__(48, (6, 12, 36, 24), 96)

    def load_state_dict(self, state_dict, strict=True):
        # '.'s are no longer allowed in module names, but previous _DenseLayer
        # has keys 'norm.1', 'relu.1', 'conv.1', 'norm.2', 'relu.2', 'conv.2'.
        # They are also in the checkpoints in model_urls. This pattern is used
        # to find such keys.
        # Credit - https://github.com/pytorch/vision/blob/master/torchvision/models/densenet.py#def _load_state_dict()
        import re
        pattern = re.compile(
            r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')

        for key in list(state_dict.keys()):
            res = pattern.match(key)
            if res:
                new_key = res.group(1) + res.group(2)
                state_dict[new_key] = state_dict[key]
                del state_dict[key]

        return super(ImageClassifier, self).load_state_dict(state_dict, strict)
model-store/mnist/mnist.py
import torch
from torch import nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output