PyTorch Support (kubeflow#153)
* initial pytorch support

* modifying go crd files

* modifying go crd files

* test placeholder

* cleaning up test placeholder file

* updating with cifar10 sample

* updating cifar10 instructions

* correcting docker steps

* adding pytorch dockerfile

* removing generated data files

* correcting the sample pytorch yaml file

* adding the class file and class name parameters

* adding cifar10 input file and dockerfile

* adding gcs location for model file

* addressing review comments

* simplifying PyTorch interface

* making model class name optional

* fix the comment ordering

* removing model file

* adding default behaviour
animeshsingh authored and k8s-ci-robot committed Jun 27, 2019
1 parent f5a749e commit 61a35a7
Showing 16 changed files with 3,977 additions and 0 deletions.
30 changes: 30 additions & 0 deletions config/default/crds/serving_v1alpha1_kfservice.yaml
@@ -65,6 +65,21 @@ spec:
0 in case of no traffic
format: int64
type: integer
pytorch:
properties:
modelClassName:
description: Name of the model class for PyTorch model
type: string
modelUri:
type: string
resources:
description: Defaults to requests and limits of 1CPU, 2Gb MEM.
type: object
runtimeVersion:
type: string
required:
- modelUri
type: object
serviceAccountName:
description: Service Account Name
type: string
@@ -144,6 +159,21 @@ spec:
0 in case of no traffic
format: int64
type: integer
pytorch:
properties:
modelClassName:
description: Defaults to latest PyTorch Version.
type: string
modelUri:
type: string
resources:
description: Defaults to requests and limits of 1CPU, 2Gb MEM.
type: object
runtimeVersion:
type: string
required:
- modelUri
type: object
serviceAccountName:
description: Service Account Name
type: string
112 changes: 112 additions & 0 deletions docs/samples/pytorch/README.md
@@ -0,0 +1,112 @@
## Creating your own model and testing the PyTorch server.

To test the [PyTorch](https://pytorch.org/) server, first train a simple CIFAR-10 model using PyTorch.

```shell
python cifar10.py
```
You should see output similar to this:

```shell
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Failed download. Trying https -> http instead. Downloading http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
100.0%Files already downloaded and verified
[1, 2000] loss: 2.232
[1, 4000] loss: 1.913
[1, 6000] loss: 1.675
[1, 8000] loss: 1.555
[1, 10000] loss: 1.492
[1, 12000] loss: 1.488
[2, 2000] loss: 1.412
[2, 4000] loss: 1.358
[2, 6000] loss: 1.362
[2, 8000] loss: 1.338
[2, 10000] loss: 1.315
[2, 12000] loss: 1.278
Finished Training
```

Then, we can run the PyTorch server using the trained model and test for predictions. Models can be stored on the local filesystem, in S3-compatible object storage, or in Google Cloud Storage.

Note: KFServing currently supports PyTorch models saved using the [state_dict method](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference), PyTorch's recommended way of saving models for inference. The KFServing interface for PyTorch expects the model class file (`model_class_file`) to be uploaded to the same location as the PyTorch model, and accepts an optional `model_class_name` as a runtime input. If no model class name is specified, 'PyTorchModel' is used as the default class name. The interface may change as support is added for PyTorch models saved using other methods.

```shell
python -m pytorchserver --model_dir ./ --model_name pytorchmodel --model_class_name Net
```
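
As a rough illustration of the default described in the note, the sketch below imports a class file from the model directory and falls back to `PyTorchModel` when no class name is given. `load_model_class` and the stand-in class file are hypothetical names for this example and are not taken from the `pytorchserver` source; a real class file would define `torch.nn.Module` subclasses.

```python
import importlib.util
import tempfile
from pathlib import Path

# Default class name stated in the note; everything else here is illustrative.
DEFAULT_CLASS_NAME = "PyTorchModel"


def load_model_class(model_dir, class_file, class_name=None):
    """Import class_file from model_dir and return the requested class.

    Hypothetical helper, not the actual pytorchserver implementation.
    """
    path = Path(model_dir) / class_file
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Fall back to the default when no class name was supplied.
    return getattr(module, class_name or DEFAULT_CLASS_NAME)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as model_dir:
        # Stand-in class file with plain classes, so the sketch runs without torch.
        Path(model_dir, "cifar10.py").write_text(
            "class Net:\n    pass\n\nclass PyTorchModel:\n    pass\n"
        )
        print(load_model_class(model_dir, "cifar10.py", "Net").__name__)
        print(load_model_class(model_dir, "cifar10.py").__name__)
```

Once the class is resolved, a server following the state_dict approach would typically instantiate it and call `load_state_dict` with the saved weights.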

We can also use torchvision's built-in sample datasets to run some simple predictions:

```python
import requests
import torch
import torchvision
import torchvision.transforms as transforms

# Same preprocessing as in cifar10.py
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
dataiter = iter(testloader)
images, labels = next(dataiter)

# Send the first test image as a nested list
formData = {
    'instances': images[0:1].tolist()
}
res = requests.post('http://localhost:8080/models/pytorchmodel:predict', json=formData)
print(res)
print(res.text)
```

# Predict on a KFService using PyTorch

## Setup
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.
3. Your cluster's Istio Egress gateway must [allow Google Cloud Storage](https://knative.dev/docs/serving/outbound-network-access/).

## Create the KFService

Apply the KFService custom resource:
```
kubectl apply -f pytorch.yaml
```

Expected Output
```
$ kfservice.serving.kubeflow.org/pytorch-cifar10 created
```

## Run a prediction

```
MODEL_NAME=pytorch-cifar10
INPUT_PATH=@./input.json
CLUSTER_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get kfservice pytorch-cifar10 -o jsonpath='{.status.url}')
curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://$CLUSTER_IP/models/$MODEL_NAME:predict
```
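
For scripted clients, the same request can be assembled in Python. The sketch below only builds the request with the standard library; `build_predict_request` is a hypothetical helper, the IP address and instances are placeholders, and it sets a JSON content type rather than curl's default form encoding.

```python
import json
import urllib.request


def build_predict_request(cluster_ip, service_hostname, model_name, instances):
    """Build (but do not send) a POST to the model's :predict endpoint."""
    url = f"http://{cluster_ip}/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # The Host header carries the service hostname, as in the curl command.
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_predict_request(
        "203.0.113.10",  # placeholder for $CLUSTER_IP
        "pytorch-cifar10.default.svc.cluster.local",  # placeholder for $SERVICE_HOSTNAME
        "pytorch-cifar10",
        [[0.0, 1.0]],  # placeholder; real input is a nested image tensor list
    )
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req).read()
```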

You should see an output similar to the one below:

```
> POST /models/pytorch-cifar10:predict HTTP/1.1
> Host: pytorch-cifar10.default.svc.cluster.local
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Length: 110681
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< content-length: 221
< content-type: application/json; charset=UTF-8
< date: Fri, 21 Jun 2019 04:05:39 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 35292
<
{"predictions": [[-0.8955065011978149, -1.4453213214874268, 0.1515328735113144, 2.638284683227539, -1.00240159034729, 2.270702600479126, 0.22645258903503418, -0.880557119846344, 0.08783778548240662, -1.5551214218139648]]}
```
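
Each prediction is a list of ten raw scores, one per CIFAR-10 class. As a minimal sketch, assuming the class order matches the `classes` tuple in `cifar10.py`, the highest score can be mapped back to a label like this (`top_class` is a hypothetical helper for this example):

```python
# Class order copied from the `classes` tuple in cifar10.py.
CLASSES = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


def top_class(scores):
    """Return the class name with the highest raw score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return CLASSES[best]


if __name__ == "__main__":
    # Scores from the sample response above.
    scores = [-0.8955065011978149, -1.4453213214874268, 0.1515328735113144,
              2.638284683227539, -1.00240159034729, 2.270702600479126,
              0.22645258903503418, -0.880557119846344, 0.08783778548240662,
              -1.5551214218139648]
    print(top_class(scores))  # prints "cat"
```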
79 changes: 79 additions & 0 deletions docs/samples/pytorch/cifar10.py
@@ -0,0 +1,79 @@
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


if __name__ == "__main__":

    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                              shuffle=True, num_workers=2)

    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                             shuffle=False, num_workers=2)

    classes = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

    net = Net()

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(2):  # loop over the dataset multiple times

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0

    print('Finished Training')

    # Save model
    torch.save(net.state_dict(), "model.pt")