Commit 9f7df31

Add multi-model endpoints guide (#1081)
(cherry picked from commit ba465bd)
1 parent 7d6d714 commit 9f7df31

File tree

3 files changed: +111 -0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@
 * **Autoscaling:** automatically scale APIs to handle production workloads.
 * **ML instances:** run inference on G4, P2, M5, C5 and other AWS instance types.
 * **Spot instances:** save money with spot instances.
+* **Multi-model endpoints:** deploy multiple models in a single API.
 * **Rolling updates:** update deployed APIs with no downtime.
 * **Log streaming:** stream logs from deployed models to your CLI.
 * **Prediction monitoring:** monitor API performance and prediction results.

docs/guides/multi-model.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# Multi-model endpoints

It is possible to serve multiple models in the same Cortex API when using the Python predictor type (support for the TensorFlow predictor type is [coming soon](https://github.com/cortexlabs/cortex/issues/890)). In this guide, we'll deploy a sentiment analyzer and a text summarizer in one API, using query parameters to select the model and sharing a single GPU across both models.
## Step 1: implement your API

Create a new folder called `multi-model`, and add these files:

### `cortex.yaml`

```yaml
- name: text-analyzer
  predictor:
    type: python
    path: predictor.py
  compute:
    cpu: 1
    gpu: 1
    mem: 12G
  autoscaling:
    threads_per_worker: 1
```

_Note: `threads_per_worker: 1` is the default, but setting it higher in production may increase throughput, especially when inference on one model takes much longer than on the other(s)._
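To illustrate the note above, raising the per-worker concurrency is a one-line change in the `autoscaling` block. The value here is illustrative, not taken from the guide; tune it against your own latency measurements:

```yaml
  autoscaling:
    threads_per_worker: 4  # allow up to 4 concurrent inference threads per worker
```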
### `requirements.txt`

```text
torch
transformers==2.9.*
```
### `predictor.py`

```python
import torch
from transformers import pipeline
from starlette.responses import JSONResponse


class PythonPredictor:
    def __init__(self, config):
        # use the GPU if one is available
        device = 0 if torch.cuda.is_available() else -1
        print(f"using device: {'cuda' if device == 0 else 'cpu'}")

        self.analyzer = pipeline(task="sentiment-analysis", device=device)
        self.summarizer = pipeline(task="summarization", device=device)

    def predict(self, query_params, payload):
        # the `model` query parameter selects which model handles the request
        model_name = query_params.get("model")

        if model_name == "sentiment":
            return self.analyzer(payload["text"])[0]
        elif model_name == "summarizer":
            summary = self.summarizer(payload["text"])
            return summary[0]["summary_text"]
        else:
            return JSONResponse({"error": f"unknown model: {model_name}"}, status_code=400)
```
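The query-parameter routing in `predict` can be exercised locally without a GPU (or the transformers download) by stubbing out the pipelines. This is a sketch for experimentation only; the stub outputs are made up and `StubPredictor` is not part of the guide:

```python
# Mirrors the routing in PythonPredictor, with the transformers pipelines
# replaced by stubs so the dispatch logic runs anywhere.

class StubPredictor:
    def __init__(self):
        # stand-ins for pipeline(task="sentiment-analysis") / pipeline(task="summarization")
        self.analyzer = lambda text: [{"label": "POSITIVE", "score": 0.99}]
        self.summarizer = lambda text: [{"summary_text": text[:20]}]

    def predict(self, query_params, payload):
        model_name = query_params.get("model")
        if model_name == "sentiment":
            return self.analyzer(payload["text"])[0]
        elif model_name == "summarizer":
            return self.summarizer(payload["text"])[0]["summary_text"]
        else:
            return {"error": f"unknown model: {model_name}"}


predictor = StubPredictor()
print(predictor.predict({"model": "sentiment"}, {"text": "best day ever"}))
# → {'label': 'POSITIVE', 'score': 0.99}
```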
### `sample-sentiment.json`

```json
{
  "text": "best day ever"
}
```
### `sample-summarizer.json`

```json
{
  "text": "Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop a conventional algorithm for effectively performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning, and focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics."
}
```
## Step 2: deploy your API

```bash
$ cd multi-model

$ cortex deploy
```

Wait for your API to be ready (you can track its progress with `cortex get --watch`).
## Step 3: make prediction requests

Run `cortex get text-analyzer` to get your API endpoint, and save it as a bash variable for convenience (yours will be different from mine):

```bash
$ api_endpoint=http://a36473270de8b46e79a769850dd3372d-c67035afa37ef878.elb.us-west-2.amazonaws.com/text-analyzer
```

Make a request to the sentiment analysis model:

```bash
$ curl "${api_endpoint}?model=sentiment" -X POST -H "Content-Type: application/json" -d @sample-sentiment.json

{"label": "POSITIVE", "score": 0.9998506903648376}
```

Make a request to the text summarizer model:

```bash
$ curl "${api_endpoint}?model=summarizer" -X POST -H "Content-Type: application/json" -d @sample-summarizer.json

Machine learning is the study of algorithms and statistical models that computer systems use to perform a specific task. It is seen as a subset of artificial intelligence. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision. In its application across business problems, machine learning is also referred to as predictive analytics.
```
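The same requests can also be issued from Python. This is a sketch using only the standard library; the endpoint below is a placeholder, so substitute the one printed by `cortex get text-analyzer`:

```python
import json
from urllib import parse, request

# Placeholder: replace with your own endpoint from `cortex get text-analyzer`
api_endpoint = "http://example.elb.us-west-2.amazonaws.com/text-analyzer"


def build_request(model, text):
    """Build the same POST request as the curl examples above."""
    url = f"{api_endpoint}?{parse.urlencode({'model': model})}"
    return request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("sentiment", "best day ever")
print(req.full_url)
# To actually send it (requires a running deployment):
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```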

docs/summary.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@
 ## Guides

+* [Multi-model endpoints](guides/multi-model.md)
 * [View API metrics](guides/metrics.md)
 * [Set up AWS API gateway](guides/api-gateway.md)
 * [Set up HTTPS on a subdomain](guides/subdomain-https-setup.md)

0 commit comments