
Commit da6324b

Update docs (#861)
1 parent e525410 commit da6324b

File tree: 22 files changed, +780 -674 lines


README.md

Lines changed: 2 additions & 2 deletions
@@ -115,8 +115,8 @@ positive
 ```bash
 $ cortex get sentiment-classifier --watch

-status   up-to-date   requested   last update   avg inference   2XX
-live     1            1           8s            24ms            12
+status   up-to-date   requested   last update   avg request   2XX
+live     1            1           8s            24ms          12

 class      count
 positive   8

docs/dependency-management/python-packages.md

Lines changed: 0 additions & 56 deletions
This file was deleted.

docs/deployments/api-configuration.md

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
# API configuration

_WARNING: you are on the master branch, please refer to the docs on the branch that matches your `cortex version`_

Once your model is [exported](exporting.md) and you've implemented a [Predictor](predictors.md), you can configure your API via a YAML file (typically named `cortex.yaml`).

Refer to the section below that corresponds to your Predictor type: [Python](#python-predictor), [TensorFlow](#tensorflow-predictor), or [ONNX](#onnx-predictor).
## Python Predictor

```yaml
- name: <string> # API name (required)
  endpoint: <string> # the endpoint for the API (default: <api_name>)
  predictor:
    type: python
    path: <string> # path to a python file with a PythonPredictor class definition, relative to the Cortex root (required)
    config: <string: value> # arbitrary dictionary passed to the constructor of the Predictor (optional)
    python_path: <string> # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
    env: <string: string> # dictionary of environment variables
  tracker:
    key: <string> # the JSON key in the response to track (required if the response payload is a JSON object)
    model_type: <string> # model type, must be "classification" or "regression" (required)
  compute:
    cpu: <string | int | float> # CPU request per replica (default: 200m)
    gpu: <int> # GPU request per replica (default: 0)
    mem: <string> # memory request per replica (default: Null)
  autoscaling:
    min_replicas: <int> # minimum number of replicas (default: 1)
    max_replicas: <int> # maximum number of replicas (default: 100)
    init_replicas: <int> # initial number of replicas (default: <min_replicas>)
    workers_per_replica: <int> # the number of parallel serving workers to run on each replica (default: 1)
    threads_per_worker: <int> # the number of threads per worker (default: 1)
    target_replica_concurrency: <float> # the desired number of in-flight requests per replica, which the autoscaler tries to maintain (default: workers_per_replica * threads_per_worker)
    max_replica_concurrency: <int> # the maximum number of in-flight requests per replica before requests are rejected with error code 503 (default: 1024)
    window: <duration> # the time over which to average the API's concurrency (default: 60s)
    downscale_stabilization_period: <duration> # the API will not scale below the highest recommendation made during this period (default: 5m)
    upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 0m)
    max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.5)
    max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 10)
    downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.1)
    upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.1)
  update_strategy:
    max_surge: <string | int> # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
    max_unavailable: <string | int> # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
```

See additional documentation for [autoscaling](autoscaling.md), [compute](compute.md), and [prediction monitoring](prediction-monitoring.md).
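For orientation, here is a minimal sketch of a `cortex.yaml` that fills in a few of the fields above for a Python Predictor; the API name, file path, and config values are hypothetical and would match your own project:

```yaml
# hypothetical cortex.yaml for a Python Predictor API
- name: sentiment-classifier
  predictor:
    type: python
    path: predictor.py # file containing your PythonPredictor class
    config:
      model_path: s3://my-bucket/model.pkl # arbitrary key passed to the Predictor's constructor
  tracker:
    model_type: classification
  compute:
    cpu: 200m
    mem: 1G
  autoscaling:
    min_replicas: 1
    max_replicas: 2
```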
## TensorFlow Predictor

```yaml
- name: <string> # API name (required)
  endpoint: <string> # the endpoint for the API (default: <api_name>)
  predictor:
    type: tensorflow
    path: <string> # path to a python file with a TensorFlowPredictor class definition, relative to the Cortex root (required)
    model: <string> # S3 path to an exported model (e.g. s3://my-bucket/exported_model) (required)
    signature_key: <string> # name of the signature def to use for prediction (required if your model has more than one signature def)
    config: <string: value> # arbitrary dictionary passed to the constructor of the Predictor (optional)
    python_path: <string> # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
    env: <string: string> # dictionary of environment variables
  tracker:
    key: <string> # the JSON key in the response to track (required if the response payload is a JSON object)
    model_type: <string> # model type, must be "classification" or "regression" (required)
  compute:
    cpu: <string | int | float> # CPU request per replica (default: 200m)
    gpu: <int> # GPU request per replica (default: 0)
    mem: <string> # memory request per replica (default: Null)
  autoscaling:
    min_replicas: <int> # minimum number of replicas (default: 1)
    max_replicas: <int> # maximum number of replicas (default: 100)
    init_replicas: <int> # initial number of replicas (default: <min_replicas>)
    workers_per_replica: <int> # the number of parallel serving workers to run on each replica (default: 1)
    threads_per_worker: <int> # the number of threads per worker (default: 1)
    target_replica_concurrency: <float> # the desired number of in-flight requests per replica, which the autoscaler tries to maintain (default: workers_per_replica * threads_per_worker)
    max_replica_concurrency: <int> # the maximum number of in-flight requests per replica before requests are rejected with error code 503 (default: 1024)
    window: <duration> # the time over which to average the API's concurrency (default: 60s)
    downscale_stabilization_period: <duration> # the API will not scale below the highest recommendation made during this period (default: 5m)
    upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 0m)
    max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.5)
    max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 10)
    downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.1)
    upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.1)
  update_strategy:
    max_surge: <string | int> # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
    max_unavailable: <string | int> # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
```

See additional documentation for [autoscaling](autoscaling.md), [compute](compute.md), and [prediction monitoring](prediction-monitoring.md).
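As a comparable sketch, a TensorFlow Predictor API adds the `model` field pointing at an exported model in S3; the names and paths below are hypothetical:

```yaml
# hypothetical cortex.yaml for a TensorFlow Predictor API
- name: text-classifier
  predictor:
    type: tensorflow
    path: predictor.py # file containing your TensorFlowPredictor class
    model: s3://my-bucket/exported_model # S3 path to the exported model
    # signature_key: predict # only needed if the model has more than one signature def
  compute:
    cpu: 200m
    gpu: 1
    mem: 2G
```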
## ONNX Predictor

```yaml
- name: <string> # API name (required)
  endpoint: <string> # the endpoint for the API (default: <api_name>)
  predictor:
    type: onnx
    path: <string> # path to a python file with an ONNXPredictor class definition, relative to the Cortex root (required)
    model: <string> # S3 path to an exported model (e.g. s3://my-bucket/exported_model.onnx) (required)
    config: <string: value> # arbitrary dictionary passed to the constructor of the Predictor (optional)
    python_path: <string> # path to the root of your Python folder that will be appended to PYTHONPATH (default: folder containing cortex.yaml)
    env: <string: string> # dictionary of environment variables
  tracker:
    key: <string> # the JSON key in the response to track (required if the response payload is a JSON object)
    model_type: <string> # model type, must be "classification" or "regression" (required)
  compute:
    cpu: <string | int | float> # CPU request per replica (default: 200m)
    gpu: <int> # GPU request per replica (default: 0)
    mem: <string> # memory request per replica (default: Null)
  autoscaling:
    min_replicas: <int> # minimum number of replicas (default: 1)
    max_replicas: <int> # maximum number of replicas (default: 100)
    init_replicas: <int> # initial number of replicas (default: <min_replicas>)
    workers_per_replica: <int> # the number of parallel serving workers to run on each replica (default: 1)
    threads_per_worker: <int> # the number of threads per worker (default: 1)
    target_replica_concurrency: <float> # the desired number of in-flight requests per replica, which the autoscaler tries to maintain (default: workers_per_replica * threads_per_worker)
    max_replica_concurrency: <int> # the maximum number of in-flight requests per replica before requests are rejected with error code 503 (default: 1024)
    window: <duration> # the time over which to average the API's concurrency (default: 60s)
    downscale_stabilization_period: <duration> # the API will not scale below the highest recommendation made during this period (default: 5m)
    upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 0m)
    max_downscale_factor: <float> # the maximum factor by which to scale down the API on a single scaling event (default: 0.5)
    max_upscale_factor: <float> # the maximum factor by which to scale up the API on a single scaling event (default: 10)
    downscale_tolerance: <float> # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.1)
    upscale_tolerance: <float> # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.1)
  update_strategy:
    max_surge: <string | int> # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
    max_unavailable: <string | int> # maximum number of replicas that can be unavailable during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 10% (default: 25%)
```

See additional documentation for [autoscaling](autoscaling.md), [compute](compute.md), and [prediction monitoring](prediction-monitoring.md).
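And a similar sketch for an ONNX Predictor, where `model` points at a single `.onnx` file; the values are again hypothetical:

```yaml
# hypothetical cortex.yaml for an ONNX Predictor API
- name: iris-classifier
  predictor:
    type: onnx
    path: predictor.py # file containing your ONNXPredictor class
    model: s3://my-bucket/exported_model.onnx # S3 path to the exported ONNX model
  tracker:
    model_type: classification
  compute:
    cpu: 200m
```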

docs/deployments/compute.md

Lines changed: 3 additions & 4 deletions
@@ -27,7 +27,6 @@ One unit of memory is one byte. Memory can be expressed as an integer or by usin

 ## GPU

-1. Make sure your AWS account is subscribed to the [EKS-optimized AMI with GPU Support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM).
-2. You may need to [file an AWS support ticket](https://console.aws.amazon.com/support/cases#/create?issueType=service-limit-increase&limitType=ec2-instances) to increase the limit for your desired instance type.
-3. Set instance type to an AWS GPU instance (e.g. p2.xlarge) when installing Cortex.
-4. Note that one unit of GPU corresponds to one virtual GPU on AWS. Fractional requests are not allowed.
+One unit of GPU corresponds to one virtual GPU. Fractional requests are not allowed.
+
+See [GPU documentation](gpus.md) for more information.
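For reference, a sketch of how a GPU is requested in an API's `compute` block, following the field names in the api-configuration reference above (the API name and other values are illustrative):

```yaml
# hypothetical cortex.yaml excerpt showing a GPU request
- name: image-classifier
  predictor:
    type: python
    path: predictor.py
  compute:
    cpu: 1
    gpu: 1 # whole virtual GPUs only; fractional requests are not allowed
    mem: 4G
```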

docs/deployments/deployment.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
# API deployment

_WARNING: you are on the master branch, please refer to the docs on the branch that matches your `cortex version`_

Once your model is [exported](exporting.md), you've implemented a [Predictor](predictors.md), and you've [configured your API](api-configuration.md), you're ready to deploy!

## `cortex deploy`

The `cortex deploy` command collects your configuration and source code and deploys your API on your cluster:

```bash
$ cortex deploy

creating my-api
```

APIs are declarative, so to update your API, simply modify your source code and/or configuration and run `cortex deploy` again.
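For context, `cortex deploy` reads the API configuration in your project directory (typically `cortex.yaml`, described in [API configuration](api-configuration.md)); a minimal, hypothetical example of what it might pick up:

```yaml
# hypothetical cortex.yaml read by `cortex deploy`
- name: my-api
  predictor:
    type: python
    path: predictor.py # file containing your PythonPredictor class
```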
## `cortex get`

The `cortex get` command displays the status of your APIs, and `cortex get <api_name>` shows additional information about a specific API.

```bash
$ cortex get my-api

status   up-to-date   requested   last update   avg request   2XX
live     1            1           1m            -             -

endpoint: http://***.amazonaws.com/iris-classifier
...
```

Appending the `--watch` flag will re-run the `cortex get` command every second.
## `cortex logs`

You can stream logs from your API using the `cortex logs` command:

```bash
$ cortex logs my-api
```

## Making a prediction

You can use `curl` to test your prediction service, for example:

```bash
$ curl http://***.amazonaws.com/my-api \
    -X POST -H "Content-Type: application/json" \
    -d '{"key": "value"}'
```

## Debugging

You can log information about each request by adding the `?debug=true` parameter to your requests. This will print the request payload and the value returned by your `predict()` function in the API logs.
## `cortex delete`

You can delete your API with the `cortex delete` command:

```bash
$ cortex delete my-api

deleting my-api
```

## Additional resources

<!-- CORTEX_VERSION_MINOR -->
* [Tutorial](../../examples/sklearn/iris-classifier/README.md) provides a step-by-step walkthrough of deploying an iris classifier API
* [CLI documentation](../cluster-management/cli.md) lists all CLI commands
* [Examples](https://github.com/cortexlabs/cortex/tree/master/examples) demonstrate how to deploy models from common ML libraries
