Description
Things to mention:
- Mention multi-model endpoints, and link to example(s)
- CPUs are cheaper than GPUs
- User should consider spot instances; include a sample config like this (also mention using similar instance types in `instance_distribution`, sketched after the config below, and link to the spot docs):
```yaml
# cluster.yaml
cluster_name: cortex
region: us-west-2
instance_type: g4dn.xlarge
min_instances: 0
max_instances: 20
spot: true
spot_config:
  on_demand_base_capacity: 0
  on_demand_percentage_above_base_capacity: 0
  on_demand_backup: true
```
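For the `instance_distribution` note above, here is a hedged sketch of how similar instance types could be listed in `spot_config` (the alternative instance type below is an illustrative example, not a recommendation; specs and pricing should be verified first):

```yaml
# cluster.yaml (sketch: listing similar instance types for spot)
spot: true
spot_config:
  # illustrative: instance types comparable to g4dn.xlarge that the
  # cluster may fall back to when spot capacity is unavailable
  instance_distribution: [g4dn.xlarge, g4dn.2xlarge]
  on_demand_base_capacity: 0
  on_demand_percentage_above_base_capacity: 0
  on_demand_backup: true
```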
Here is some sample text:
APIs will be able to scale down to 1 replica per API, but not to 0. So if you have 9 APIs running, there will be a minimum of 9 replicas. Terminating instances from the AWS console will not help, since Cortex will consider this an unexpected state and will re-create the instances. You can delete APIs to reduce the number of instances (`cortex delete <api_name>`), or you can serve multiple models from a single API, as is done in the pytorch/multi-model-text-analyzer example (this way you would have one endpoint, and would choose which of the 9 models to run based on a query parameter in the request URL).
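To make the multi-model pattern concrete, here is a minimal sketch of a predictor that picks a model based on a query parameter. It assumes Cortex's Python Predictor interface with a `predict(payload, query_params)` signature (the exact signature varies across Cortex versions), and the model names and files are hypothetical; the pytorch/multi-model-text-analyzer example shows the real implementation.

```python
# predictor.py -- sketch; model files and names are hypothetical
import torch


class PythonPredictor:
    def __init__(self, config):
        # load each model once at startup; the dict keys become the
        # accepted values for the ?model=... query parameter
        self.models = {
            "sentiment": torch.jit.load("sentiment.pt"),
            "summarizer": torch.jit.load("summarizer.pt"),
        }

    def predict(self, payload, query_params):
        # choose the model named in the request URL,
        # e.g. POST <endpoint>?model=sentiment
        model_name = query_params.get("model", "sentiment")
        if model_name not in self.models:
            return {"error": f"unknown model: {model_name}"}
        with torch.no_grad():
            # input handling is model-specific; this assumes the payload
            # carries a JSON list under "input"
            output = self.models[model_name](torch.tensor(payload["input"]))
        return output.tolist()
```

A request would then select the model via the URL, e.g. `curl "<endpoint>?model=sentiment" -d '{"input": [...]}'`.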