
Add a guide for how to keep costs low for an AWS cluster #1425

Closed
@deliahu

Description


Things to mention:

  • Mention multi-model endpoints, and link to example(s)
  • CPUs are cheaper than GPUs
  • Users should consider spot instances; include a sample config like this (also mention using similar instance types in instance_distribution, and link to the spot docs):
# cluster.yaml

cluster_name: cortex
region: us-west-2
instance_type: g4dn.xlarge
min_instances: 0
max_instances: 20
spot: true
spot_config:
  on_demand_base_capacity: 0
  on_demand_percentage_above_base_capacity: 0
  on_demand_backup: true

Here is some sample text:

APIs can scale down to 1 replica per API, but not to 0, so if you have 9 APIs running, there will be a minimum of 9 replicas. Terminating instances from the AWS console will not help: Cortex treats this as an unexpected state and re-creates the instance. To reduce the number of instances, you can delete APIs (cortex delete <api_name>), or you can serve multiple models from a single API, as is done in the pytorch/multi-model-text-analyzer example. That way you would have one endpoint, and would choose which of the 9 models to run for each request based on a query parameter in the request URL.
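
To make the multi-model idea concrete for the guide, here is a minimal sketch of one endpoint dispatching to several models by query parameter. Everything in it is an assumption for illustration: the class name `MultiModelPredictor`, the `model` query parameter, and the two stand-in "models" are hypothetical, and the `predict(payload, query_params)` shape is only meant to resemble a predictor class, not to reproduce the code in the pytorch/multi-model-text-analyzer example.

```python
from typing import Any, Callable, Dict


# Stand-in "models" (hypothetical). A real API would load trained models here.
def sentiment_model(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"


def length_model(text: str) -> int:
    return len(text.split())


class MultiModelPredictor:
    """Serves several models behind one endpoint (illustrative sketch)."""

    def __init__(self, config: Dict[str, Any]):
        # Load every model once at startup so a single replica can serve all of them.
        self.models: Dict[str, Callable[[str], Any]] = {
            "sentiment": sentiment_model,
            "length": length_model,
        }

    def predict(self, payload: Dict[str, Any], query_params: Dict[str, str]) -> Any:
        # The "model" query parameter (an assumed name) selects which model runs.
        model_name = query_params.get("model")
        if model_name not in self.models:
            raise ValueError(
                f"unknown model {model_name!r}; expected one of {sorted(self.models)}"
            )
        return self.models[model_name](payload["text"])


predictor = MultiModelPredictor(config={})
result = predictor.predict({"text": "a good day"}, {"model": "length"})
```

With this layout, a request to something like `?model=sentiment` and one to `?model=length` hit the same API, so the minimum replica count is 1 rather than one per model. The trade-off is that every replica must hold all of the models in memory.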

Labels: docs (Improvements or additions to documentation)
