Add prod recommendations and migrating guide #2334

Merged · 15 commits · Jul 14, 2021
22 changes: 11 additions & 11 deletions docs/clients/install.md
@@ -1,6 +1,16 @@
# Install

## Install with pip
## Install the CLI

<!-- CORTEX_VERSION_README x2 -->
```bash
# download CLI version 0.38.0 (Note the "v"):
bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)"
```

By default, the Cortex CLI is installed at `/usr/local/bin/cortex`. To install the executable elsewhere, export the `CORTEX_INSTALL_PATH` environment variable to your desired location before running the command above.
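
For example, a minimal sketch that installs the CLI into the current working directory (mirroring the `CORTEX_INSTALL_PATH` usage shown later in these docs):

```bash
# install the CLI into the current working directory instead of /usr/local/bin
CORTEX_INSTALL_PATH=$(pwd)/cortex bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)"

# confirm the installation
./cortex version
```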

## Install the CLI and Python client via pip

To install the latest version:

@@ -21,16 +31,6 @@ To upgrade to the latest version:
pip install --upgrade cortex
```

## Install without the Python client

<!-- CORTEX_VERSION_README x2 -->
```bash
# For example to download CLI version 0.38.0 (Note the "v"):
bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)"
```

By default, the Cortex CLI is installed at `/usr/local/bin/cortex`. To install the executable elsewhere, export the `CORTEX_INSTALL_PATH` environment variable to your desired location before running the command above.

## Changing the CLI/client configuration directory

By default, the CLI/client creates a directory at `~/.cortex/` and uses it to store environment configuration. To use a different directory, export the `CORTEX_CLI_CONFIG_DIR` environment variable before running any `cortex` commands.
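
For example, a sketch that keeps the configuration in a project-local directory (the path is illustrative):

```bash
# store environment configuration alongside your project instead of ~/.cortex/
export CORTEX_CLI_CONFIG_DIR="$(pwd)/.cortex"

# subsequent cortex commands in this shell will use the directory above
cortex version
```
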
2 changes: 1 addition & 1 deletion docs/clusters/instances/spot.md
@@ -17,7 +17,7 @@ node_groups:
on_demand_base_capacity: 0

# percentage of on demand instances to use after the on demand base capacity has been met [0, 100] (default: 50)
# note: setting this to 0 may hinder cluster scale up when spot instances are not available
# note: setting this to 0 may hinder cluster scale-up when spot instances are not available
on_demand_percentage_above_base_capacity: 0

# max price for spot instances (default: the on-demand price of the primary instance type)
5 changes: 3 additions & 2 deletions docs/clusters/management/create.md
@@ -9,9 +9,10 @@

## Create a cluster on your AWS account

<!-- CORTEX_VERSION_README -->
```bash
# install the CLI
pip install cortex
# install the cortex CLI
bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)"

# create a cluster
cortex cluster up cluster.yaml
9 changes: 6 additions & 3 deletions docs/clusters/management/delete.md
@@ -8,10 +8,13 @@ cortex cluster down

When a Cortex cluster is created, an S3 bucket is created for its internal use. When running `cortex cluster down`, a lifecycle rule is applied to the bucket such that its entire contents are removed within the next 24 hours. You can safely delete the bucket at any time after `cortex cluster down` has finished running.
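
If you prefer not to wait for the lifecycle rule, a sketch for deleting the bucket immediately with the AWS CLI (replace the placeholder with your cluster's bucket name):

```bash
# permanently delete the cluster's S3 bucket and all of its contents
aws s3 rb s3://<cortex-cluster-bucket> --force
```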

## Delete Certificates
## Delete SSL Certificate

If you've configured a custom domain for your APIs, you can remove the SSL Certificate and Hosted Zone for the domain by
following these [instructions](../networking/custom-domain.md#cleanup).
If you've set up HTTPS, you can remove the SSL Certificate by following these [instructions](../networking/https.md#cleanup).

## Delete Hosted Zone

If you've configured a custom domain for your APIs, follow these [instructions](../networking/custom-domain.md#cleanup) to delete the Hosted Zone.

## Keep Cortex Resources

89 changes: 89 additions & 0 deletions docs/clusters/management/production.md
@@ -0,0 +1,89 @@
# Production guide

As you take Cortex from development to production, here are a few pointers that might be useful.

## Use images from a colocated ECR

Configure your cluster and APIs to use images from ECR in the same region as your cluster to accelerate scale-ups, reduce ingress costs, and remove the dependency on Cortex's public quay.io registry.

You can find instructions for mirroring Cortex images [here](../advanced/self-hosted-images.md).

## Handling Cortex updates/upgrades

Use a Route 53 hosted zone as a proxy in front of your Cortex cluster. Every new Cortex cluster provisions a new API load balancer with a unique endpoint. Using a Route 53 hosted zone configured with a subdomain will expose your Cortex cluster's API endpoint as a static endpoint (e.g. `cortex.your-company.com`). You will be able to upgrade Cortex versions without downtime, and you will avoid the need to update your client code every time you migrate to a new cluster. You can find instructions for setting up a custom domain with a Route 53 hosted zone [here](../networking/custom-domain.md), and instructions for updating/upgrading your cluster [here](update.md).

## Production cluster configuration

### Securing your cluster

The following configuration will improve security by preventing your cluster's nodes from being publicly accessible.

```yaml
subnet_visibility: private

nat_gateway: single # use "highly_available" for large clusters making requests to services outside of the cluster
```

You can make your load balancer private to prevent your APIs from being publicly accessed. In order to access your APIs, you will need to set up VPC peering between the Cortex cluster's VPC and the VPC containing the consumers of the Cortex APIs. See the [VPC peering guide](../networking/vpc-peering.md) for more details.

```yaml
api_load_balancer_scheme: internal
```

You can also restrict access to your load balancers by IP address:

```yaml
api_load_balancer_cidr_white_list: [0.0.0.0/0]
```

These two fields are also available for the operator load balancer. Keep in mind that if you make the operator load balancer private, you'll need to configure VPC peering to use the `cortex` CLI or Python client.

```yaml
operator_load_balancer_scheme: internal
operator_load_balancer_cidr_white_list: [0.0.0.0/0]
```

See [here](../networking/load-balancers.md) for more information about the load balancers.

### Ensure node provisioning

You can take advantage of the cost savings of spot instances and the reliability of on-demand instances by utilizing the `priority` field in node groups. You can deploy two node groups, one that is spot and another that is on-demand. Set the priority of the spot node group to be higher than the priority of the on-demand node group. This encourages the cluster-autoscaler to try to spin up instances from the spot node group first. If there are no more spot instances available, the on-demand node group will be used instead.

```yaml
node_groups:
- name: gpu-spot
instance_type: g4dn.xlarge
min_instances: 0
max_instances: 5
spot: true
priority: 100
- name: gpu-on-demand
instance_type: g4dn.xlarge
min_instances: 0
max_instances: 5
priority: 1
```

### Considerations for large clusters

If you plan on scaling your Cortex cluster past 400 nodes or 800 pods, it is recommended to set `prometheus_instance_type` to a larger instance type. A good guideline is that a t3.medium instance can reliably handle 400 nodes and 800 pods.
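
For example, a sketch of the relevant cluster configuration field (the instance type shown is illustrative):

```yaml
# use a larger instance for Prometheus on clusters beyond ~400 nodes / 800 pods
prometheus_instance_type: t3.xlarge
```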

## API Spec

### Container design

Configure your health checks to be as accurate as possible to prevent requests from being routed to pods that aren't ready to handle traffic.
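
As a sketch only, assuming Kubernetes-style probe fields in the container spec (verify the exact field names against the API configuration reference):

```yaml
containers:
  - name: api
    # mark the pod ready only once the server can actually handle requests
    readiness_probe:
      http_get:
        path: /healthz
        port: 8080
      initial_delay_seconds: 5
      period_seconds: 5
```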

### Pods section

Make sure that `max_concurrency` is set to match the concurrency supported by your container.

Tune `max_queue_length` to a lower value if you would like to redistribute requests to newer pods more aggressively as your API scales up, rather than allowing requests to linger in queues. In that case, the clients consuming your APIs should implement retry logic with a delay (such as exponential backoff).
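
A minimal sketch of these fields, assuming the `pod` section layout of the Realtime API spec (values are illustrative):

```yaml
pod:
  # match the number of requests your container can process concurrently
  max_concurrency: 8
  # keep the queue short so excess requests are retried against newer pods
  max_queue_length: 16
```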

### Compute section

Make sure to specify all of the relevant compute resources (especially cpu and memory) to ensure that your pods aren't starved for resources.
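
A sketch of a container's compute section (values are illustrative; adjust to your workload):

```yaml
compute:
  cpu: 1
  gpu: 1
  mem: 4Gi
```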

### Autoscaling

Revisit the autoscaling docs for [Realtime APIs](../../workloads/realtime/autoscaling.md) and/or [Async APIs](../../workloads/async/autoscaling.md) to effectively handle production traffic by tuning the scaling rate, sensitivity, and over-provisioning.
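
A sketch of commonly tuned autoscaling fields (treat the exact field names as assumptions and confirm them against the autoscaling docs linked above):

```yaml
autoscaling:
  min_replicas: 1   # raise this to over-provision ahead of expected traffic
  max_replicas: 10
  target_in_flight: 8   # typically aligned with max_concurrency
  upscale_stabilization_period: 1m
  downscale_stabilization_period: 10m
```
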
118 changes: 98 additions & 20 deletions docs/clusters/management/update.md
@@ -1,36 +1,114 @@
# Update

## Update node group size
## Modify existing cluster

You can add or remove node groups, resize existing node groups, and update some configuration fields of a running cluster.

Fetch the current cluster configuration:

```bash
cortex cluster scale --node-group <node-group-name> --min-instances <min-instances> --max-instances <max-instances>
cortex cluster info --print-config --name CLUSTER_NAME --region REGION > cluster.yaml
```

## Upgrade to a newer version
Make your desired changes, and then apply them:

```bash
# spin down your cluster
cortex cluster down --name <name> --region <region>
cortex cluster configure cluster.yaml
```

Cortex will calculate the difference and you will be prompted with the update plan.
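
For example, resizing a node group in the exported cluster.yaml is a typical modification that can be applied this way (values are illustrative; the field names match the node group examples elsewhere in these docs):

```yaml
node_groups:
  - name: gpu-spot
    instance_type: g4dn.xlarge
    min_instances: 0
    max_instances: 10  # increased from 5
    spot: true
```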

If you would like to update fields that cannot be modified on a running cluster, you must create a new cluster with your desired configuration.

## Upgrade to a new version

Upgrading the Cortex version of an existing cluster is not supported at the moment. Please spin down the previous version of the cluster, install the latest version of the Cortex CLI, and use it to spin up a new Cortex cluster. See the next section for how to do this without downtime.

## Update or upgrade without downtime

It is possible to update to a new version of Cortex or to migrate from one cluster to another without downtime.

Note: do not spin down your previous cluster until your new cluster is receiving traffic.

### Set up a subdomain using a Route 53 hosted zone

If you've already set up a subdomain with a Route 53 hosted zone pointing to your cluster, skip this step.

Setting up a Route 53 hosted zone allows you to transfer traffic seamlessly from an existing cluster to a new cluster, thereby avoiding downtime. You can find the instructions for setting up a subdomain [here](../networking/custom-domain.md). You will need to update any clients interacting with your Cortex APIs to point to the new subdomain.

# update your CLI to the latest version
pip install --upgrade cortex
### Export all APIs from your previous cluster

# confirm version
The `cluster export` command can be used to get the YAML specifications of all APIs deployed in your cluster:

```bash
cortex cluster export --name <previous_cluster_name> --region <region>
```

### Spin up a new cortex cluster

If you are creating a new cluster with the same Cortex version:

```bash
cortex cluster up new-cluster.yaml --configure-env cortex2
```

This will create a CLI environment named `cortex2` for accessing the new cluster.

If you are spinning up a new cluster with a different Cortex version, first install the Cortex CLI matching the desired cluster version:

```bash
# download the desired CLI version, replace 0.38.0 with the desired version (Note the "v"):
bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)"

# confirm Cortex CLI version
cortex version

# spin up your cluster
cortex cluster up cluster.yaml
# spin up your cluster using the new CLI version
cortex cluster up cluster.yaml --configure-env cortex2
```

You can use different Cortex CLIs to interact with the different versioned clusters; here is an example:

```bash
# download the desired CLI version, replace 0.38.0 with the desired version (Note the "v"):
CORTEX_INSTALL_PATH=$(pwd)/cortex0.38.0 bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)"

# confirm cortex CLI version
./cortex0.38.0 version
```

### Deploy the APIs to your new cluster

Please read the [changelogs](https://github.com/cortexlabs/cortex/releases) and the latest documentation to identify any features and breaking changes in the new version. You may need to make modifications to your cluster and/or API configuration files.

After you've updated the API specifications and images if necessary, you can deploy them onto your new cluster:

```bash
cortex deploy -e cortex2 <api_spec_file>
```
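
If you exported many APIs, a sketch for deploying each exported spec in a loop (the exported directory layout is an assumption; adjust the path and glob to match your `cortex cluster export` output):

```bash
cd <exported_directory>
for api_spec in */*.yaml; do
  # deploy each exported API spec into the new cluster's CLI environment
  cortex deploy "$api_spec" --env cortex2
done
```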

### Point your custom domain to your new cluster

Verify that all of the APIs in your new cluster are working as expected by accessing them via the cluster's API load balancer URL.

Get the cluster's API load balancer URL:

```bash
cortex cluster info --name <new_cluster_name> --region <region>
```
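
For example, a sketch of a test request against one of your APIs (the endpoint path and payload are illustrative):

```bash
# send a test request to an API via the new cluster's API load balancer
curl "http://<api_load_balancer_url>/<api_name>" \
  -X POST -H "Content-Type: application/json" \
  -d '{"key": "value"}'
```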

## Upgrade without downtime
Once the APIs on the new cluster have been verified as working properly, it is recommended to update `min_replicas` of your APIs on the new cluster to match the current values in your previous cluster. This will avoid large sudden scale-up events as traffic is shifted to the new cluster.

In production environments, you can upgrade your cluster without downtime if you have a backend service or DNS in front of your Cortex cluster:
Then, navigate to the A record in your custom domain's Route 53 hosted zone and update the Alias to point to the new cluster's API load balancer URL. Rather than suddenly routing all of your traffic from the previous cluster to the new cluster, you can use [weighted records](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-weighted) to incrementally route more traffic to your new cluster.

1. Spin up a new cluster. For example: `cortex cluster up new-cluster.yaml --configure-env cortex2` (this will create a CLI environment named `cortex2` for accessing the new cluster).
1. Re-deploy your APIs in your new cluster. For example, if the name of your CLI environment for your existing cluster is `cortex`, you can use `cortex get --env cortex` to list all running APIs in your cluster, and re-deploy them in the new cluster by running `cortex deploy --env cortex2` for each API. Alternatively, you can run `cortex cluster export --name <previous_cluster_name> --region <region>` to export the API specifications for all of your running APIs, change directories to the folder that was exported, and run `cortex deploy --env cortex2 <file_name>` for each API that you want to deploy in the new cluster.
1. Route requests to your new cluster.
* If you are using a custom domain: update the A record in your Route 53 hosted zone to point to your new cluster's API load balancer.
* If you have a backend service which makes requests to Cortex: update your backend service to make requests to the new cluster's endpoints.
* If you have a self-managed API Gateway in front of your Cortex cluster: update the routes to use new cluster's endpoints.
1. Spin down your previous cluster. If you updated DNS settings, wait 24-48 hours before spinning down your previous cluster to allow the DNS cache to be flushed.
1. You may now rename your new CLI environment name if you'd like (e.g. to rename it back to "cortex": `cortex env rename cortex2 cortex`)
If you increased `min_replicas` for your APIs in the new cluster during the transition, you may reduce `min_replicas` back to your desired level once all traffic has been shifted.

### Spin down the previous cluster

After confirming that your previous cluster has completed servicing all existing traffic and is not receiving any new traffic, spin down your previous cluster:

```bash
# Note: it is recommended to install the Cortex CLI matching the previous cluster's version to ensure proper deletion.

cortex cluster down --name <previous_cluster_name> --region <region>
```