Description
This is a request to enhance the documentation that describes deploying to AWS and calling a remote endpoint.
Motivation
A lot happens behind the scenes in an AWS deployment that does not happen in a local deployment. Between the `cortex cluster up` command and `cortex deploy`, about 15 minutes pass and many resources are created.

It's also not entirely clear how to obtain the remote endpoint to call, so it would be helpful to remind users of the `cortex get sentiment-classifier` command.
Additional context
Could we add more documentation? e.g.
```
cortex cluster up
```
This brings up a cluster in AWS by deploying a number of CloudFormation stacks and takes about 15 mins. Additional information about deployment and resources is shown on the command line.
After the cluster is deployed, you'll see:
```
cortex is ready! an environment named "aws" has been configured for this cluster; append `--env=aws` to cortex commands to reference it, or set it as your default with `cortex env default aws`
```
To get the right endpoint to call, use `cortex get`:

```
cortex get sentiment-classifier --env aws
```
This returns a lot of useful information like:
```
cortex get sentiment-classifier --env aws

status     up-to-date   stale   requested   last update   avg request   2XX
updating   0            1       1           21s           -             -

endpoint: http://*****elb.us-east-1.amazonaws.com/sentiment-classifier
curl: curl http://*****.elb.us-east-1.amazonaws.com/sentiment-classifier -X POST -H "Content-Type: application/json" -d @sample.json

configuration
name: sentiment-classifier
endpoint: /sentiment-classifier
predictor:
  type: python
  path: predictor.py
  image: cortexlabs/python-predictor-cpu:0.16.1
compute:
  cpu: 1
  mem: 2G
autoscaling:
  min_replicas: 1
  max_replicas: 100
  init_replicas: 1
  workers_per_replica: 1
  threads_per_worker: 1
  target_replica_concurrency: 1.0
  max_replica_concurrency: 1024
  window: 1m0s
  downscale_stabilization_period: 5m0s
  upscale_stabilization_period: 1m0s
  max_downscale_factor: 0.75
  max_upscale_factor: 1.5
  downscale_tolerance: 0.05
  upscale_tolerance: 0.05
update_strategy:
  max_surge: 25%
  max_unavailable: 25%
```
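Since this request is largely about surfacing the endpoint, the docs could also show how to pull the endpoint URL out of the `cortex get` output programmatically. A minimal Python sketch (the `parse_endpoint` helper is hypothetical, not part of the Cortex CLI):

```python
def parse_endpoint(output: str) -> str:
    """Return the first endpoint URL found in `cortex get` output.

    The output contains two "endpoint:" lines: the full URL near the top,
    and the relative path (e.g. "/sentiment-classifier") in the configuration
    section. Matching on "endpoint: http" selects only the full URL.
    """
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("endpoint: http"):
            return stripped.split("endpoint:", 1)[1].strip()
    raise ValueError("no endpoint URL found in cortex get output")
```

This could be used, for example, to wire the endpoint into a test script without copying it by hand.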
Then, to serve predictions in AWS:
```
curl http://***.amazonaws.com/iris-classifier \
  -X POST -H "Content-Type: application/json" \
  -d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}'

"setosa"
```
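The docs could also show the programmatic equivalent of the curl call for users calling the API from application code. A minimal Python sketch using only the standard library; the endpoint URL is a placeholder to be replaced with the one reported by `cortex get`:

```python
import json
from urllib import request


def build_request(endpoint: str, features: dict) -> request.Request:
    """Build the POST request the deployed API expects: a JSON body of features."""
    return request.Request(
        endpoint,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(endpoint: str, features: dict):
    """Send the features to the deployed API and return the decoded prediction."""
    with request.urlopen(build_request(endpoint, features)) as resp:
        return json.loads(resp.read())


# Example usage (requires a live cluster; substitute the real endpoint):
# predict("http://***.amazonaws.com/iris-classifier",
#         {"sepal_length": 5.2, "sepal_width": 3.6,
#          "petal_length": 1.4, "petal_width": 0.3})
```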