HTTPS setup #1069

Merged
merged 7 commits into from
May 19, 2020
9 changes: 9 additions & 0 deletions cli/cmd/lib_cluster_config.go
@@ -273,6 +273,11 @@ func setConfigFieldsFromCached(userClusterConfig *clusterconfig.Config, cachedCl
}
userClusterConfig.AvailabilityZones = cachedClusterConfig.AvailabilityZones

if s.Obj(cachedClusterConfig.SSLCertificateARN) != s.Obj(userClusterConfig.SSLCertificateARN) {
return clusterconfig.ErrorConfigCannotBeChangedOnUpdate(clusterconfig.SSLCertificateARNKey, cachedClusterConfig.SSLCertificateARN)
}
userClusterConfig.SSLCertificateARN = cachedClusterConfig.SSLCertificateARN

if userClusterConfig.InstanceVolumeSize != cachedClusterConfig.InstanceVolumeSize {
return clusterconfig.ErrorConfigCannotBeChangedOnUpdate(clusterconfig.InstanceVolumeSizeKey, cachedClusterConfig.InstanceVolumeSize)
}
@@ -505,6 +510,10 @@ func clusterConfigConfirmaionStr(clusterConfig clusterconfig.Config, awsCreds AW
items.Add(clusterconfig.MinInstancesUserKey, *clusterConfig.MinInstances)
items.Add(clusterconfig.MaxInstancesUserKey, *clusterConfig.MaxInstances)
items.Add(clusterconfig.TagsKey, s.ObjFlatNoQuotes(clusterConfig.Tags))
if clusterConfig.SSLCertificateARN != nil {
items.Add(clusterconfig.SSLCertificateARNKey, *clusterConfig.SSLCertificateARN)
}

if clusterConfig.InstanceVolumeSize != defaultConfig.InstanceVolumeSize {
items.Add(clusterconfig.InstanceVolumeSizeUserKey, clusterConfig.InstanceVolumeSize)
}
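
The first hunk above rejects an update when the cached and user-supplied `SSLCertificateARN` values differ. Because the field is a `*string`, the two values have to be compared by content rather than by pointer address, which is presumably why the diff routes both sides through `s.Obj`. A minimal standard-library sketch of that kind of check follows; `equalStringPtr` and `checkUnchanged` are hypothetical names used for illustration and are not part of the Cortex codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// equalStringPtr compares two optional strings by value: they are equal when
// both are nil, or both are non-nil and point to the same contents.
func equalStringPtr(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// checkUnchanged returns an error when an immutable optional field differs
// between the cached config and the user-supplied config.
func checkUnchanged(key string, cached, user *string) error {
	if equalStringPtr(cached, user) {
		return nil
	}
	return errors.New(key + " cannot be changed on update")
}

func main() {
	// Placeholder ARN; "same" is a distinct variable holding the same value.
	arn := "arn:aws:acm:us-west-2:000000000000:certificate/example"
	same := arn

	fmt.Println(checkUnchanged("ssl_certificate_arn", &arn, &same) == nil)
	fmt.Println(checkUnchanged("ssl_certificate_arn", &arn, nil) == nil)
}
```

Comparing `&arn == &same` directly would report a change even though the ARN is identical, since the two pointers refer to different variables.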
3 changes: 3 additions & 0 deletions docs/cluster-management/config.md
@@ -72,6 +72,9 @@ tags: # <string>: <string> map of key/value pairs
# whether to use spot instances in the cluster (default: false)
# see https://cortex.dev/v/master/cluster-management/spot-instances for additional details on spot configuration
spot: false

# see https://cortex.dev/v/master/guides/subdomain-https-setup for instructions on how to set up HTTPS for APIs
ssl_certificate_arn: # if empty, APIs will still be accessible via HTTPS (in addition to HTTP), but will not use a trusted certificate
```

The default docker images used for your Predictors are listed in the instructions for [system packages](../deployments/system-packages.md), and can be overridden in your [API configuration](../deployments/api-configuration.md).
4 changes: 4 additions & 0 deletions docs/cluster-management/uninstall.md
@@ -42,3 +42,7 @@ aws s3 rb --force s3://<bucket>
# delete the log group (replace <log_group> with what was configured during installation, default: cortex)
aws logs describe-log-groups --log-group-name-prefix=<log_group> --query logGroups[*].[logGroupName] --output text | xargs -I {} aws logs delete-log-group --log-group-name {}
```

If you've set up API Gateway and want to delete it, please follow these [instructions](../guides/api-gateway.md#cleanup).

If you've configured HTTPS by specifying an SSL Certificate for a subdomain in your cluster configuration, you may wish to remove the SSL Certificate and Hosted Zone for the domain by following these [instructions](../guides/subdomain-https-setup.md#cleanup).
2 changes: 2 additions & 0 deletions docs/cluster-management/update.md
@@ -34,3 +34,5 @@ cortex cluster up
```

In production environments, you can upgrade your cluster without downtime if you have a service in front of your Cortex cluster (for example, you can [configure API Gateway as a proxy service](../guides/api-gateway.md)): first spin up your new cluster, then update your client-facing service to route traffic to your new cluster, and then spin down your old cluster.

If you've set up HTTPS by specifying an SSL certificate for a subdomain in your cluster configuration, you can upgrade your cluster with minimal downtime: first spin up a new cluster, then update the A record in your subdomain's hosted zone to point to the API load balancer of your new cluster. Wait at least 60 seconds before spinning down the old cluster, because the DNS entries for load balancers have a 60-second TTL (see [here](https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html#request-routing) for more details).
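
One way to confirm that the A record change has taken effect before spinning down the old cluster is to check that the subdomain resolves to the same addresses as the new load balancer. The sketch below is only an illustration under stated assumptions: `sameAddrs`, `lookupFunc`, and the hostnames are hypothetical, and the resolver is injected so the example runs without network access. A real check would pass `net.LookupHost`, and it does not replace the 60-second wait, since clients may still hold cached entries.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// lookupFunc has the same signature as net.LookupHost, so either the real
// resolver or a stub can be passed in.
type lookupFunc func(host string) ([]string, error)

// Compile-time check that net.LookupHost satisfies lookupFunc.
var _ lookupFunc = net.LookupHost

// sameAddrs reports whether two hostnames resolve to the same set of addresses.
func sameAddrs(lookup lookupFunc, a, b string) (bool, error) {
	addrsA, err := lookup(a)
	if err != nil {
		return false, err
	}
	addrsB, err := lookup(b)
	if err != nil {
		return false, err
	}
	if len(addrsA) != len(addrsB) {
		return false, nil
	}
	// Sort copies so the comparison ignores resolver ordering.
	sortedA := append([]string(nil), addrsA...)
	sortedB := append([]string(nil), addrsB...)
	sort.Strings(sortedA)
	sort.Strings(sortedB)
	for i := range sortedA {
		if sortedA[i] != sortedB[i] {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Stub resolver with placeholder hostnames and documentation-range IPs.
	records := map[string][]string{
		"api.example.com":        {"192.0.2.10", "192.0.2.11"},
		"new-lb.example.invalid": {"192.0.2.11", "192.0.2.10"},
	}
	stub := func(host string) ([]string, error) {
		addrs, ok := records[host]
		if !ok {
			return nil, fmt.Errorf("no such host: %s", host)
		}
		return addrs, nil
	}

	ok, err := sameAddrs(stub, "api.example.com", "new-lb.example.invalid")
	fmt.Println(ok, err)
}
```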
155 changes: 155 additions & 0 deletions docs/guides/subdomain-https-setup.md
@@ -0,0 +1,155 @@
# Set up HTTPS on a subdomain

_WARNING: you are on the master branch, please refer to the docs on the branch that matches your `cortex version`_

The recommended way to set up HTTPS with trusted certificates is by using [API Gateway](api-gateway.md) because it's simpler and enables you to use API Gateway features such as rate limiting (it also supports custom domains). This guide is only recommended if HTTPS is required and you don't wish to use API Gateway (e.g. it doesn't support your use case due to limitations such as the 29-second request timeout).

This guide will demonstrate how to create a dedicated subdomain in AWS Route 53 and use an SSL certificate provisioned by AWS Certificate Manager (ACM) to support HTTPS traffic to Cortex APIs. By the end of this guide, you will have a Cortex cluster with APIs accessible via `https://<your-subdomain>/<api-endpoint>`.

You must own a domain and be able to modify its DNS records.

## Step 1

Decide on a subdomain that you want to dedicate to Cortex APIs. For example, if your domain is `example.com`, a valid subdomain would be `api.example.com`.

This guide will use `cortexlabs.dev` as the example domain and `api.cortexlabs.dev` as the subdomain.

## Step 2

We will set up a hosted zone on Route 53 to manage the DNS records for the subdomain. Go to the [Route 53 console](https://console.aws.amazon.com/route53/home) and click "Hosted Zones".

![step 2](https://user-images.githubusercontent.com/4365343/82210754-a6b07d00-98dd-11ea-9cec-9f6b07282aa8.png)

## Step 3

Click "Create Hosted Zone" and then enter your subdomain as the domain name for your hosted zone and click "Create".

![step 3](https://user-images.githubusercontent.com/4365343/82211091-4968fb80-98de-11ea-8ec4-8d26d1aea77a.png)

## Step 4

Take note of the values in the NS record.

![step 4](https://user-images.githubusercontent.com/4365343/82211656-386cba00-98df-11ea-8c86-4961082b5f49.png)

## Step 5

Navigate to your root DNS service provider (e.g. Google Domains, AWS Route 53, GoDaddy). Your root DNS service provider is typically the registrar where you purchased your domain (unless you have transferred DNS management elsewhere). The procedure for adding DNS records may vary based on your service provider.

We are going to add an NS (name server) record that specifies that any traffic to your subdomain should use the name servers of your hosted zone in Route 53 for DNS resolution.

`cortexlabs.dev` is managed by Google Domains. The image below is a screenshot for adding a DNS record in Google Domains (your UI may differ based on your DNS service provider).

![step 5](https://user-images.githubusercontent.com/4365343/82211959-bcbf3d00-98df-11ea-834d-692b3bcf9332.png)

## Step 6

We are now going to create an SSL certificate for your subdomain. Go to the [ACM console](https://us-west-2.console.aws.amazon.com/acm/home) and click "Get Started" under the "Provision certificates" section.

![step 6](https://user-images.githubusercontent.com/4365343/82202340-c04ac800-98cf-11ea-9472-89dd6d67eb0d.png)

## Step 7

Select "Request a public certificate" and then "Request a certificate".

![step 7](https://user-images.githubusercontent.com/4365343/82202654-3e0ed380-98d0-11ea-8c57-025f0b69c54f.png)

## Step 8

Enter your subdomain and then click "Next".

![step 8](https://user-images.githubusercontent.com/4365343/82224652-1cbedf00-98f2-11ea-912b-466cee2f6e25.png)

## Step 9

Select "DNS validation" and then click "Next".

![step 9](https://user-images.githubusercontent.com/4365343/82205311-66003600-98d4-11ea-90e3-da7e8b0b2b9c.png)

## Step 10

Add tags for searchability (optional), then click "Review".

![step 10](https://user-images.githubusercontent.com/4365343/82206485-52ee6580-98d6-11ea-95a9-1d0ebafc178a.png)

## Step 11

Click "Confirm and request".

![step 11](https://user-images.githubusercontent.com/4365343/82206602-84ffc780-98d6-11ea-9f2f-ce383404ec67.png)

## Step 12

Click "Create record in Route 53". A popup will appear indicating that a record is going to be added to Route 53. Click "Create" to automatically add the DNS record to your subdomain's hosted zone. Then click "Continue".

![step 12](https://user-images.githubusercontent.com/4365343/82223539-c8ffc600-98f0-11ea-93a2-044aa0c9670d.png)

## Step 13

Wait for the certificate status to change to "Issued". This might take a few minutes.

![step 13](https://user-images.githubusercontent.com/4365343/82209663-a616e700-98db-11ea-95cb-c6efedadb942.png)

## Step 14

Take note of the certificate's ARN. The certificate is ineligible for renewal because it is currently not being used. It will be eligible for renewal after it is used in Cortex.

![step 14](https://user-images.githubusercontent.com/4365343/82222684-9e613d80-98ef-11ea-98c0-5a20b457f062.png)

## Step 15

Add the following field to your cluster configuration:

```yaml
# cluster.yaml

...

ssl_certificate_arn: <ARN of your certificate>
```

Then create your Cortex cluster:

```bash
$ cortex cluster up --config cluster.yaml
```

## Step 16

After your cluster has been created, navigate to your [EC2 Load Balancer console](https://us-west-2.console.aws.amazon.com/ec2/v2/home#LoadBalancers:sort=loadBalancerName) and locate the Cortex API load balancer. You can determine which is the API load balancer by inspecting the `kubernetes.io/service-name` tag.

Take note of the load balancer's name.

![step 16](https://user-images.githubusercontent.com/808475/80142777-961c1980-8560-11ea-9202-40964dbff5e9.png)

## Step 17

Go to the hosted zone you created in the [Route 53 console](https://console.aws.amazon.com/route53/home#hosted-zones:) and add an Alias record that routes traffic to your Cortex cluster's API load balancer (leave "Name" blank).

![step 17](https://user-images.githubusercontent.com/4365343/82228372-08311580-98f7-11ea-9faa-24050fc432d8.png)

### Using your new endpoint

You may now use your subdomain in place of your API load balancer endpoint in your client. For example, this curl request:

```bash
curl http://a5044e34a352d44b0945adcd455c7fa3-32fa161d3e5bcbf9.elb.us-west-2.amazonaws.com/iris-classifier -X POST -H "Content-Type: application/json" -d @sample.json
```

Would become:

```bash
curl https://api.cortexlabs.dev/iris-classifier -X POST -H "Content-Type: application/json" -d @sample.json
```

### Cleanup

Spin down your Cortex cluster.

Delete the hosted zone for your subdomain in the [Route 53 console](https://console.aws.amazon.com/route53/home#hosted-zones:):

![delete hosted zone](https://user-images.githubusercontent.com/4365343/82228729-81306d00-98f7-11ea-8570-e9de15f5267f.png)

Delete your certificate from the [ACM console](https://us-west-2.console.aws.amazon.com/acm/home):

![delete certificate](https://user-images.githubusercontent.com/4365343/82228835-a624e000-98f7-11ea-92e2-cb4fb0f591e2.png)
4 changes: 3 additions & 1 deletion docs/miscellaneous/security.md
@@ -72,4 +72,6 @@ In order to connect to the operator via the CLI, you must provide valid AWS cred

All APIs are accessible via HTTPS (in addition to HTTP). The SSL certificate is autogenerated during installation using `localhost` as the Common Name (CN). Therefore, clients will need to skip certificate verification (e.g. `curl -k`) when using HTTPS.

-To use AWS's default (trusted) certificate or your own certificate, you can [set up API Gateway](../guides/api-gateway.md) to be a proxy to your Cortex cluster. Since the API load balancer created by Cortex is internet-facing by default, you will also need to set `api_load_balancer_scheme: internal` in your [cluster configuration](config.md) file (before creating your cluster) in order to force traffic to go through your API Gateway endpoint.
+To use AWS's default (trusted) certificate or your own certificate, you can [set up API Gateway](../guides/api-gateway.md) to be a proxy to your Cortex cluster. Since the API load balancer created by Cortex is internet-facing by default, in order to force traffic to go through your API Gateway endpoint, you will also need to set `api_load_balancer_scheme: internal` in your [cluster configuration](config.md) file (before creating your cluster).

Alternatively, you can create an SSL certificate for your custom domain, and use this certificate in the API load balancer (see our instructions for [setting up HTTPS on a subdomain](../guides/subdomain-https-setup.md)).
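
For clients written in Go, the rough equivalent of `curl -k` mentioned above is a TLS configuration with `InsecureSkipVerify` set. The sketch below is an illustration rather than Cortex code: it uses `httptest.NewTLSServer` from the standard library as a local stand-in for an API serving an untrusted certificate, and `newInsecureClient`, `newLocalTLSServer`, and `status` are hypothetical helper names.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newInsecureClient returns an HTTP client that skips certificate
// verification, similar to `curl -k`.
func newInsecureClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
}

// newLocalTLSServer starts a local HTTPS server whose certificate is not
// trusted by default clients (a stand-in for an API without a trusted cert).
func newLocalTLSServer() *httptest.Server {
	return httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
}

// status issues a GET request and returns the response status code.
func status(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	server := newLocalTLSServer()
	defer server.Close()

	// A default client rejects the untrusted certificate.
	_, err := status(&http.Client{}, server.URL)
	fmt.Println("default client failed:", err != nil)

	// The insecure client completes the request.
	code, err := status(newInsecureClient(), server.URL)
	fmt.Println("insecure client:", code, err)
}
```

Once a trusted certificate is in place (via API Gateway or `ssl_certificate_arn`), a default `http.Client` should be sufficient and the `InsecureSkipVerify` setting can be dropped.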
1 change: 1 addition & 0 deletions docs/summary.md
@@ -48,6 +48,7 @@
## Guides

* [Set up AWS API gateway](guides/api-gateway.md)
* [Set up HTTPS on a subdomain](guides/subdomain-https-setup.md)
* [Plot response code counts](guides/plot-response-code-counts.md)
* [Plot API request time](guides/plot-request-time.md)
* [Plot in-flight requests](guides/plot-in-flight-requests.md)
5 changes: 5 additions & 0 deletions manager/install.sh
@@ -288,6 +288,11 @@ function setup_istio() {
export CORTEX_OPERATOR_LOAD_BALANCER_ANNOTATION='service.beta.kubernetes.io/aws-load-balancer-internal: "true"'
fi

export CORTEX_SSL_CERTIFICATE_ANNOTATION=""
if [[ -n "$CORTEX_SSL_CERTIFICATE_ARN" ]]; then
export CORTEX_SSL_CERTIFICATE_ANNOTATION="service.beta.kubernetes.io/aws-load-balancer-ssl-cert: $CORTEX_SSL_CERTIFICATE_ARN"
fi

envsubst < manifests/istio-values.yaml | helm template istio-manifests/istio --values - --name istio --namespace istio-system | kubectl apply -f - >/dev/null
}

3 changes: 3 additions & 0 deletions manager/manifests/istio-values.yaml
@@ -85,13 +85,16 @@ gateways:
${CORTEX_API_LOAD_BALANCER_ANNOTATION}
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: ${CORTEX_TAGS}
${CORTEX_SSL_CERTIFICATE_ANNOTATION}
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
type: LoadBalancer
externalTrafficPolicy: Local # https://medium.com/pablo-perez/k8s-externaltrafficpolicy-local-or-cluster-40b259a19404, https://www.asykim.com/blog/deep-dive-into-kubernetes-external-traffic-policies
ports:
- port: 80
targetPort: 80
name: http2
- port: 443
targetPort: 80
name: https
- port: 31400
name: tcp
38 changes: 38 additions & 0 deletions pkg/lib/aws/acm.go
@@ -0,0 +1,38 @@
/*
Copyright 2020 Cortex Labs, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package aws

import (
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/service/acm"
"github.com/cortexlabs/cortex/pkg/lib/errors"
)

func (c *Client) DoesCertificateExist(sslCertificateARN string) (bool, error) {
_, err := c.ACM().DescribeCertificate(&acm.DescribeCertificateInput{
CertificateArn: aws.String(sslCertificateARN),
})

if err != nil {
if IsErrCode(err, "ResourceNotFoundException") {
return false, nil
}
return false, errors.Wrap(err, sslCertificateARN)
}

return true, nil
}
9 changes: 9 additions & 0 deletions pkg/lib/aws/clients.go
@@ -17,6 +17,7 @@ limitations under the License.
package aws

import (
"github.com/aws/aws-sdk-go/service/acm"
"github.com/aws/aws-sdk-go/service/autoscaling"
"github.com/aws/aws-sdk-go/service/cloudformation"
"github.com/aws/aws-sdk-go/service/cloudwatch"
@@ -37,6 +38,7 @@ type clients struct {
sts *sts.STS
ec2 *ec2.EC2
ecr *ecr.ECR
acm *acm.ACM
autoscaling *autoscaling.AutoScaling
cloudWatchLogs *cloudwatchlogs.CloudWatchLogs
cloudWatchMetrics *cloudwatch.CloudWatch
@@ -101,6 +103,13 @@ func (c *Client) Autoscaling() *autoscaling.AutoScaling {
return c.clients.autoscaling
}

func (c *Client) ACM() *acm.ACM {
if c.clients.acm == nil {
c.clients.acm = acm.New(c.sess)
}
return c.clients.acm
}

func (c *Client) CloudWatchLogs() *cloudwatchlogs.CloudWatchLogs {
if c.clients.cloudWatchLogs == nil {
c.clients.cloudWatchLogs = cloudwatchlogs.New(c.sess)
2 changes: 1 addition & 1 deletion pkg/lib/aws/cloudwatch.go
@@ -27,7 +27,7 @@ func (c *Client) DoesLogGroupExist(logGroup string) (bool, error) {
LogGroupName: aws.String(logGroup),
})
if err != nil {
-if CheckErrCode(err, "ResourceNotFoundException") {
+if IsErrCode(err, "ResourceNotFoundException") {
return false, nil
}
return false, errors.Wrap(err, "log group "+logGroup)
10 changes: 5 additions & 5 deletions pkg/lib/aws/errors.go
@@ -39,26 +39,26 @@ const (
)

func IsNotFoundErr(err error) bool {
-return CheckErrCode(err, "NotFound")
+return IsErrCode(err, "NotFound")
}

func IsNoSuchKeyErr(err error) bool {
-return CheckErrCode(err, "NoSuchKey")
+return IsErrCode(err, "NoSuchKey")
}

func IsNoSuchBucketErr(err error) bool {
-return CheckErrCode(err, "NoSuchBucket")
+return IsErrCode(err, "NoSuchBucket")
}

func IsForbiddenErr(err error) bool {
-return CheckErrCode(err, "Forbidden")
+return IsErrCode(err, "Forbidden")
}

func IsGenericNotFoundErr(err error) bool {
return IsNotFoundErr(err) || IsNoSuchKeyErr(err) || IsNoSuchBucketErr(err)
}

-func CheckErrCode(err error, errorCode string) bool {
+func IsErrCode(err error, errorCode string) bool {
awsErr, ok := errors.CauseOrSelf(err).(awserr.Error)
if !ok {
return false
4 changes: 2 additions & 2 deletions pkg/operator/operator/logs.go
@@ -174,7 +174,7 @@ func streamFromCloudWatch(apiName string, podCheckCancel chan struct{}, socket *
})

if err != nil {
-if !awslib.CheckErrCode(err, cloudwatchlogs.ErrCodeResourceNotFoundException) {
+if !awslib.IsErrCode(err, cloudwatchlogs.ErrCodeResourceNotFoundException) {
telemetry.Error(err)
writeAndCloseSocket(socket, "error encountered while fetching logs from cloudwatch: "+errors.Message(err))
continue
@@ -218,7 +218,7 @@ func getLogStreams(logGroupName string) (strset.Set, error) {
Limit: aws.Int64(_maxStreamsPerRequest),
})
if err != nil {
-if !awslib.CheckErrCode(err, cloudwatchlogs.ErrCodeResourceNotFoundException) {
+if !awslib.IsErrCode(err, cloudwatchlogs.ErrCodeResourceNotFoundException) {
return nil, err
}
return nil, nil