Use an ingress provider to get traffic into the cluster #2167

Open · 14 of 15 tasks
yuvipanda opened this issue Feb 5, 2021 · 2 comments

Comments

yuvipanda commented Feb 5, 2021

Current model to get traffic to hubs

Right now, each hub has a proxy-public service of type LoadBalancer. This gives it a public IP that routes traffic to just that hub: ports 80/443 go to the hub, and ports 22/2222 go to ssh/sftp. For each hub, we create a DNS A record pointing to this public IP.
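
For reference, that per-hub service looks roughly like this (a minimal sketch; the exact ports, labels, and namespace come from the z2jh chart and our jupyterhub-ssh setup, so treat the names here as illustrative):

    # Sketch of the current per-hub service - names and ports are illustrative,
    # targetPorts omitted for brevity
    apiVersion: v1
    kind: Service
    metadata:
      name: proxy-public
      namespace: data100-prod        # hypothetical hub namespace
    spec:
      type: LoadBalancer             # allocates a dedicated public IP per hub
      selector:
        component: proxy             # illustrative; real labels come from z2jh
      ports:
        - name: http
          port: 80
        - name: https
          port: 443
        - name: ssh
          port: 22
        - name: sftp
          port: 2222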

The main issue with this is the requirement to manually add DNS entries. This makes creating new hubs more cumbersome and difficult - every other part of the new hub process is fairly automated. It also requires two manual deploys: one to get the public IP so we know what the A record should point to, and another after the DNS entry has been made to enable HTTPS.

Given that we want to move to a model with many hubs (#2008), all these need to be automated.

Proposed new model with ingress providers

We should deploy the nginx-ingress provider, which will give us one public IP that can accept traffic for all our hubs. Ingress objects are then used to provide host-based routing for each of the hubs, and cert-manager is used to provide HTTPS certificates automatically. A wildcard DNS entry for *.datahub.berkeley.edu will point to nginx-ingress' public IP, and we can configure z2jh to use it. This fully automates the DNS & HTTPS parts of getting a new hub up and running!
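
As a rough sketch, the per-hub z2jh config would then look something like this (the hostname and TLS secret name below are illustrative, and the exact wildcard-cert wiring depends on how cert-manager is configured):

    # Per-hub z2jh values - a minimal sketch, hub hostname is illustrative
    proxy:
      service:
        type: ClusterIP                 # no more per-hub LoadBalancer / public IP
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx
      hosts:
        - data100.datahub.berkeley.edu
      tls:
        - secretName: datahub-wildcard-tls   # hypothetical secret name
          hosts:
            - data100.datahub.berkeley.edu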

A caveat here is our experimental ssh / sftp services. They don't support any kind of host-based routing, so when there's only one public IP for every hub, these services will need their own ports. While pure TCP proxying isn't in the Kubernetes ingress spec, nginx-ingress supports it pretty well. We'll just have to manually allocate a port for each hub when we set it up. This is annoying, but probably ok.

TODO

Cluster setup

  • Set up nginx-ingress & cert-manager (a minimal issuer sketch follows this list)
  • Create wildcard DNS entry
  • Reserve the public IP so we don't lose it accidentally!
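
For the cert-manager half of the setup, a minimal ClusterIssuer sketch might look like the following (the issuer name and contact email are placeholders, and whether we use per-host HTTP-01 certs or a DNS-01 wildcard cert is still an open implementation choice):

    # Sketch of a cert-manager ClusterIssuer - name and email are placeholders
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: infra@example.edu               # placeholder contact address
        privateKeySecretRef:
          name: letsencrypt-prod-account-key   # secret storing the ACME account key
        solvers:
          - http01:
              ingress:
                class: nginx                   # solve challenges via nginx-ingress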

For each hub:

  • Set current DNS entry's TTL to 1s, to minimize downtime during switchover
  • Make a PR enabling ingress config for the hub
  • Once TTL has propagated, delete the DNS entry & deploy the PR quickly
  • Verify that DNS entry points to new hub, HTTPS certificates have been acquired, and things work.
  • Change the type of proxy-public to ClusterIP to release the public IP that has been acquired for it
  • Release the IP from Google Cloud console so we are no longer paying for it.

Hubs

  • cs194
  • stat159
  • data102
  • data100
  • prob140
  • workshop
  • highschool
  • julia
  • r
  • eecs
  • dlab
  • datahub

We should do this to every hub!

@yuvipanda (Contributor, Author) commented:

We'll have to:

  1. Allocate a port for ssh & sftp per-hub
  2. Add an entry to the nginx-ingress chart: https://github.com/helm/charts/blob/7e45e678e39b88590fe877f159516f85f3fd3f38/stable/nginx-ingress/values.yaml#L599
  3. Publicize this port for each hub.

Not as automated as I'd like, but definitely much better than allocating IPs.
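
Concretely, the chart values entry would be along these lines (the port numbers, namespace, and service names are made up for illustration; the tcp: block maps an external port on the controller's LoadBalancer to a namespace/service:port backend):

    # nginx-ingress chart values - sketch with hypothetical ports and services
    tcp:
      # <external port>: "<namespace>/<service>:<service port>"
      "2201": "data100-prod/jupyterhub-ssh:22"
      "2301": "data100-prod/jupyterhub-sftp:22"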

yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 5, 2021
- Remove the explicit DNS entry for cs194-staging,
  so the wildcard entry resolves it to our nginx-ingress.
- Setup ingress config for cs194-staging, so HTTPS is autogenerated.
- Manually allocate SSH / SFTP ports for cs194-staging.
  nginx-ingress will proxy incoming ssh connections to the correct
  hub based on the ports folks are connecting via.
- jupyterhub-ssh helm chart bump so we can allow traffic from
  nginx-ingress via networkpolicy

I'll deploy manually to a few hubs, and slowly roll it out
across all the hubs.

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 5, 2021
We have a wildcard domain from *.datahub.berkeley.edu
to our nginx-ingress IP. This simplifies new hub creation a
*lot*.

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 5, 2021
I'll remove the non-wildcard DNS entry just before
deployment.

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 6, 2021
nginx-ingress trouble will bring *all* our hubs down
once we switch everything to it. So let's give it a wide
berth and lots of resources.

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 6, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 6, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 6, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 7, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 7, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 8, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 8, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
We wanna release our unused static IP addresses,
since all traffic into most hubs now comes in
via the ingress controller. The exceptions are
the data8x hubs, cs194 prod and the datahubs. We
explicitly mark those as LoadBalancer to keep their
public IPs intact. I've already removed their DNS
entries.

You can't actually just change the type from
LoadBalancer to ClusterIP (kubernetes/kubectl#221),
so this command was used to patch them manually:

    k get ns | rg staging | rg -v datahub | awk '{ print $1; }' \
      | xargs -L1 -I{} kubectl -n {} patch svc proxy-public --type='json' \
        -p '[{"op":"replace","path":"/spec/type","value":"ClusterIP"},{"op":"replace","path":"/spec/ports/0/nodePort","value":null},{"op":"replace","path":"/spec/ports/1/nodePort","value":null},{"op":"replace","path":"/spec/ports/2/nodePort","value":null}]'

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 19, 2021
@yuvipanda (Contributor, Author) commented:

Moving the entry for datahub.berkeley.edu will be a little tricky, since we can't set a TTL there. One possibility is to move nginx-ingress to use datahub's public IP. Not sure if that's the way to go - something to be investigated.
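
If we do go that route, the nginx-ingress chart lets us pin the controller's service to an existing reserved IP, roughly like this (the address shown is a placeholder for datahub's current reserved IP):

    # nginx-ingress chart values - sketch for reusing an existing reserved IP
    controller:
      service:
        loadBalancerIP: 34.123.45.67   # placeholder; would be datahub's current IP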

yuvipanda added a commit to yuvipanda/datahub that referenced this issue Sep 23, 2021
This gets metrics about requests and response codes
from nginx into prometheus, so we can look for 5xx and
4xx errors from it.

Note that datahub.berkeley.edu does *not* go through this,
but everything else does. berkeley-dsep-infra#2167
tracks that.

Ref berkeley-dsep-infra#2693
yuvipanda added a commit to yuvipanda/datahub that referenced this issue Oct 4, 2021