Use an ingress provider to get traffic into the cluster #2167

Open · 14 of 15 tasks
yuvipanda opened this issue Feb 5, 2021 · 2 comments

Comments

yuvipanda commented Feb 5, 2021

Current model to get traffic to hubs

Right now, each hub has a proxy-public service of type LoadBalancer. This gives it a public IP that routes traffic to just that hub: ports 80/443 go to the hub, and ports 22/2222 go to ssh/sftp. For each hub, we create a DNS A record pointing to this public IP.
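
For reference, that per-hub service looks roughly like this (a minimal sketch; the exact ports, labels, and namespace come from the z2jh chart and our jupyterhub-ssh setup, so treat the names here as illustrative):

    # Sketch of the current per-hub service - names and ports are illustrative,
    # targetPorts omitted for brevity
    apiVersion: v1
    kind: Service
    metadata:
      name: proxy-public
      namespace: data100-prod        # hypothetical hub namespace
    spec:
      type: LoadBalancer             # allocates a dedicated public IP per hub
      selector:
        component: proxy             # illustrative; real labels come from z2jh
      ports:
        - name: http
          port: 80
        - name: https
          port: 443
        - name: ssh
          port: 22
        - name: sftp
          port: 2222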

The main issue with this is the requirement to manually add DNS entries. This makes creating new hubs more cumbersome and difficult - every other part of the new hub process is fairly automated. It also requires two manual deploys: one to get the public IP so we know what the A record should point to, and another after the DNS entry has been made to enable HTTPS.

Given that we want to move to a model with many hubs (#2008), all these need to be automated.

Proposed new model with ingress providers

We should deploy the nginx-ingress provider, which will give us one public IP that can accept traffic for all our hubs. Ingress objects are then used to provide host-based routing for each of the hubs, and cert-manager is used to provide HTTPS certificates automatically. A wildcard DNS entry for *.datahub.berkeley.edu will point to nginx-ingress' public IP, and we can configure z2jh to use it. This fully automates the DNS & HTTPS parts of getting a new hub up and running!
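
As a rough sketch, the per-hub z2jh config would then look something like this (the hostname and TLS secret name below are illustrative, and the exact wildcard-cert wiring depends on how cert-manager is configured):

    # Per-hub z2jh values - a minimal sketch, hub hostname is illustrative
    proxy:
      service:
        type: ClusterIP                 # no more per-hub LoadBalancer / public IP
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx
      hosts:
        - data100.datahub.berkeley.edu
      tls:
        - secretName: datahub-wildcard-tls   # hypothetical secret name
          hosts:
            - data100.datahub.berkeley.edu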

A caveat here is our experimental ssh / sftp services. They don't support any kind of host-based routing, so when there's only one public IP for every hub, these services will need their own ports. While pure TCP proxying isn't in the Kubernetes ingress spec, nginx-ingress supports it pretty well. We'll just have to manually allocate a port for each hub when we set it up. This is annoying, but probably ok.

TODO

Cluster setup

  • Set up nginx-ingress & cert-manager (a minimal issuer sketch follows this list)
  • Create wildcard DNS entry
  • Reserve the public IP so we don't lose it accidentally!
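
For the cert-manager half of the setup, a minimal ClusterIssuer sketch might look like the following (the issuer name and contact email are placeholders, and whether we use per-host HTTP-01 certs or a DNS-01 wildcard cert is still an open implementation choice):

    # Sketch of a cert-manager ClusterIssuer - name and email are placeholders
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: infra@example.edu               # placeholder contact address
        privateKeySecretRef:
          name: letsencrypt-prod-account-key   # secret storing the ACME account key
        solvers:
          - http01:
              ingress:
                class: nginx                   # solve challenges via nginx-ingress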

For each hub:

  • Set current DNS entry's TTL to 1s, to minimize downtime during switchover
  • Make a PR enabling ingress config for the hub
  • Once TTL has propagated, delete the DNS entry & deploy the PR quickly
  • Verify that DNS entry points to new hub, HTTPS certificates have been acquired, and things work.
  • Change the type of proxy-public to ClusterIP to release the public IP that has been acquired for it
  • Release the IP from Google Cloud console so we are no longer paying for it.

Hubs

  • cs194
  • stat159
  • data102
  • data100
  • prob140
  • workshop
  • highschool
  • julia
  • r
  • eecs
  • dlab
  • datahub

We should do this to every hub!

@yuvipanda (Contributor, Author) commented:

We'll have to:

  1. Allocate a port for ssh & sftp per-hub
  2. Add an entry to the nginx-ingress chart: https://github.com/helm/charts/blob/7e45e678e39b88590fe877f159516f85f3fd3f38/stable/nginx-ingress/values.yaml#L599
  3. Publicize this port for each hub.

Not as automated as I'd like, but definitely much better than allocating IPs.
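
Concretely, the chart values entry would be along these lines (the port numbers, namespace, and service names are made up for illustration; the tcp: block maps an external port on the controller's LoadBalancer to a namespace/service:port backend):

    # nginx-ingress chart values - sketch with hypothetical ports and services
    tcp:
      # <external port>: "<namespace>/<service>:<service port>"
      "2201": "data100-prod/jupyterhub-ssh:22"
      "2301": "data100-prod/jupyterhub-sftp:22"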

yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 5, 2021
- Remove the explicit DNS entry for cs194-staging,
  so the wildcard entry resolves it to our nginx-ingress.
- Setup ingress config for cs194-staging, so HTTPS is autogenerated.
- Manually allocate SSH / SFTP ports for cs194-staging.
  nginx-ingress will proxy incoming ssh connections to the correct
  hub based on the ports folks are connecting via.
- jupyterhub-ssh helm chart bump so we can allow traffic from
  nginx-ingress via networkpolicy

I'll deploy manually to a few hubs, and slowly roll it out
across all the hubs.

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 5, 2021
We have a wildcard domain from *.datahub.berkeley.edu
to our nginx-ingress IP. This simplifies new hub creation a
*lot*.

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 5, 2021
I'll remove the non-wildcard DNS entry just before
deployment.

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 6, 2021
nginx-ingress trouble will bring *all* our hubs down
once we switch everything to it. So let's give it a wide
berth and lots of resources.

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 6, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 6, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 6, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 7, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 7, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 8, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 8, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
We wanna release our unused static IP addresses,
since all traffic into most hubs now comes in
via the ingress controller. The exceptions are
the data8x hubs, cs194 prod and the datahubs. We
explicitly mark those as LoadBalancer to keep their
public IPs intact. I've already removed their DNS
entries.

You can't actually just change the type from
LoadBalancer to ClusterIP (kubernetes/kubectl#221),
so this command was used to patch them manually:

    k get ns | rg staging | rg -v datahub | awk '{ print $1; }' \
      | xargs -L1 -I{} kubectl -n {} patch svc proxy-public --type='json' \
        -p '[{"op":"replace","path":"/spec/type","value":"ClusterIP"},{"op":"replace","path":"/spec/ports/0/nodePort","value":null},{"op":"replace","path":"/spec/ports/1/nodePort","value":null},{"op":"replace","path":"/spec/ports/2/nodePort","value":null}]'

Ref #2167
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 9, 2021
yuvipanda referenced this issue in yuvipanda/datahub-old-fork Feb 19, 2021
@yuvipanda (Contributor, Author) commented:

Moving the entry for datahub.berkeley.edu will be a little tricky, since we can't set a TTL there. One possibility is to move nginx-ingress to use datahub's public IP. Not sure if that's the way to go - something to be investigated.
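
If we do go that route, the nginx-ingress chart lets us pin the controller's service to an existing reserved IP, roughly like this (the address shown is a placeholder for datahub's current reserved IP):

    # nginx-ingress chart values - sketch for reusing an existing reserved IP
    controller:
      service:
        loadBalancerIP: 34.123.45.67   # placeholder; would be datahub's current IP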

yuvipanda added a commit to yuvipanda/datahub that referenced this issue Sep 23, 2021
This gets metrics about requests and response codes
from nginx into prometheus, so we can look for 5xx and
4xx errors from it.

Note that datahub.berkeley.edu does *not* go through this,
but everything else does. berkeley-dsep-infra#2167
tracks that.

Ref berkeley-dsep-infra#2693
yuvipanda added a commit to yuvipanda/datahub that referenced this issue Oct 4, 2021