Add spegel distributed registry mirror #8977

Merged: 7 commits, Jan 9, 2024

@brandond brandond commented Dec 1, 2023

Proposed Changes

This embeds spegel, a distributed registry mirror, into the K3s supervisor. In addition to reducing traffic against upstream registries, this also allows airgap images preloaded onto one node to be transparently shared to other nodes as needed.

spegel consists of a registry API backed by the local containerd image store, and a distributed hash table that allows nodes to gossip about which images and blobs they have available.

The local registry mirror is injected into the containerd config as the first mirror endpoint, followed by the user-selected endpoints, and finally the default endpoint. If any node in the cluster has an image, it will be pulled from that node, instead of from the registry or registry mirror.
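The endpoint ordering described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the function name `mirrorEndpoints`, its parameters, and the example URLs are all hypothetical.

```go
package main

import "fmt"

// mirrorEndpoints returns registry endpoints in the order they would be tried:
// the embedded (local) mirror first, then user-selected endpoints, then the
// default endpoint. All names here are hypothetical, for illustration only.
func mirrorEndpoints(embedded string, userEndpoints []string, defaultEndpoint string) []string {
	endpoints := make([]string, 0, len(userEndpoints)+2)
	if embedded != "" {
		endpoints = append(endpoints, embedded)
	}
	endpoints = append(endpoints, userEndpoints...)
	if defaultEndpoint != "" {
		endpoints = append(endpoints, defaultEndpoint)
	}
	return endpoints
}

func main() {
	// Example values are assumptions, not taken from the K3s configuration.
	order := mirrorEndpoints(
		"https://127.0.0.1:6443/v2",
		[]string{"https://mirror.example.com/v2"},
		"https://registry-1.docker.io/v2",
	)
	for i, e := range order {
		fmt.Printf("%d. %s\n", i+1, e)
	}
}
```

Passing an empty default endpoint models a node where the default endpoint is not used, so only the embedded mirror and user-configured endpoints remain.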

The embedded registry is enabled at a cluster level via a CLI flag. When enabled, all nodes will listen on port 5001 for P2P traffic, secured by a preshared key. Agents will have a new listener on port 6443 that hosts the registry endpoint. The registry API is served over HTTPS, and requires a valid client certificate for access.

This appears to add ~4.4MB to the size of the K3s release artifact.

Checklist:

  • Write up ADR supporting this new feature and get team+PM approval
  • Get spegel working embedded in supervisor endpoint
  • Add agents to spegel cluster, instead of just proxying back to servers
  • Remove hardcoded values and wire into node config
  • Add CLI flag to enable, instead of always being enabled
  • Require TLS auth for registry endpoint
  • Require private network with preshared key
  • Add tests.
  • Add ipv6 support and push upstream.
    Handled by upstream in Fix support for ipv6 spegel-org/spegel#284
  • Push embedding, https, and nonblocking containerd client support upstream
    spegel-org/spegel@main...k3s-io:spegel:k3s-main

Builds on:

Inspired by:

Resolves:

Use:

  1. Start servers with --embedded-registry
  2. In order to enable distributed mirroring of an upstream registry, reference it in the mirror section of registries.yaml on every node that you want to participate in the sharing of images. The registry does not need to have any endpoints, although it may. For example, this is a valid configuration:
    mirrors:
      docker.io:
      registry.k8s.io:
      gcr.io:
      quay.io:
      ghcr.io:
  3. Nodes may also be started with --disable-default-registry-endpoint, in which case images will only be available via airgap tarball, distributed mirror, or user-configured mirror (in order of precedence).
  4. Check metrics:
    kubectl get --raw /api/v1/nodes/NODENAME/proxy/metrics | grep -F 'spegel_'
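If the metrics check in step 4 needs to be scripted rather than run by hand, the filtering half of that pipeline can be approximated in Go. This is a minimal sketch: `filterMetrics` is a hypothetical helper, and the sample metric names are made up for illustration and are not real spegel metrics.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// filterMetrics returns every line of body that contains substr, which is
// roughly what `grep -F substr` does on the metrics output.
func filterMetrics(body, substr string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		if strings.Contains(sc.Text(), substr) {
			out = append(out, sc.Text())
		}
	}
	return out
}

func main() {
	// Hypothetical metrics text; real output comes from the node's
	// /metrics proxy endpoint.
	sample := "# HELP spegel_example_total A made-up metric\n" +
		"spegel_example_total 3\n" +
		"go_goroutines 42\n"
	for _, line := range filterMetrics(sample, "spegel_") {
		fmt.Println(line)
	}
}
```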

Types of Changes

new feature

Verification

See above

Testing

E2e test added.

Linked Issues

  • TBD

User-Facing Change


Further Comments

@phillebaba
Contributor

We can check off ipv6 support as I have merged spegel-org/spegel#284.

@brandond brandond force-pushed the embed-spegel branch 3 times, most recently from 18656bf to 8611079 Compare December 6, 2023 10:32
@harsimranmaan
Contributor

Thanks Brandon. I am really happy to see such good progress on this one. I can help test this. If you need support convincing the team or the PMs, you can count my vote (and perhaps @adrianmoye's too) :)

@brandond brandond changed the title [WIP] Add spegel distributed registry mirror Add spegel distributed registry mirror Dec 7, 2023
@brandond brandond marked this pull request as ready for review December 7, 2023 22:27
@brandond brandond requested a review from a team as a code owner December 7, 2023 22:27
@onedr0p
Contributor

onedr0p commented Dec 8, 2023

It's truly amazing the work you put into integrating spegel @brandond and so quickly, can't wait to try it out. ❤️

@adrianmoye

Thanks @brandond for the great work! I really appreciate it. This looks like the perfect solution for me, with pinning images (a feature I wasn't aware of) being the final point for my requirements.
@harsimranmaan thanks for pinging me on this.

@brandond brandond force-pushed the embed-spegel branch 4 times, most recently from 878f888 to a1a8fba Compare December 9, 2023 02:34
@twistedgrim

I just learned about Spegel and tried it out. Freaking amazing work! Looking forward to this!

@rlex
Contributor

rlex commented Dec 10, 2023

spegel exposes some useful metrics. Will they be exposed when running the embedded spegel? I cannot find anything related in the PR.

@onedr0p
Contributor

onedr0p commented Dec 10, 2023

It would be very useful to have those metrics exposed; their Grafana dashboard is quite nice for reviewing what's going on.

manuelbuil previously approved these changes Jan 5, 2024
Layer leases never did what we wanted anyways, and this is the new approved interface for ensuring that images do not get GCd

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Moving it into config.Agent so that we can use or modify it outside the context of containerd setup

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fixes crash when killing agent while waiting for config from server

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>