
Spegel pod gets into CrashLoopBackOff with k3s #212

Closed
akakream opened this issue Sep 26, 2023 · 18 comments · Fixed by #335
Labels
enhancement New feature or request


@akakream

Hi there,

I have a k3s cluster where I want to use Spegel. I installed Spegel as described in the README:

helm upgrade --create-namespace --namespace spegel --install --version v0.0.11 spegel oci://ghcr.io/xenitab/helm-charts/spegel

However, the pod gets into CrashLoopBackOff. When I check the logs, I see the following error:

Defaulted container "registry" out of: registry, configuration (init)
{"level":"error","ts":1695750332.7763488,"caller":"build/main.go:72","msg":"","error":"rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

How can I solve this problem? I would appreciate any help.

Thanks a lot!

@phillebaba
Member

The error that you are seeing occurs because k3s uses a different path for its Containerd socket file. The default configuration uses a path that is common for most other flavors.

Support for k3s is a bit tricky until k3s-io/k3s#5568 is fixed. There has been some pushback against adopting the new Containerd configuration format for a while, but it does look like the feature will be implemented eventually. I was hoping this would be fixed by now; the milestone has been moved a couple of times.

While it is in theory possible to make Spegel work with k3s, it is tricky and requires restarting Containerd on all nodes in the cluster. I do not think most people are willing to do that. Instead, I have been waiting for that issue to be resolved before spending time testing and documenting k3s compatibility.

I will add a note about k3s to the compatibility docs, and update them when the issue is solved.

@akakream
Author

akakream commented Oct 4, 2023

Thanks a lot for the answer. I see.

It would be great to make it work with k3s. Adding compatibility with k3s to the docs would help a lot!

I am trying it out myself, and if I find a way, I will note it here.

@akakream
Author

akakream commented Oct 4, 2023

Here is how I tried to make Spegel work with k3s:

  • I added
[plugins."io.containerd.grpc.v1.cri".registry]
   config_path = "/etc/containerd/certs.d"

to /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl.

  • I restarted Containerd and k3s.
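To check that k3s actually renders the template into the generated config (a quick sanity check; k3s regenerates config.toml from the .tmpl on startup):

# should show config_path = "/etc/containerd/certs.d" if the template was picked up
grep -n 'config_path' /var/lib/rancher/k3s/agent/etc/containerd/config.toml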

The error that I am getting:

Defaulted container "registry" out of: registry, configuration (init)
{"level":"error","ts":1696454496.254962,"caller":"build/main.go:72","msg":"","error":"Containerd registry config path needs to be set for mirror configuration to take effect","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

I do not know what I should do next. Any suggestions?

@phillebaba
Member

Looks like k3s uses a different path for its Containerd socket. Try changing the socket path and reinstall Spegel.

spegel:
  containerdSock: "/run/k3s/containerd/containerd.sock"

I will look into why it prints that error message when the socket path is wrong.

@akakream
Author

akakream commented Oct 5, 2023

Thanks a lot for the direction @phillebaba!

To document setting up Spegel on k3s for others who may try it, I am noting some further points here:

  • After applying your suggestion, I finally got Spegel running on my master node. However, I get the following warning as the last event from the pod on the master node: Warning Unhealthy 18m (x2 over 18m) kubelet Startup probe failed: Get "http://10.42.0.110:5000/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  • The second pod, unfortunately, gets into CrashLoopBackOff. The logs show the same error I got previously:
Defaulted container "registry" out of: registry, configuration (init)
{"level":"error","ts":1696541474.0246248,"caller":"build/main.go:72","msg":"","error":"Containerd registry config path needs to be set for mirror configuration to take effect","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

This was solved on the master node by applying this suggestion from @phillebaba's previous comment:

spegel:
  containerdSock: "/run/k3s/containerd/containerd.sock"

When I describe the pod, the containerdSock seems to be set to /run/k3s/containerd/containerd.sock.
I believe there is another problem.
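For reference, this is roughly how I checked the containerdSock setting mentioned above (the pod name is just a placeholder):

kubectl -n spegel describe pod spegel-xxxxx | grep -i containerd   # spegel-xxxxx is a placeholder pod name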

  • My setup: k3s with a master node and a worker node on different hosts. Both virtual machines are on GCP.

Any suggestions from anybody are appreciated. Thanks a lot!

@akakream
Author

akakream commented Oct 6, 2023

After playing around a bit more on the worker node, I believe I have made Spegel work with k3s.

  • After setting the worker node up in agent mode in k3s, I applied the changes described in my earlier comment (#212 (comment)). It turned out that the k3s service running in agent mode is different from the one in server mode: the service that runs in agent mode is k3s-agent, and this is what needs to be restarted (see the sketch below). After the restart, the pods stopped crashing.
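In concrete terms, that restart is something like this (a sketch, assuming a systemd-managed k3s install):

# on worker (agent) nodes
systemctl restart k3s-agent
# on server (master) nodes
systemctl restart k3s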

Thanks a lot for your help along this process, @phillebaba. I really appreciate it.

@shanmugara

I am trying out Spegel on a 4-node Ubuntu k8s 1.23 cluster with 1 master and 3 workers. After doing a helm install based on the docs, I am seeing all Spegel pods go into CrashLoopBackOff (CLBO):

+ spegel-ck69r › registry
spegel-ck69r registry {"level":"error","ts":1698609744.3038561,"caller":"build/main.go:72","msg":"","error":"rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
- spegel-ck69r › registry
+ spegel-ck69r › registry
spegel-ck69r registry {"level":"error","ts":1698609775.4334285,"caller":"build/main.go:72","msg":"","error":"rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
- spegel-ck69r › registry
+ spegel-vtkld › registry
spegel-vtkld registry {"level":"error","ts":1698609821.6245556,"caller":"build/main.go:72","msg":"","error":"rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
- spegel-vtkld › registry
+ spegel-ck69r › registry
spegel-ck69r registry {"level":"error","ts":1698609905.353383,"caller":"build/main.go:72","msg":"","error":"rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
- spegel-ck69r › registry

My socket path seems correct: /run/containerd/containerd.sock.

I have configured the containerd config_path and set discard_unpacked_layers = false.
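Roughly, the relevant part of my /etc/containerd/config.toml looks like this (path and exact layout assumed for a standard containerd install):

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd]
  discard_unpacked_layers = false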

Any ideas why the pods are CLBOd?

Thank you.

@phillebaba
Member

@shanmugara what tool are you using to set up Kubernetes? It sounds like either the socket path has an issue or Containerd is missing some configuration.

@shanmugara

@phillebaba Thanks for the response. My cluster is built with kubeadm. I was able to isolate the issue to just Ubuntu Bionic; I have another cluster, also built with kubeadm but running Ubuntu Focal, and that one seems to be working fine. If it is a Bionic-specific issue, it's okay, you can ignore it.

@shanmugara

@phillebaba In a scenario where a DaemonSet starts 20 replicas simultaneously and the image has not yet been downloaded, will all 20 nodes pull the image simultaneously?

@phillebaba
Member

@shanmugara could you create a separate issue for the problems you are seeing with kubeadm, as they are not related to k3s? If you share the steps you used to set up your cluster, I will see if I am able to reproduce it.

If you have any questions about performance, please create a separate issue for that as well.

@ElectroshockGuy

Hello,
I also encountered the same problem on k3s.
In my k3s cluster, I added the following to /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl:

[plugins."io.containerd.grpc.v1.cri".registry]
   config_path = "/var/lib/rancher/k3s/agent/etc/containerd/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true
  discard_unpacked_layers = false

The Spegel configuration init container does take effect; the mirror configuration is written out:

ls /var/lib/rancher/k3s/agent/etc/containerd/certs.d/ -al
total 48
drwxr-xr-x 12 root root 4096 Nov 20 23:14 .
drwxr-xr-x  3 root root 4096 Nov 20 23:30 ..
drwxr-xr-x  2 root root 4096 Nov 20 23:14 docker.io
drwxr-xr-x  2 root root 4096 Nov 20 23:14 gcr.io
drwxr-xr-x  2 root root 4096 Nov 20 23:14 ghcr.io
drwxr-xr-x  2 root root 4096 Nov 20 23:14 iboot:5000
drwxr-xr-x  2 root root 4096 Nov 20 23:14 k8s.gcr.io
drwxr-xr-x  2 root root 4096 Nov 20 23:14 lscr.io
drwxr-xr-x  2 root root 4096 Nov 20 23:14 mcr.microsoft.com
drwxr-xr-x  2 root root 4096 Nov 20 23:14 public.ecr.aws
drwxr-xr-x  2 root root 4096 Nov 20 23:14 quay.io
drwxr-xr-x  2 root root 4096 Nov 20 23:14 registry.k8s.io

With the ctr pull command, I specified --hosts-dir and confirmed that the mirror configuration is picked up (containerd tries the local mirror ports first):

# ctr i pull --hosts-dir "/var/lib/rancher/k3s/agent/etc/containerd/certs.d" docker.io/library/nginx:latest
INFO[0000] trying next host                              error="failed to do request: Head \"http://127.0.0.1:30020/v2/library/nginx/manifests/latest?ns=docker.io\": dial tcp 127.0.0.1:30020: connect: connection refused" host="127.0.0.1:30020"
INFO[0000] trying next host                              error="failed to do request: Head \"http://127.0.0.1:30021/v2/library/nginx/manifests/latest?ns=docker.io\": dial tcp 127.0.0.1:30021: connect: connection refused" host="127.0.0.1:30021"

However, Spegel itself is not running properly.

# kubectl -n spegel get pods
NAME           READY   STATUS             RESTARTS        AGE
spegel-6vcg5   0/1     CrashLoopBackOff   7 (4m8s ago)    17m
spegel-tjtnn   0/1     CrashLoopBackOff   7 (3m14s ago)   17m

Pod log:

"Containerd registry config path needs to be set for mirror configuration to take effect"

k3s service log:

time="2023-11-20T23:50:38+08:00" level=info msg="Waiting for containerd startup: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"

Containerd log:

time="2023-11-20T23:28:55.906708738+08:00" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="2023-11-20T23:28:55.906748874+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2023-11-20T23:28:55.906794668+08:00" level=error msg="failed to initialize a tracing processor \"otlp\"" error="no OpenTelemetry endpoint: skip plugin"
time="2023-11-20T23:28:55.906898547+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
time="2023-11-20T23:28:55.907708354+08:00" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="invalid plugin config: `mirrors` cannot be set when `config_path` is provided"
time="2023-11-20T23:28:55.908095214+08:00" level=info msg=serving... address=/run/k3s/containerd/containerd.sock.ttrpc
time="2023-11-20T23:28:55.908205222+08:00" level=info msg=serving... address=/run/k3s/containerd/containerd.sock
time="2023-11-20T23:28:55.908248662+08:00" level=info msg="containerd successfully booted in 0.818316s"
time="2023-11-20T23:29:00.304331557+08:00" level=error msg="evicting /tasks/exit from queue because of retry count"
time="2023-11-20T23:29:00.305559118+08:00" level=error msg="evicting /tasks/exit from queue because of retry count"

@ElectroshockGuy

After carefully reviewing the following containerd error, I deleted the mirror-related configuration in /etc/rancher/k3s/registries.yaml and /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl. Now Spegel is working properly.

time="2023-11-20T23:28:55.907708354+08:00" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="invalid plugin config: `mirrors` cannot be set when `config_path` is provided"

@lukaszraczylo
Contributor

Solution which worked:

  1. Take the content of /var/lib/rancher/k3s/agent/etc/containerd/config.toml.
  2. Create /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl with that content and add:
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/var/lib/rancher/k3s/agent/etc/containerd/certs.d"
  3. Restart k3s-agent (on worker nodes) or k3s (on master nodes).
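As a shell sketch of the steps above (paths as above; assuming a systemd-managed k3s install):

# 1. Seed the template from the currently generated config
cp /var/lib/rancher/k3s/agent/etc/containerd/config.toml \
   /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl

# 2. Append the registry config_path block
cat <<'EOF' >> /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/var/lib/rancher/k3s/agent/etc/containerd/certs.d"
EOF

# 3. Restart the service so k3s regenerates config.toml from the template
systemctl restart k3s-agent   # worker nodes; use "systemctl restart k3s" on master nodes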

@onedr0p
Contributor

onedr0p commented Nov 28, 2023

Thanks everyone! I was able to get it working on k3s 🎉

After carefully reviewing the following containerd-related errors, I deleted the configurations related to "mirror" in /etc/rancher/k3s/registries.yaml and /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl. Now, spegel is working properly.

time="2023-11-20T23:28:55.907708354+08:00" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="invalid plugin config: `mirrors` cannot be set when `config_path` is provided"

That's a bit unfortunate since the docs state:

Spegel does not aim to replace projects like Harbor or Zot but instead complements them.

I was hoping to use Spegel in my cluster with Zot as a pull-through cache deployed elsewhere. For what it's worth, these are my current containerd mirrors:

mirrors:
  docker.io:
    endpoint:
      - https://zot.domain.tld/v2/docker.io
  ghcr.io:
    endpoint:
      - https://zot.domain.tld/v2/ghcr.io
  quay.io:
    endpoint:
      - https://zot.domain.tld/v2/quay.io
  gcr.io:
    endpoint:
      - https://zot.domain.tld/v2/gcr.io
  registry.k8s.io:
    endpoint:
      - https://zot.domain.tld/v2/registry.k8s.io
  public.ecr.aws:
    endpoint:
      - https://zot.domain.tld/v2/public.ecr.aws

I don't see a way to have Spegel take over this responsibility; it seems like you either have Spegel or a pull-through cache. Maybe this could be a feature request?

@fmunteanu

@phillebaba I see k3s-io/k3s#8977 has already been merged, but I cannot find any related documentation. Is this because the last item is not checked?

@onedr0p
Contributor

onedr0p commented Jan 28, 2024

@phillebaba I see k3s-io/k3s#8977 has already been merged, but I cannot find any related documentation. Is this because the last item is not checked?

@fmunteanu It looks like it will be released as an experimental option in k3s 1.29.1.

@phillebaba
Member

@fmunteanu the documentation has been published but we are still waiting for a GA release of k3s with Spegel.

https://docs.k3s.io/installation/registry-mirror?_highlight=spegel

As soon as we get a release I will update the compatibility docs here and point to the k3s docs.
