Pluto service takes 5 minutes to commit settings #267

Open
Tsonov opened this issue Nov 15, 2024 · 0 comments


Tsonov commented Nov 15, 2024

I'm filing this bug report because we see slow initialization times for Bottlerocket EKS nodes. Nodes stall for ~5 minutes before starting kubelet and joining the cluster. So far we have traced the delay to a slow commit stage in pluto.service (which seems to come from this repo, correct?).

The clusters use the latest EKS-optimized Bottlerocket image. The issue reproduces consistently on every new node in the affected clusters, but not in every cluster.

Our question is how to investigate and fix the root cause. We are not sure whether this is a package issue or a configuration issue in the clusters. The clusters have IMDS enabled; we are not sure what else this process requires.

Package I'm using:
pluto.service

What I expected to happen:
Startup to take 1-2 minutes rather than 5+ minutes.

What actually happened:
According to the systemd logs, pluto.service took 5 minutes to complete. In its logs, the "Committing settings." step accounts for essentially all of that time (from 07:42:00 to 07:47:01).

Logs from pluto:

bash-5.0# journalctl -u pluto.service
Nov 14 07:42:00 localhost systemd[1]: Starting Generate additional settings for Kubernetes...
Nov 14 07:42:00 localhost settings-committer[1832]: 07:42:00 [INFO] Checking pending settings.
Nov 14 07:42:00 localhost settings-committer[1832]: 07:42:00 [INFO] Committing settings.
Nov 14 07:47:01 localhost systemd[1]: Finished Generate additional settings for Kubernetes.
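For reference, here is a small shell sketch that measures the gap between the "Committing settings." entry and the "Finished" entry in the journal output above. It is a hypothetical helper, not part of Bottlerocket tooling. It assumes journalctl's default short output format, where the third whitespace-separated field is the HH:MM:SS timestamp, and it assumes both entries fall on the same day.

```shell
#!/usr/bin/env bash
# Hypothetical helper: measure how long the "Committing settings." step took,
# using the pluto.service journal lines pasted in this issue.
set -euo pipefail

log=$(cat <<'EOF'
Nov 14 07:42:00 localhost systemd[1]: Starting Generate additional settings for Kubernetes...
Nov 14 07:42:00 localhost settings-committer[1832]: 07:42:00 [INFO] Checking pending settings.
Nov 14 07:42:00 localhost settings-committer[1832]: 07:42:00 [INFO] Committing settings.
Nov 14 07:47:01 localhost systemd[1]: Finished Generate additional settings for Kubernetes.
EOF
)

# Convert HH:MM:SS to seconds since midnight. The 10# prefix forces base-10
# so that values like "07" are not parsed as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<<"$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Field 3 of each journal line is the time of day (default short format).
start=$(grep -m1 'Committing settings' <<<"$log" | awk '{print $3}')
end=$(grep -m1 'Finished' <<<"$log" | awk '{print $3}')

elapsed=$(( $(to_seconds "$end") - $(to_seconds "$start") ))
echo "Committing settings took ${elapsed}s"
```

On an affected node, the heredoc could be replaced with the output of `journalctl -u pluto.service`. For the lines above, the result is 301 seconds, which matches the ~5 minute stall.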

How to reproduce the problem:
Unclear; we only see this issue in some customer clusters, not on a fresh cluster.

Extra information:
bash-5.0# apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "360b7a38",
    "pretty_name": "Bottlerocket OS 1.26.2 (aws-k8s-1.30)",
    "variant_id": "aws-k8s-1.30",
    "version_id": "1.26.2"
  }
}
