🌱 Avoid large number of connection error traces in kubeadm controlplane controller #12106

@dmvolod dmvolod commented Apr 16, 2025

What this PR does / why we need it:
This small fix removes the large number of stack traces logged for the workload cluster connection error while the cluster is not ready yet. Currently the logs are spammed on every reconcile loop with the same predictable error and its full stack trace, instead of a short message saying that the connection is not ready.

2025-04-16T19:50:59+03:00	INFO	Reconcile KubeadmControlPlane	{"controller": "kubeadmcontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KubeadmControlPlane", "KubeadmControlPlane": {"name":"test-mgmt-control-plane","namespace":"default"}, "namespace": "default", "name": "test-mgmt-control-plane", "reconcileID": "b5b5f0e9-e0a5-46eb-925a-9536fb144e23", "Cluster": {"name":"test-mgmt","namespace":"default"}}
2025-04-16T19:50:59+03:00	INFO	Scaling up control plane	{"controller": "kubeadmcontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KubeadmControlPlane", "KubeadmControlPlane": {"name":"test-mgmt-control-plane","namespace":"default"}, "namespace": "default", "name": "test-mgmt-control-plane", "reconcileID": "b5b5f0e9-e0a5-46eb-925a-9536fb144e23", "Cluster": {"name":"test-mgmt","namespace":"default"}, "desired": 3, "existing": 1}
2025-04-16T19:50:59+03:00	INFO	Waiting for control plane to pass preflight checks	{"controller": "kubeadmcontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KubeadmControlPlane", "KubeadmControlPlane": {"name":"test-mgmt-control-plane","namespace":"default"}, "namespace": "default", "name": "test-mgmt-control-plane", "reconcileID": "b5b5f0e9-e0a5-46eb-925a-9536fb144e23", "Cluster": {"name":"test-mgmt","namespace":"default"}, "failures": "Machine test-mgmt-control-plane-pqh7s does not have a corresponding Node yet (Machine.status.nodeRef not set)"}
2025-04-16T19:50:59+03:00	DEBUG	events	Waiting for control plane to pass preflight checks to continue reconciliation: Machine test-mgmt-control-plane-pqh7s does not have a corresponding Node yet (Machine.status.nodeRef not set)	{"type": "Warning", "object": {"kind":"KubeadmControlPlane","namespace":"default","name":"test-mgmt-control-plane","uid":"06538869-e95e-40e4-8a90-f602672391e5","apiVersion":"controlplane.cluster.x-k8s.io/v1beta1","resourceVersion":"742"}, "reason": "ControlPlaneUnhealthy"}
2025-04-16T19:50:59+03:00	ERROR	Could not connect to workload cluster to fetch status	{"controller": "kubeadmcontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KubeadmControlPlane", "KubeadmControlPlane": {"name":"test-mgmt-control-plane","namespace":"default"}, "namespace": "default", "name": "test-mgmt-control-plane", "reconcileID": "b5b5f0e9-e0a5-46eb-925a-9536fb144e23", "Cluster": {"name":"test-mgmt","namespace":"default"}, "error": "failed to create remote cluster client: default/test-mgmt: failed to get REST config: failed to create cluster accessor: error creating http client and mapper for remote cluster \"default/test-mgmt\": error creating client for remote cluster \"default/test-mgmt\": cluster is not reachable: Get \"https://10.0.180.10:6443/?timeout=5s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")", "errorVerbose": "default/test-mgmt: failed to get REST config: failed to create cluster accessor: error creating http client and mapper for remote cluster \"default/test-mgmt\": error creating client for remote cluster \"default/test-mgmt\": cluster is not reachable: Get \"https://10.0.180.10:6443/?timeout=5s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")\nfailed to create remote cluster client\nsigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).updateStatus\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/cluster-api@v1.8.11/controlplane/kubeadm/internal/controllers/status.go:89\nsigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).Reconcile.func1\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/cluster-api@v1.8.11/controlplane/kubeadm/internal/controllers/controller.go:206\nsigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).Reconcile\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/cluster-api@v1.8.11/controlplane/kubeadm/internal/controllers/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/home/dvolodin/sdk/go1.23.5/src/runtime/asm_amd64.s:1700"}
sigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).Reconcile.func1
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/cluster-api@v1.8.11/controlplane/kubeadm/internal/controllers/controller.go:209
sigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).Reconcile
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/cluster-api@v1.8.11/controlplane/kubeadm/internal/controllers/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.5/pkg/internal/controller/controller.go:222
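
To illustrate the idea, here is a minimal sketch, not the actual diff of this PR. It assumes the expected "cluster not reachable yet" condition can be recognized by a dedicated error type and reported as a one-line info message, while other failures stay at error level. The standard library `log/slog` package stands in for the controller's logger, and `connectionError`, `logStatusError`, `render`, and the two sample errors are hypothetical names introduced only for this example.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
)

// connectionError is a hypothetical marker type for "workload cluster is not
// reachable yet"; it is not a type from cluster-api.
type connectionError struct{ cause error }

func (e *connectionError) Error() string { return "cluster is not reachable: " + e.cause.Error() }
func (e *connectionError) Unwrap() error { return e.cause }

// Sample errors used by main; both are made up for the example.
var (
	errNotReachable = fmt.Errorf("failed to create remote cluster client: %w",
		&connectionError{cause: errors.New("x509: certificate signed by unknown authority")})
	errOther = errors.New("unexpected failure")
)

// logStatusError reports expected connection failures as a single info line
// and every other failure at error level with the error attached.
func logStatusError(logger *slog.Logger, err error) {
	var connErr *connectionError
	if errors.As(err, &connErr) {
		logger.Info(fmt.Sprintf("Could not connect to workload cluster to fetch status: %s", err.Error()))
		return
	}
	logger.Error("Failed to update status", "error", err)
}

// render logs err to an in-memory text handler (timestamp dropped so the
// output is deterministic) and returns what was written.
func render(err error) string {
	var buf bytes.Buffer
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{} // a zero Attr removes the attribute
			}
			return a
		},
	}
	logStatusError(slog.New(slog.NewTextHandler(&buf, opts)), err)
	return buf.String()
}

func main() {
	fmt.Print(render(errNotReachable)) // logged at INFO, one line
	fmt.Print(render(errOther))        // logged at ERROR
}
```

The sketch only shows the level selection. Whether a stack trace is attached to error-level entries depends on how the real logger is configured.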

An alternative fix would be to check `controlPlane.Cluster.Status.InfrastructureReady` before connecting to the workload cluster, which would avoid the connection problems and the large number of noisy stack traces.
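
A rough sketch of that alternative, under stated assumptions: the types below are minimal stand-ins that model only the `Cluster.Status.InfrastructureReady` field named above, and `updateStatus`, `failingConnect`, and the `connect` callback are hypothetical simplifications, not the controller's real signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for the real API types; only the field mentioned in the
// description is modeled.
type ClusterStatus struct{ InfrastructureReady bool }
type Cluster struct{ Status ClusterStatus }
type ControlPlane struct{ Cluster *Cluster }

// failingConnect simulates a workload cluster that is not reachable.
func failingConnect() error { return errors.New("cluster is not reachable") }

// updateStatus skips the workload cluster connection until the infrastructure
// is ready. It reports whether a connection was attempted at all.
func updateStatus(cp *ControlPlane, connect func() error) (bool, error) {
	if cp.Cluster == nil || !cp.Cluster.Status.InfrastructureReady {
		// Nothing to connect to yet, so there is no error to log.
		return false, nil
	}
	if err := connect(); err != nil {
		return true, fmt.Errorf("failed to create remote cluster client: %w", err)
	}
	return true, nil
}

func main() {
	cp := &ControlPlane{Cluster: &Cluster{}}

	attempted, err := updateStatus(cp, failingConnect)
	fmt.Println(attempted, err) // infrastructure not ready: no attempt, no error

	cp.Cluster.Status.InfrastructureReady = true
	attempted, err = updateStatus(cp, failingConnect)
	fmt.Println(attempted, err) // infrastructure ready: attempt made, error surfaced
}
```

A guard like this only covers the phase before the infrastructure is ready. Connection errors after that point (for example the certificate error in the log above) would still need to be handled by the logging change.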

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

/area control-plane

@k8s-ci-robot k8s-ci-robot added area/control-plane Issues or PRs related to control-plane lifecycle management cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 16, 2025
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 16, 2025

@sivchari sivchari left a comment

left nits, otherwise LGTM

@sbueringer sbueringer added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 12, 2025
@sbueringer

@dmvolod Thank you, very nice catch! The change is fine in general, but please rebase on top of main. We now basically have this log statement twice, and we should adjust both of them.

@dmvolod dmvolod force-pushed the kubeadm-controller-remove-noizy-trace branch from c701a6d to 51de879 Compare June 12, 2025 14:39
dmvolod commented Jun 12, 2025

> @dmvolod Thank you, very nice catch! The change is fine in general, but please rebase on top of main. We now basically have this log statement twice, and we should adjust both of them.

Yes, sure, fixed. Please review

@sivchari sivchari left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2025
@k8s-ci-robot

LGTM label has been added.

Git tree hash: 228873c227f0788932426efda8daa498b3edec35

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2025
@k8s-ci-robot k8s-ci-robot requested a review from sivchari June 12, 2025 16:57
@dmvolod dmvolod force-pushed the kubeadm-controller-remove-noizy-trace branch from fbcd203 to a377e26 Compare June 12, 2025 16:58
@sbueringer

Thx!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 13, 2025
@k8s-ci-robot

LGTM label has been added.

Git tree hash: a02300f1778f0257653f0120b16fe1fa3af092ac

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2a5ee95 into kubernetes-sigs:main Jun 13, 2025
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.11 milestone Jun 13, 2025