@RotemK1 RotemK1 commented Dec 22, 2025

PR Description Template

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes two critical issues in the VPA Helm chart that prevent the admission controller from functioning correctly:

  1. Missing RBAC Permissions: The ClusterRole for the admission controller was missing several required permissions:

    • create, update, delete verbs for mutatingwebhookconfigurations (needed for self-registration)
    • Permissions to watch workload resources: deployments, statefulsets, replicasets, daemonsets (apps API group)
    • Permissions to watch batch workloads: jobs, cronjobs (batch API group)
    • Permissions to watch legacy workloads: replicationcontrollers (core API group)

    Without these permissions, the admission controller fails with RBAC errors and cannot watch the resources it needs to provide recommendations.

  2. TLS Certificate File Naming: The Helm chart creates TLS certificates with standard Kubernetes secret key names (ca.crt, tls.crt, tls.key), but the VPA admission controller binary expects specific file names (caCert.pem, serverCert.pem, serverKey.pem). This mismatch causes the admission controller to fail to start with certificate file not found errors.
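
As a rough illustration of item 1, the missing permissions could be expressed as ClusterRole rules along the following lines. This is a sketch, not the chart's actual template. The role name is hypothetical, and the get/list/watch verb set for the workload resources is an assumption inferred from the "cannot list" errors in the logs. Only the verbs named in this description are shown for mutatingwebhookconfigurations; any verbs the chart already grants are omitted.

```yaml
# Sketch only: role name is hypothetical, verbs for workloads are assumed.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vpa-admission-controller-example
rules:
  # Self-registration of the mutating webhook
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["mutatingwebhookconfigurations"]
    verbs: ["create", "update", "delete"]
  # Workload resources in the apps API group
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "replicasets", "daemonsets"]
    verbs: ["get", "list", "watch"]
  # Batch workloads
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "watch"]
  # Legacy workloads in the core API group (empty string)
  - apiGroups: [""]
    resources: ["replicationcontrollers"]
    verbs: ["get", "list", "watch"]
```

The role takes effect only once it is bound to the admission controller's service account through a ClusterRoleBinding.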

Impact: These issues cause the VPA admission controller to fail during startup, preventing VPA from functioning properly. Users see RBAC "forbidden" errors and "no such file or directory" errors for the TLS certificate files in the admission controller logs.

Environment Details:

  • Kubernetes Version: 1.32.9-eks-ecaa3a6 (EKS 1.32)
  • VPA Chart Version: 0.7.0
  • VPA Admission Controller Image: registry.k8s.io/autoscaling/vpa-admission-controller:1.5.1
  • Helm Version: v3.16.1
  • kubectl Version: v1.34.3

Error Logs Observed:

  1. RBAC Permission Errors (before fix):
E1222 12:43:36.937195       1 reflector.go:205] "Failed to watch" err="failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User \"system:serviceaccount:infra:vertical-pod-autoscaler-admission-controller\" cannot list resource \"statefulsets\" in API group \"apps\" at the cluster scope"
E1222 12:44:00.137421       1 reflector.go:205] "Failed to watch" err="failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User \"system:serviceaccount:infra:vertical-pod-autoscaler-admission-controller\" cannot list resource \"replicationcontrollers\" in API group \"\" at the cluster scope"
E1222 12:44:07.169424       1 reflector.go:205] "Failed to watch" err="failed to list *v1.Deployment: deployments.apps is forbidden: User \"system:serviceaccount:infra:vertical-pod-autoscaler-admission-controller\" cannot list resource \"deployments\" in API group \"apps\" at the cluster scope"
E1222 12:44:18.685109       1 reflector.go:205] "Failed to watch" err="failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User \"system:serviceaccount:infra:vertical-pod-autoscaler-admission-controller\" cannot list resource \"replicasets\" in API group \"apps\" at the cluster scope"
E1222 12:44:22.785352       1 reflector.go:205] "Failed to watch" err="failed to list *v1.CronJob: cronjobs.batch is forbidden: User \"system:serviceaccount:infra:vertical-pod-autoscaler-admission-controller\" cannot list resource \"cronjobs\" in API group \"batch\" at the cluster scope"
E1222 12:44:23.460402       1 reflector.go:205] "Failed to watch" err="failed to list *v1.Job: jobs.batch is forbidden: User \"system:serviceaccount:infra:vertical-pod-autoscaler-admission-controller\" cannot list resource \"jobs\" in API group \"batch\" at the cluster scope"
E1222 12:44:26.050336       1 reflector.go:205] "Failed to watch" err="failed to list *v1.DaemonSet: daemonsets.apps is forbidden: User \"system:serviceaccount:infra:vertical-pod-autoscaler-admission-controller\" cannot list resource \"daemonsets\" in API group \"apps\" at the cluster scope"
F1222 12:49:42.371263       1 config.go:190] mutatingwebhookconfigurations.admissionregistration.k8s.io is forbidden: User "system:serviceaccount:infra:vertical-pod-autoscaler-admission-controller" cannot create resource "mutatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope
  2. TLS Certificate File Not Found Errors (before fix):
F1222 12:46:53.059923       1 config.go:89] open /etc/tls-certs/serverCert.pem: no such file or directory
E1222 12:48:58.720778       1 certs.go:42] "Error reading certificate file" err="open /etc/tls-certs/caCert.pem: no such file or directory" file="/etc/tls-certs/caCert.pem"
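
The paths in these errors show the binary looking for caCert.pem and serverCert.pem under /etc/tls-certs, while the secret stores ca.crt, tls.crt, and tls.key. One common way to bridge such a mismatch is to map secret keys to file names with `items` on the secret volume. The fragment below is a sketch under assumptions: the secret, volume, and container names are hypothetical, and the PR may resolve the naming differently (for example by renaming the secret keys themselves).

```yaml
# Sketch only: secret, volume, and container names are hypothetical.
# The mount path is the one that appears in the error logs.
spec:
  containers:
    - name: admission-controller
      volumeMounts:
        - name: tls-certs
          mountPath: /etc/tls-certs
          readOnly: true
  volumes:
    - name: tls-certs
      secret:
        secretName: vpa-tls-certs-example
        items:
          # Standard secret key -> file name the binary looks for
          - key: ca.crt
            path: caCert.pem
          - key: tls.crt
            path: serverCert.pem
          - key: tls.key
            path: serverKey.pem
```

With `items` set, only the listed keys are projected into the volume, so all three entries need to be present.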

After applying fixes, the admission controller starts successfully:

I1222 12:49:32.361011       1 certs.go:45] "Successfully read bytes from file" bytes=1224 file="/etc/tls-certs/caCert.pem"
I1222 12:50:03.393863       1 config.go:192] Self registration as MutatingWebhook succeeded.

Which issue(s) this PR fixes:

Fixes #8938

Special notes for your reviewer:

  • The RBAC permissions added are based on the actual requirements of VPA admission controller v1.5.1
  • The TLS certificate naming change aligns with what the VPA admission controller binary expects (as seen in the source code and runtime behavior)
  • Both changes are backward compatible - existing installations will continue to work, but new installations will have the correct configuration
  • Tested on EKS 1.32 cluster with Helm chart version 0.7.0
  • Verified that after applying these fixes, the admission controller pods start successfully and the mutating webhook is registered correctly

Does this PR introduce a user-facing change?

Fix VPA Helm chart admission controller RBAC permissions and TLS certificate naming. The admission controller now has the required permissions to watch workloads and self-register as a mutating webhook, and TLS certificates use the correct file names expected by the VPA binary.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot added the release-note and kind/bug labels Dec 22, 2025

linux-foundation-easycla bot commented Dec 22, 2025

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: RotemK1 / name: RotemK1 (2d2ed7a)

@k8s-ci-robot
Contributor

Welcome @RotemK1!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the needs-ok-to-test label Dec 22, 2025
@k8s-ci-robot
Contributor

Hi @RotemK1. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the cncf-cla: no and area/vertical-pod-autoscaler labels Dec 22, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RotemK1
Once this PR has been reviewed and has the lgtm label, please assign adrianmoisey for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/M and cncf-cla: yes labels and removed the do-not-merge/needs-area and cncf-cla: no labels Dec 22, 2025
@omerap12
Member

Thanks for opening this, but all of this will be solved once #8870 merges.

/close

@k8s-ci-robot
Contributor

@omerap12: Closed this PR.



Labels

area/vertical-pod-autoscaler, cncf-cla: yes, kind/bug, needs-ok-to-test, release-note, size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The VPA Helm chart is missing the proper RBAC permissions for the admission controller deployment

3 participants