fix(rbac): Add missing agentruntimes permissions to ClusterRole #253
AgentRuntimeReconciler has been deployed in production without the necessary RBAC permissions, causing continuous permission errors in operator logs.

## Problem

The operator's ServiceAccount cannot list/watch AgentRuntime resources:

```
agentruntimes.agent.kagenti.dev is forbidden: User "system:serviceaccount:kagenti-operator-system:controller-manager" cannot list resource "agentruntimes" in API group "agent.kagenti.dev" at the cluster scope
```

This error repeats continuously with exponential backoff, filling logs and preventing AgentRuntime reconciliation.

## Root Cause

1. AgentRuntimeReconciler is always registered (cmd/main.go:323-330)
2. Controller declares required RBAC in code annotations:
   ```go
   // +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes,verbs=get;list;watch;create;update;patch;delete
   // +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes/status,verbs=get;update;patch
   // +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes/finalizers,verbs=update
   ```
3. Helm chart ClusterRole template is missing these permissions

## Solution

Add agentruntimes permissions to charts/kagenti-operator/templates/rbac/role.yaml matching the kubebuilder RBAC annotations in agentruntime_controller.go.

## Impact

- Fixes permission errors in operator logs
- Enables the AgentRuntime controller to function correctly
- Allows per-workload identity/observability configuration

## Testing

Deployed operator with the fix in a kind cluster:
- Permission errors stopped immediately
- AgentRuntime controller can now list/watch AgentRuntime resources
- No regressions in other controllers

Fixes a pre-existing bug affecting all deployments.

Signed-off-by: Alan Cha <Alan.cha1@ibm.com>
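A forbidden message like the one above already names the exact permission that is missing: the verb, resource, API group and scope. A "cluster" scope means the grant has to come from a ClusterRole bound cluster-wide, not a namespaced Role. The following is a minimal sketch, assuming the message layout quoted above (other API server versions may word it differently). `Forbidden` and `ParseForbidden` are hypothetical helpers, not part of client-go or the operator.

```go
package main

import (
	"fmt"
	"regexp"
)

// Forbidden holds the request attributes named in an RBAC "forbidden" message.
type Forbidden struct {
	User, Verb, Resource, Group, Scope string
}

// forbiddenRE matches the layout quoted in the PR description (assumed):
// ... is forbidden: User "<user>" cannot <verb> resource "<res>" in API group "<group>" at the <scope> scope
// The group pattern allows an empty string, which is how the core API group appears.
var forbiddenRE = regexp.MustCompile(
	`is forbidden: User "([^"]+)" cannot (\w+) resource "([^"]+)" in API group "([^"]*)" at the (\w+) scope`)

// ParseForbidden extracts the denied request attributes; ok is false if the
// message does not have the expected shape.
func ParseForbidden(msg string) (f Forbidden, ok bool) {
	m := forbiddenRE.FindStringSubmatch(msg)
	if m == nil {
		return Forbidden{}, false
	}
	return Forbidden{User: m[1], Verb: m[2], Resource: m[3], Group: m[4], Scope: m[5]}, true
}

func main() {
	msg := `agentruntimes.agent.kagenti.dev is forbidden: User "system:serviceaccount:kagenti-operator-system:controller-manager" cannot list resource "agentruntimes" in API group "agent.kagenti.dev" at the cluster scope`
	f, ok := ParseForbidden(msg)
	fmt.Println(ok, f.Verb, f.Resource, f.Group, f.Scope)
	// prints: true list agentruntimes agent.kagenti.dev cluster
}
```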
Full Error Logs from Operator

These errors repeat continuously in the kagenti-operator logs (exponential backoff):

How to reproduce:
Verification after fix:
cwiklik left a comment
Review Summary
Correct fix for the missing agentruntimes RBAC permissions — the added rules match the kubebuilder markers exactly and the PR description is thorough with clear root cause analysis.
Overlap with #249: PR #249 (already approved) does a comprehensive alignment of this same file, adding agentruntimes (same fix) and removing 79 lines of over-provisioned rules (secrets, CRDs, webhooks, RBAC management, deprecated extensions API group). These two PRs will have merge conflicts on charts/kagenti-operator/templates/rbac/role.yaml. Recommend coordinating merge order:
Areas reviewed: Helm/K8s RBAC
Commits: 1 commit, signed-off: yes
CI status: All 14 checks passing (including E2E)
```yaml
verbs:
- create
- delete
- get
```
suggestion (coordination): This exact change is also included in PR #249, which does a broader RBAC cleanup aligning the entire Helm ClusterRole with config/rbac/role.yaml. PR #249 adds agentruntimes (same rules as here) plus removes ~79 lines of over-provisioned permissions the operator doesn't use (secrets, CRDs, webhooks, RBAC, deprecated extensions API group, etc.).
These two PRs will conflict on this file. If #249 merges first, this PR is fully superseded. Worth coordinating merge order with @ChristianZaccaria.
Perhaps it makes sense to merge #249 first, since its changes are more extensive. We can close this after that one is merged.
I believe #249 is ready for merge.
Severity Clarification

This bug completely breaks the AgentRuntime feature, which is documented as "the declarative way to enroll a workload into the Kagenti platform" (docs/architecture.md).

Impact on Users

Users creating AgentRuntime resources expecting agent enrollment will see:
Workaround: Users must manually add

Root Cause

AgentRuntime was added in commit

Recommendation

This should be treated as a P0 bug fix for the AgentRuntime feature. All
Problem
The kagenti-operator is deployed with insufficient RBAC permissions, causing continuous errors in production:
Logs showing the issue:
This error repeats continuously (exponential backoff), filling operator logs and preventing the AgentRuntimeReconciler from functioning.
Root Cause
Mismatch between code and Helm chart:
1. Code declares required RBAC in internal/controller/agentruntime_controller.go:66-69
2. Controller is always registered in cmd/main.go:323-330
3. Helm chart ClusterRole is missing these permissions in charts/kagenti-operator/templates/rbac/role.yaml
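A mismatch like this can in principle be caught mechanically by expanding each `+kubebuilder:rbac` marker into group/resource/verb triples and checking them against the rules the chart grants. The sketch below is hypothetical: it handles only the `groups`, `resources` and `verbs` keys, ignores wildcards and `namespace=`, and uses a simplified `Rule` type rather than the real `rbac/v1` types. With an empty "before fix" ClusterRole it reports all 11 permissions from the three markers (7 + 3 + 1) as missing.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Rule is a simplified RBAC rule: one API group, one resource, a set of verbs.
type Rule struct {
	Group, Resource string
	Verbs           []string
}

// parseMarker expands a "// +kubebuilder:rbac:groups=...,resources=...,verbs=..."
// comment into one Rule per group/resource pair. Values are ';'-separated.
func parseMarker(line string) []Rule {
	line = strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(line), "//"))
	const prefix = "+kubebuilder:rbac:"
	if !strings.HasPrefix(line, prefix) {
		return nil
	}
	fields := map[string][]string{}
	for _, kv := range strings.Split(strings.TrimPrefix(line, prefix), ",") {
		if k, v, ok := strings.Cut(kv, "="); ok {
			fields[k] = strings.Split(v, ";")
		}
	}
	var rules []Rule
	for _, g := range fields["groups"] {
		for _, r := range fields["resources"] {
			rules = append(rules, Rule{Group: g, Resource: r, Verbs: fields["verbs"]})
		}
	}
	return rules
}

// missing returns the sorted "group/resource:verb" permissions that are
// required but not granted.
func missing(required, granted []Rule) []string {
	have := map[string]bool{}
	for _, r := range granted {
		for _, v := range r.Verbs {
			have[r.Group+"/"+r.Resource+":"+v] = true
		}
	}
	var out []string
	for _, r := range required {
		for _, v := range r.Verbs {
			if key := r.Group + "/" + r.Resource + ":" + v; !have[key] {
				out = append(out, key)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	markers := []string{
		"// +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes,verbs=get;list;watch;create;update;patch;delete",
		"// +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes/status,verbs=get;update;patch",
		"// +kubebuilder:rbac:groups=agent.kagenti.dev,resources=agentruntimes/finalizers,verbs=update",
	}
	var required []Rule
	for _, m := range markers {
		required = append(required, parseMarker(m)...)
	}
	var before []Rule // hypothetical ClusterRole with no agentruntimes rules
	fmt.Println(len(missing(required, before)))   // prints 11
	fmt.Println(len(missing(required, required))) // prints 0
}
```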
Solution
Add the missing agentruntimes permissions to the ClusterRole Helm template to match the controller's kubebuilder RBAC annotations.

Changes:
- agentruntimes resource permissions (get, list, watch, create, update, patch, delete)
- agentruntimes/status subresource permissions (get, update, patch)
- agentruntimes/finalizers subresource permissions (update)

Impact
Before fix:
After fix:
Testing
Deployed operator with this fix in kind cluster:
Type of Change
Checklist
Related Issues
This is a pre-existing bug affecting all deployments of kagenti-operator. No specific issue was filed; it was discovered during E2E testing of PR #247.