
Conversation

@Dhouti Dhouti commented Nov 5, 2025

Agent configurations lack scheduling knobs.

Adds Tolerations, Affinity, & NodeSelector options to the Agent CRD.
Example:

```yaml
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: example
  namespace: test
spec:
  deployment:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
    - operator: Exists
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/os
              operator: In
              values:
              - linux
```

Copilot AI review requested due to automatic review settings November 5, 2025 05:39

Copilot AI left a comment


Pull Request Overview

This PR adds support for Kubernetes scheduling attributes (tolerations, affinity, and nodeSelector) to agent deployments in the kagent system. These attributes allow users to control pod placement on nodes based on constraints, preferences, and taints/tolerations.

Key changes:

  • Added three new optional fields to SharedDeploymentSpec: Tolerations, Affinity, and NodeSelector
  • Updated the translator to propagate these scheduling attributes from the API spec to the Kubernetes Deployment manifest
  • Generated CRD updates to include the new scheduling fields with full Kubernetes API definitions
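
A minimal sketch of what the new `SharedDeploymentSpec` fields might look like; the field names follow the PR description, but the JSON tags, comments, and surrounding struct layout are assumptions rather than the merged code:

```go
package v1alpha2

import corev1 "k8s.io/api/core/v1"

// SharedDeploymentSpec (excerpt): illustrative sketch of the new scheduling
// fields; the struct's existing fields are omitted here.
type SharedDeploymentSpec struct {
	// NodeSelector constrains the agent pod to nodes whose labels match.
	// +optional
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`

	// Tolerations let the agent pod schedule onto nodes with matching taints.
	// +optional
	Tolerations []corev1.Toleration `json:"tolerations,omitempty"`

	// Affinity holds node/pod affinity and anti-affinity scheduling rules.
	// +optional
	Affinity *corev1.Affinity `json:"affinity,omitempty"`
}
```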

Reviewed Changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `go/api/v1alpha2/agent_types.go` | Added three optional scheduling fields to the `SharedDeploymentSpec` struct |
| `go/api/v1alpha2/zz_generated.deepcopy.go` | Auto-generated deep-copy methods for the new scheduling fields |
| `go/internal/controller/translator/agent/adk_api_translator.go` | Updated the translator to copy scheduling attributes from the spec to the resolved deployment and apply them to the pod template |
| `go/config/crd/bases/kagent.dev_agents.yaml` | Auto-generated CRD schema updates with the full Kubernetes scheduling API definitions |
| `go/go.mod` | Moved two dependencies from indirect to direct |
| `go/go.sum` | Added dependency entry for `mattn/go-runewidth` |
| `go/internal/controller/translator/agent/testdata/inputs/agent_with_scheduling_attributes.yaml` | New test input with a scheduling-attributes configuration |
| `go/internal/controller/translator/agent/testdata/outputs/agent_with_scheduling_attributes.json` | Expected output showing the scheduling attributes applied to the deployment manifest |
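
A rough sketch of the translator change described above: copying the scheduling attributes from the resolved spec onto the Deployment's pod template. The function name, import path, and overall shape are illustrative assumptions, not the actual `adk_api_translator.go` code:

```go
package agent

import (
	appsv1 "k8s.io/api/apps/v1"

	// Assumed module path; the real import path may differ.
	"github.com/kagent-dev/kagent/go/api/v1alpha2"
)

// applyScheduling copies the optional scheduling fields from the agent's
// deployment spec onto the generated Deployment's pod template.
// Illustrative only; not the actual translator function.
func applyScheduling(dep *appsv1.Deployment, spec *v1alpha2.SharedDeploymentSpec) {
	if spec == nil {
		return
	}
	podSpec := &dep.Spec.Template.Spec
	if spec.NodeSelector != nil {
		podSpec.NodeSelector = spec.NodeSelector
	}
	if len(spec.Tolerations) > 0 {
		podSpec.Tolerations = spec.Tolerations
	}
	if spec.Affinity != nil {
		podSpec.Affinity = spec.Affinity
	}
}
```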


Signed-off-by: Dhouti <l.rex.via@gmail.com>
@Dhouti Dhouti force-pushed the feature/controller-agent-tolerations-and-affinities branch from dc3a8ad to b301aba on November 5, 2025 05:57

@EItanya EItanya left a comment


I initially got scared by the sheer size of this PR, but after looking at it I realized it's actually quite simple and clean! Thanks so much for the PR :)

@EItanya EItanya merged commit 7ecca0f into kagent-dev:main Nov 6, 2025
16 checks passed
EItanya pushed a commit that referenced this pull request Dec 1, 2025
Note: unsure where the affinity template updates came from, but they are generated by the `gen` Makefile target. Maybe from #1085, but it's surprising that they are being generated on my PR three weeks after that merge 🤔

# Changes
- Hash referenced Secrets alongside the config-hash annotation on the agent pod, so that when a referenced Secret updates, the pod restarts
- Added a `SecretHash` status field on ModelConfig so that changes to the underlying referenced Secrets are propagated (via resource version updates) to Agent reconciliation

<img width="2067" height="1464" alt="image"
src="https://github.com/user-attachments/assets/a1b74d88-17f8-45fd-b334-cc1f2553a47f"
/>

With these changes:
1. When a Secret updates, the ModelConfig updates its status to reflect the new hash.
2. The ModelConfig's resource version changes.
3. The Agent watching the ModelConfig sees the resource update.
4. The Agent reconciles, updating the annotation on the pod.
5. The agent pod restarts, loading the new Secrets.
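
A minimal sketch of the hashing step in this flow, assuming a sorted-key SHA-256 over the Secret's data; the function name is hypothetical and the actual kagent implementation may hash differently:

```go
package controller

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"

	corev1 "k8s.io/api/core/v1"
)

// hashSecretData computes a stable hash over a Secret's data by walking the
// keys in sorted order. Illustrative only; not the real kagent implementation.
func hashSecretData(secret *corev1.Secret) string {
	keys := make([]string, 0, len(secret.Data))
	for k := range secret.Data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(secret.Data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

The resulting value could then be written to the ModelConfig's `SecretHash` status and folded into the pod-template annotation alongside the existing config hash, so that any change to a referenced Secret rolls the agent Deployment.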

## Golden Test Changes - Notes

The golden test annotation outputs have _not_ changed, because the annotation hash relies on the ModelConfig status, which carries the Secret hash. The ModelConfig needs to reconcile for that status to be populated, and it does not reconcile in tests, so `byte{}` (no change) is written to the hash.

# Context

With the addition of TLS CAs to ModelConfigs, it became apparent that we'll need a UX-friendly way for agents to pick up the latest Secret (e.g. cert rotation, API key change) without requiring users to manually restart the agent.

Note: We can't rely on dynamic volume mounting, because the CA cert is read at agent start to configure the cached client. The API key also needed a way for its updates to propagate to the agent.

## Demo

_steps_

[agent restart validation
steps.md](https://github.com/user-attachments/files/23664735/agent.restart.validation.steps.md)

_video_



https://github.com/user-attachments/assets/eca62fb4-2ca2-45eb-94ba-7dfd0db5244b


## Alternative Solutions

_feedback wanted_

### Per-Secret Status

Instead of hashing all secrets into a single final hash to store in the
ModelConfig’s status, we could store a status per-Secret.

For example, the status would change from:
```yaml
status:
    […]
    SecretHash: XYZ
```

to something like
```yaml
status:
  […]
  Secrets:
    APIKey:
      Hash/Version: 123
    TLS:
      Hash/Version: 123
```

I avoided this in order to simplify status tracking; it is less verbose than adding a field per Secret, especially if we expand the set of referenced Secrets in the future. But the per-Secret approach does give users a better way to track exactly where a change occurred, and it could avoid hashing entirely by using each Secret's resource version to detect updates.

We would also need to decide _how_ to propagate this to the agent pod annotations: an annotation per Secret vs. a single hash for the pod, as we do for the status now.

### Avoiding Restart Requirement

We should be able to avoid the restart that agents currently need in order to pick up Secret changes. For instance, right now we mount a Volume for the TLS CA and use its file to configure the client at startup, and that client is cached. We could drop the client caching so that updated data from the volume mounts is picked up and used.
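
A small sketch of what the uncached variant could look like: rebuilding the HTTP client from the mounted CA bundle on every call so that a rotated certificate takes effect without a restart. The path and function name are placeholders, not the existing kagent code:

```go
package client

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

// newHTTPClient rebuilds an HTTP client from the mounted CA bundle each time
// it is called, so a rotated certificate is picked up without restarting the
// agent. caPath is a placeholder for wherever the CA volume is mounted.
func newHTTPClient(caPath string) (*http.Client, error) {
	pem, err := os.ReadFile(caPath)
	if err != nil {
		return nil, fmt.Errorf("read CA bundle: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		return nil, fmt.Errorf("no certificates found in %s", caPath)
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}
```

This illustrates the tradeoff listed in the cons below: the client and cert pool are recreated per call, and there is no single startup point at which to run validation.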

Pros:
- Avoiding restart requirement
Cons:
- Not caching the client would have some performance impact as it would need to be recreated per-call (maybe not a big deal, but noteworthy)
- We won’t be able to do any validation checks like we do now on startup.

---

Resolves #1091

---------

Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
GTRekter pushed a commit to GTRekter/kagent that referenced this pull request Dec 4, 2025