
@GTRekter GTRekter commented Dec 4, 2025

First version of the Linkerd OSS Agent. It enables users to inject Linkerd proxies and use Linkerd CLI subcommands to inspect certificates and check control-plane and data-plane health.

Diagnostics commands are included to simplify troubleshooting of policies, endpoints, and profiles.

Tools PR: kagent-dev/tools#31

GTRekter and others added 15 commits December 4, 2025 23:54
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
Needed to manually copy/paste some HTML to get some older data to show
what that looks like :)

<img width="517" height="673" alt="image"
src="https://github.com/user-attachments/assets/b590058d-624a-443f-b818-14989ede9e7d"
/>

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
Note: unsure where the affinity template updates came from, but they get generated with the gen Makefile target. Maybe from kagent-dev#1085, but surprised it's generating on my PR 3 weeks after the merge 🤔

# Changes
- Hash referenced Secrets alongside the config-hash annotation on the agent pod, so the pod restarts when a referenced Secret updates
- Add a `SecretHash` status on ModelConfig so that changes to underlying referenced Secrets are propagated (via resource version updates) to Agent reconciliation

<img width="2067" height="1464" alt="image"
src="https://github.com/user-attachments/assets/a1b74d88-17f8-45fd-b334-cc1f2553a47f"
/>

With these changes…
1. When a Secret updates, the ModelConfig updates its status to reflect the new hash.
2. The ModelConfig's resource version changes.
3. The Agent watching the ModelConfig sees the resource update.
4. The Agent reconciles, updating the annotation on the pod (sketched below).
5. The Agent pod restarts, loading in the new Secrets.
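
For illustration, the hashing in step 4 could look roughly like the sketch below; `computeSecretsHash`, the package name, and the exact inputs are assumptions, not the actual kagent code:

```go
package controller

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"

	corev1 "k8s.io/api/core/v1"
)

// computeSecretsHash folds the data of every referenced Secret into one
// stable hash. The reconciler writes this value into the pod template's
// annotations, so any change to a referenced Secret rolls the agent pod.
func computeSecretsHash(secrets []corev1.Secret) string {
	h := sha256.New()
	for _, s := range secrets {
		// Sort keys so the hash is deterministic across reconciles.
		keys := make([]string, 0, len(s.Data))
		for k := range s.Data {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			h.Write([]byte(s.Namespace + "/" + s.Name + "/" + k))
			h.Write(s.Data[k])
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

When the computed value differs from the annotation on the running pod, the Deployment rollout restarts the agent (step 5).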

## Golden Test Changes - Notes

The outputs for the golden test annotations have _not_ changed, because the annotation hash relies on the ModelConfig status, which carries the Secret hash. The ModelConfig must reconcile for its status to update, and it does not reconcile in tests, so an empty `[]byte{}` (no change) is written to the hash.

# Context

With the addition of TLS CAs to ModelConfigs, it became apparent we'll need a UX-friendly way for agents to pick up the latest Secret (e.g. cert rotation, API key change) without requiring users to manually restart the agent.

Note: We can't rely on dynamic volume mounting alone, as the CA cert is read at agent start to configure the cached client. The API key also needed a way for its updates to propagate to the agent.

## Demo

_steps_

[agent restart validation
steps.md](https://github.com/user-attachments/files/23664735/agent.restart.validation.steps.md)

_video_

https://github.com/user-attachments/assets/eca62fb4-2ca2-45eb-94ba-7dfd0db5244b

## Alternative Solutions

_feedback wanted_

### Per-Secret Status

Instead of hashing all Secrets into a single final hash stored in the ModelConfig's status, we could store a per-Secret status.

For example, the status would change from:
```yaml
status:
  […]
  SecretHash: XYZ
```

to something like:
```yaml
status:
  […]
  Secrets:
    APIKey:
      Hash/Version: 123
    TLS:
      Hash/Version: 123
```

I avoided this in order to simplify status tracking; a single hash is less wordy than a field per Secret, especially if we expand the set of referenced Secrets in the future. However, the per-Secret approach gives users a better way to track exactly where changes occurred, and could avoid hashing entirely by using each Secret's resource version to detect updates.

We would need to work out _how_ we'd propagate this to the agent pod annotations: an annotation per Secret vs. a single hash for the pod, as we do for the status now.

### Avoiding Restart Requirement

We should be able to avoid the restart needed for agents to pick up updated Secrets. For instance, right now we mount a Volume for the TLS CA and use its file to configure the client at startup, which is then cached. We could remove the client caching so that updated data from volume mounts is picked up and used (see the sketch after the list below).

Pros:
- Avoids the restart requirement

Cons:
- Not caching the client would have some performance impact, as it would need to be recreated per call (maybe not a big deal, but noteworthy)
- We won't be able to run the validation checks we currently do on startup.
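
A rough sketch of the non-caching variant, assuming the CA is still volume-mounted (the path, package, and helper name are illustrative):

```go
package client

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

// caCertPath is where the TLS CA Secret would be volume-mounted; the
// exact path is an assumption for illustration.
const caCertPath = "/etc/kagent/tls/ca.crt"

// newHTTPClient builds a client per call instead of caching one at
// startup, so a rotated CA in the mounted Secret is picked up on the
// next request without restarting the agent.
func newHTTPClient() (*http.Client, error) {
	caPEM, err := os.ReadFile(caCertPath)
	if err != nil {
		return nil, fmt.Errorf("reading CA cert: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no valid certs found in %s", caCertPath)
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}
```

Rebuilding the transport per call also drops pooled connections, which is the performance cost noted above.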

---

Resolves kagent-dev#1091

---------

Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
…ent-dev#1137)

Split this out of kagent-dev#1133 to try to reduce the size of that PR, but also because it's not strictly related to being able to scale the controller; it simply manifested when needing to switch to Postgres to run multiple controller replicas.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
…-dev#1140)

Another artifact of kagent-dev#1133. No need for the SQLite volume and mount when the database is set to Postgres.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
…te database (kagent-dev#1144)

Running multiple controller replicas with a local SQLite database will lead to errors, as API requests will inevitably end up being handled by a replica that does not have the local state (e.g. an A2A session). This check/error hopefully prevents users from making this mistake.
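
The guard could look roughly like this (the field names and `"sqlite"` literal are assumptions, not the exact implementation):

```go
package config

import "fmt"

// validateDatabaseConfig refuses to run more than one controller
// replica against a local SQLite file, since each pod would otherwise
// hold its own disconnected state.
func validateDatabaseConfig(databaseType string, replicas int) error {
	if databaseType == "sqlite" && replicas > 1 {
		return fmt.Errorf(
			"sqlite does not support %d controller replicas; use postgres for multi-replica deployments",
			replicas,
		)
	}
	return nil
}
```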

Split out from kagent-dev#1133

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
Enables local testing using Postgres as a backing store for the controller.

Split out from kagent-dev#1133 (with added docs).

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
**Yet another PR split out from kagent-dev#1133 to try to reduce review burden** - keeping that one open for now, as all of these other PRs are ultimately working towards that goal.

This PR refactors the kagent controller to support the use of environment variables for configuration, in addition to command-line arguments. It also updates the Helm chart to use env vars instead of command-line args, and adds the ability for users to supply their own environment variables with custom configuration. This allows users to supply sensitive configuration (e.g. the Postgres database URL) via Secrets instead of exposing it via `args`. Env vars are also easier to patch when working with rendered manifests, if needed.
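
The pattern amounts to letting an environment variable back each flag; a minimal sketch, with `KAGENT_DATABASE_URL` as a hypothetical variable name:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// envOr returns the value of the named environment variable, or def if
// it is unset. Flags keep working, but an env var now supplies the
// default, so the Helm chart can source it from a Secret.
func envOr(key, def string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return def
}

func main() {
	dbURL := flag.String("database-url",
		envOr("KAGENT_DATABASE_URL", ""),
		"database connection string")
	flag.Parse()
	fmt.Println("database url configured:", *dbURL != "")
}
```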

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
…n guidelines, update README (kagent-dev#1142)

Expands the internal documentation to help users participate in the project.

---------

Signed-off-by: Sam Heilbron <samheilbron@gmail.com>
Signed-off-by: Sam Heilbron <SamHeilbron@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
This PR enables leader election on the controller if it is configured with more than 1 replica, to ensure that only 1 replica is actively reconciling watched manifests. It also ensures that the necessary RBAC manifests are created.
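
In controller-runtime terms this amounts to something like the following sketch; the election ID and namespace are illustrative:

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// enableLeaderElection would be derived from the configured replica
	// count; hardcoded here for illustration.
	enableLeaderElection := true

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// With leader election on, only the lease holder runs the
		// reconcile loops; other replicas stand by until the lease moves.
		LeaderElection:          enableLeaderElection,
		LeaderElectionID:        "kagent-controller-leader",
		LeaderElectionNamespace: "kagent",
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // reconcilers would be registered here before mgr.Start()
}
```

Leader election in controller-runtime acquires a coordination.k8s.io Lease by default, which is why the extra RBAC manifests are needed.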

Final part of kagent-dev#1133 (excluding kagent-dev#1138).

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
…econcilation (kagent-dev#1138)

**Decided to split this out of kagent-dev#1133 to try to make review a little easier, as it's a chunky commit that can live in isolation from the rest of the changes in that PR**

This change separates A2A handler registration from the main `Agent`
controller reconciliation loop by introducing a dedicated `A2ARegistrar`
that manages the A2A routing table independently from the main
controller.

Currently, A2A handler registration is tightly coupled to the `Agent`
controller's reconciliation loop, which performs the following
operations:
1. Reconcile Kubernetes resources (Deployment, Service, etc.)
2. Store agent metadata in database
3. Register A2A handler in routing table
4. Update resource status

This coupling is problematic for a number of reasons:
1. Breaks horizontal scaling - with leader election enabled (required to
prevent duplicate reconciliation), only the leader pod performs
reconciliation and registers A2A handlers. When API requests hit
non-leader replicas, they fail because those replicas lack the necessary
handler registrations.
2. It could be argued that this violates separation of concerns - the controller handles both cluster resource management (its core responsibility) and API routing configuration (an orthogonal concern).
3. Makes future architectural changes (e.g., splitting API and control
plane) unnecessarily complex.

This PR addresses those concerns by ensuring that all controller replicas, when scaled, maintain consistent A2A routing tables, enabling transparent load balancing across replicas. A2A logic is also consolidated into a dedicated package rather than scattered across controller code, giving a clean separation of API and control plane so that the two could be split into independent deployments without significant refactoring in the future.
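
Conceptually, the registrar looks something like the sketch below; type and method names are illustrative, not the exact code:

```go
package a2a

import (
	"context"
	"sync"
)

// Handler is a stand-in for whatever interface serves A2A requests.
type Handler interface {
	Serve(ctx context.Context, req []byte) ([]byte, error)
}

// A2ARegistrar maintains the A2A routing table on every replica by
// reacting to Agent events directly, instead of piggybacking on the
// leader-only reconcile loop.
type A2ARegistrar struct {
	mu       sync.RWMutex
	handlers map[string]Handler // agent name -> handler
}

// Register is invoked from an event handler that runs on all replicas,
// leader or not, so the routing tables stay consistent.
func (r *A2ARegistrar) Register(name string, h Handler) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.handlers == nil {
		r.handlers = make(map[string]Handler)
	}
	r.handlers[name] = h
}

// Unregister drops an agent's handler when the Agent is deleted.
func (r *A2ARegistrar) Unregister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.handlers, name)
}
```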

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
)

Signed-off-by: jiangdong <jiangdong@iflytek.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
Signed-off-by: jiangdong <jiangdong@iflytek.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
Signed-off-by: Ivan Porta <porta.ivan@outlook.com>
Signed-off-by: Ivan (이반) Porta <porta.ivan@outlook.com>
```makefile
helm $(HELM_ACTION) kagent helm/kagent \
	--namespace kagent \
	--create-namespace \
	--history-max 2 \
```

any reason for removing this?

```yaml
kind: RemoteMCPServer
apiGroup: kagent.dev
toolNames:
- k8s_create_resource
```

there's a lot of tools here (we recommend <20). Is there anything that could be removed?


EItanya commented Dec 10, 2025

Given that this PR is blocked on kagent-dev/tools#34, do you think we can move this into draft for now?
