Summary
When Agent Substrate is installed via the new subchart path (substrate.enabled=true on kagent / kagent-crds, added in #2030 and released in OSS 0.9.10), the ate-api-server pod crashloops because it cannot resolve its Redis/Valkey backend:
ate-api-server "Failed to connect to Redis/Valkey, retrying..." attempt=21
err="dial tcp: lookup valkey-cluster.kagent.svc on 10.1.0.10:53: no such host"
stream closed EOF for kagent/kagent-ate-api-server-deployment-... (assemble-cred-bundle)
The standalone install (helm install substrate …) is unaffected — this only breaks when Substrate runs as a subchart, which is exactly the new, recommended install path.
Affected: kagent OSS 0.9.10, substrate subchart v0.0.6 (oci://ghcr.io/kagent-dev/substrate/helm/substrate:0.0.6).
Root cause
The Valkey Service name and the Redis address that the api-server reads are derived in two different ways:
- Valkey Service, in substrate/templates/valkey.yaml:
  name: {{ include "substrate.fullname" (list "valkey-cluster" .) }}
- ATE_API_REDIS_ADDRESS, in substrate/templates/ate-api-server-envvars.yaml:
  ATE_API_REDIS_ADDRESS: {{ .Values.redis.clusterAddress | default (printf "valkey-cluster.%s.svc:6379" .Release.Namespace) | quote }}
substrate.fullname (substrate/templates/_helpers.tpl) returns the bare name only when .Release.Name == .Chart.Name; otherwise it prefixes the name with the release name:
{{- define "substrate.fullname" -}}
{{- $name := index . 0 -}}
{{- $ctx := index . 1 -}}
{{- if eq $ctx.Release.Name $ctx.Chart.Name -}}
{{- $name -}}
{{- else -}}
{{- printf "%s-%s" $ctx.Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
Therefore:
| Install mode | Valkey Service | ATE_API_REDIS_ADDRESS default | Result |
| --- | --- | --- | --- |
| Standalone (release == chart name substrate) | valkey-cluster | valkey-cluster.<ns>.svc:6379 | ✅ match |
| Subchart (release kagent) | kagent-valkey-cluster | valkey-cluster.<ns>.svc:6379 | ❌ mismatch |
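The two Service names in the table can be reproduced without Helm. The sketch below is a hypothetical shell re-implementation of the substrate.fullname logic quoted above (the function name substrate_fullname and its argument order are made up for illustration and are not part of the chart); it only mirrors the equality check, the 63-character truncation, and the trailing-dash trim.

```shell
# Hypothetical stand-in for the substrate.fullname helper (illustration only).
# $1 = release name, $2 = chart name, $3 = component name
substrate_fullname() {
  if [ "$1" = "$2" ]; then
    # Release name equals chart name: return the bare component name.
    printf '%s\n' "$3"
  else
    # Otherwise prefix with the release name, truncate to 63 characters,
    # and drop a single trailing dash (mirrors trunc 63 | trimSuffix "-").
    full=$(printf '%s-%s' "$1" "$3" | cut -c1-63)
    printf '%s\n' "${full%-}"
  fi
}

substrate_fullname substrate substrate valkey-cluster   # prints valkey-cluster
substrate_fullname kagent substrate valkey-cluster      # prints kagent-valkey-cluster
```

The second call corresponds to the subchart install, where the release is kagent but the chart is still substrate, so the Service picks up the kagent- prefix while the hardcoded address default does not.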
The Redis-address default is the only cross-component reference that hardcodes the unprefixed name. Every other one already uses substrate.fullname correctly — atenet-router → api (--ateapi-address), atelet → rustfs, rustfs self-ref, and jwt-bootstrap $apiHost — so they all work as a subchart.
Reproduce (no cluster needed)
helm template kagent oci://ghcr.io/kagent-dev/kagent/helm/kagent --version 0.9.10 \
--set substrate.enabled=true --set providers.openAI.apiKey=x -n kagent \
| grep -E 'ATE_API_REDIS_ADDRESS|name: kagent-valkey-cluster$'
ATE_API_REDIS_ADDRESS: "valkey-cluster.kagent.svc:6379" # <- points at a Service that doesn't exist
name: kagent-valkey-cluster # <- actual Service name
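The same comparison can be scripted, for example as a CI check on the rendered chart. This is a rough sketch, not part of the chart: check_redis_address is a hypothetical helper that reads rendered manifests on stdin, extracts the host part of ATE_API_REDIS_ADDRESS, and looks for any object whose name: line equals that host. It assumes the rendered layout shown above and does not verify that the matching object is actually a Service.

```shell
# Hypothetical check (illustration only): reads rendered manifests on stdin.
# Exit codes: 0 = an object with the referenced name exists, 1 = mismatch,
# 2 = ATE_API_REDIS_ADDRESS not found in the input.
check_redis_address() {
  manifest=$(cat)
  # Keep only the first DNS label of the address (the Service name).
  host=$(printf '%s\n' "$manifest" \
    | sed -n 's/^ *ATE_API_REDIS_ADDRESS: *"\{0,1\}\([^."]*\)\..*$/\1/p' \
    | head -n 1)
  if [ -z "$host" ]; then
    echo "ATE_API_REDIS_ADDRESS not found" >&2
    return 2
  fi
  # Heuristic: any "name: <host>" line counts as a match.
  if printf '%s\n' "$manifest" | grep -Eq "^ *name: ${host}\$"; then
    echo "ok: ${host}"
  else
    echo "mismatch: no object named ${host}"
    return 1
  fi
}
```

Piping the helm template command above into check_redis_address (without the grep stage) should report a mismatch on the affected versions.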
Workaround
Override the address explicitly under the substrate subchart values:
substrate:
  enabled: true
  redis:
    clusterAddress: kagent-valkey-cluster.<release-namespace>.svc:6379
For an already-running cluster, patch the (unprefixed) ConfigMap and restart the deployment, since ate-api-server reads it via envFrom and won't auto-roll:
kubectl -n kagent patch configmap ate-api-server-envvars --type merge \
-p '{"data":{"ATE_API_REDIS_ADDRESS":"kagent-valkey-cluster.kagent.svc:6379"}}'
kubectl -n kagent rollout restart deploy/kagent-ate-api-server-deployment
Suggested fix
Template the ATE_API_REDIS_ADDRESS default with the same helper that names the Service, so it stays correct in both install modes:
ATE_API_REDIS_ADDRESS: {{ .Values.redis.clusterAddress | default (printf "%s.%s.svc:6379" (include "substrate.fullname" (list "valkey-cluster" .)) .Release.Namespace) | quote }}