Agent Substrate subchart: ate-api-server can't reach Valkey (ATE_API_REDIS_ADDRESS missing release-name prefix) #2092

@jmunozro

Summary

When Agent Substrate is installed via the new subchart path (substrate.enabled=true on kagent / kagent-crds, added in #2030 and released in OSS 0.9.10), the ate-api-server pod crashloops because it cannot resolve its Redis/Valkey backend:

ate-api-server "Failed to connect to Redis/Valkey, retrying..." attempt=21
  err="dial tcp: lookup valkey-cluster.kagent.svc on 10.1.0.10:53: no such host"
stream closed EOF for kagent/kagent-ate-api-server-deployment-... (assemble-cred-bundle)

The standalone install (helm install substrate …) is unaffected — this only breaks when Substrate runs as a subchart, which is exactly the new, recommended install path.

Affected: kagent OSS 0.9.10, substrate subchart v0.0.6 (oci://ghcr.io/kagent-dev/substrate/helm/substrate:0.0.6).

Root cause

The Valkey Service name and the Redis address that the api-server reads are derived in two different ways:

  • Valkey Service (substrate/templates/valkey.yaml):
    name: {{ include "substrate.fullname" (list "valkey-cluster" .) }}
  • ATE_API_REDIS_ADDRESS (substrate/templates/ate-api-server-envvars.yaml):
    ATE_API_REDIS_ADDRESS: {{ .Values.redis.clusterAddress | default (printf "valkey-cluster.%s.svc:6379" .Release.Namespace) | quote }}

substrate.fullname (substrate/templates/_helpers.tpl) returns the bare name only when .Release.Name == .Chart.Name, otherwise it prefixes with the release name:

{{- define "substrate.fullname" -}}
{{- $name := index . 0 -}}
{{- $ctx := index . 1 -}}
{{- if eq $ctx.Release.Name $ctx.Chart.Name -}}
{{- $name -}}
{{- else -}}
{{- printf "%s-%s" $ctx.Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}

Therefore:

| Install mode | Valkey Service | ATE_API_REDIS_ADDRESS default | Result |
| --- | --- | --- | --- |
| Standalone (release == chart name substrate) | valkey-cluster | valkey-cluster.<ns>.svc:6379 | ✅ match |
| Subchart (release kagent) | kagent-valkey-cluster | valkey-cluster.<ns>.svc:6379 | ❌ mismatch |
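
The naming rule behind this table can be modelled in a few lines of shell. This is only an illustrative sketch of the helper logic quoted above, not chart code: the function names `fullname` and `current_redis_default` are made up for this example, and the chart name is assumed to be `substrate` in both install modes.

```shell
#!/bin/sh
# Illustrative shell model of the substrate.fullname helper (not chart code).
# Usage: fullname RELEASE CHART NAME
fullname() {
  if [ "$1" = "$2" ]; then
    # Release name equals chart name: the bare name is used.
    printf '%s\n' "$3"
  else
    # Mirrors: printf "%s-%s" ... | trunc 63 | trimSuffix "-"
    full=$(printf '%s-%s\n' "$1" "$3" | cut -c1-63)
    printf '%s\n' "${full%-}"
  fi
}

# Models the current ATE_API_REDIS_ADDRESS default, which ignores the release name.
# Usage: current_redis_default NAMESPACE
current_redis_default() {
  printf 'valkey-cluster.%s.svc:6379\n' "$1"
}

fullname substrate substrate valkey-cluster   # standalone -> valkey-cluster
fullname kagent substrate valkey-cluster      # subchart   -> kagent-valkey-cluster
current_redis_default kagent                  # both modes -> valkey-cluster.kagent.svc:6379
```

Only the standalone case yields a Service name equal to the host part of the default address.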

The Redis-address default is the only cross-component reference that hardcodes the unprefixed name. Every other one already uses substrate.fullname correctly — atenet-router → api (--ateapi-address), atelet → rustfs, rustfs self-ref, and jwt-bootstrap $apiHost — so they all work as a subchart.

Reproduce (no cluster needed)

helm template kagent oci://ghcr.io/kagent-dev/kagent/helm/kagent --version 0.9.10 \
  --set substrate.enabled=true --set providers.openAI.apiKey=x -n kagent \
  | grep -E 'ATE_API_REDIS_ADDRESS|name: kagent-valkey-cluster$'
  ATE_API_REDIS_ADDRESS: "valkey-cluster.kagent.svc:6379"   # <- points at a Service that doesn't exist
    name: kagent-valkey-cluster                             # <- actual Service name
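
The manual grep above could be turned into a pass/fail check, for example in CI. The sketch below is an assumption-laden illustration: `check_redis_address` is a hypothetical helper, it expects rendered manifests on stdin, and it only looks for a `name:` line equal to the host's first label, so it does not verify that the matching resource is actually a Service.

```shell
#!/bin/sh
# Hypothetical helper: reads rendered manifests on stdin and checks that the
# first DNS label of ATE_API_REDIS_ADDRESS appears as some resource "name:".
check_redis_address() {
  manifest=$(cat)
  # Extract the quoted value of the first ATE_API_REDIS_ADDRESS entry.
  addr=$(printf '%s\n' "$manifest" \
    | sed -n 's/^ *ATE_API_REDIS_ADDRESS: *"\([^"]*\)".*/\1/p' | head -n 1)
  if [ -z "$addr" ]; then
    echo "ATE_API_REDIS_ADDRESS not found" >&2
    return 2
  fi
  # First label of the host, e.g. valkey-cluster from valkey-cluster.kagent.svc:6379.
  svc=${addr%%.*}
  if printf '%s\n' "$manifest" | grep -Eq -- "^ *name: ${svc}\$"; then
    echo "ok: $svc"
  else
    echo "mismatch: no resource named $svc"
    return 1
  fi
}
```

Piping the `helm template` output from the reproduction into `check_redis_address` should report a mismatch for 0.9.10, provided nothing else in the render happens to be named `valkey-cluster`.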

Workaround

Override the address explicitly under the substrate subchart values:

substrate:
  enabled: true
  redis:
    clusterAddress: kagent-valkey-cluster.<release-namespace>.svc:6379

For an already-running cluster, patch the (unprefixed) ConfigMap and restart the deployment, since ate-api-server reads it via envFrom and won't auto-roll:

kubectl -n kagent patch configmap ate-api-server-envvars --type merge \
  -p '{"data":{"ATE_API_REDIS_ADDRESS":"kagent-valkey-cluster.kagent.svc:6379"}}'
kubectl -n kagent rollout restart deploy/kagent-ate-api-server-deployment

Suggested fix

Template the ATE_API_REDIS_ADDRESS default with the same helper that names the Service, so it stays correct in both install modes:

ATE_API_REDIS_ADDRESS: {{ .Values.redis.clusterAddress | default (printf "%s.%s.svc:6379" (include "substrate.fullname" (list "valkey-cluster" .)) .Release.Namespace) | quote }}
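
To see why this works in both modes, the proposed default can be modelled in shell. Again a sketch under assumptions: `fullname` re-implements the helper logic quoted in the root cause section, `fixed_redis_default` is a made-up name, and the chart name is assumed to be `substrate`.

```shell
#!/bin/sh
# Illustrative model of substrate.fullname (see the helper quoted earlier).
# Usage: fullname RELEASE CHART NAME
fullname() {
  if [ "$1" = "$2" ]; then
    printf '%s\n' "$3"
  else
    full=$(printf '%s-%s\n' "$1" "$3" | cut -c1-63)
    printf '%s\n' "${full%-}"
  fi
}

# Models the proposed default: the host is built from the same helper
# that names the Service, so the two cannot drift apart.
# Usage: fixed_redis_default RELEASE CHART NAMESPACE
fixed_redis_default() {
  printf '%s.%s.svc:6379\n' "$(fullname "$1" "$2" valkey-cluster)" "$3"
}

fixed_redis_default substrate substrate kagent   # -> valkey-cluster.kagent.svc:6379
fixed_redis_default kagent substrate kagent      # -> kagent-valkey-cluster.kagent.svc:6379
```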
