feat: Add waypoint mode with automatic gateway provisioning #259
akram wants to merge 4 commits into kagenti:main
fc185d1 to 54ec33f
…ioning

Implement automatic Istio waypoint gateway provisioning for namespaces containing Kagenti agent or tool workloads, with fixes for controller startup and centralized configuration support.

NamespaceWaypointReconciler Features:
- Watches namespaces and pods with kagenti.io/type=agent|tool labels
- Automatically applies Istio ambient mesh labels to namespaces
- Creates waypoint gateways using istio-waypoint GatewayClass
- Configures HBONE protocol listeners on port 15008
- Controlled by --enable-waypoint-provisioning flag (defaults to true)
- Triggers namespace reconciliation on pod create/update/delete events

Istio Ambient Mesh Configuration:
- Namespace labels: istio-discovery=enabled, istio.io/dataplane-mode=ambient
- Waypoint reference: istio.io/use-waypoint=<namespace>-waypoint
- Gateway labels: istio.io/waypoint-for=all

Cache Configuration Fixes:
- Removed DefaultNamespaces configuration (was nil, causing issues)
- Removed explicit ByObject entries for Namespace, Pod, Deployment, StatefulSet, Gateway
- Kept only ConfigMap in ByObject with label selectors for kagenti-relevant ConfigMaps
- All other resources now use default cluster-wide cache (controller-runtime defaults)
- Added detailed comments explaining cache configuration rationale

The root issue was that explicitly adding cluster-scoped resources (Namespace) or workload resources to ByObject prevented controllers from starting properly. By removing these entries and relying on controller-runtime defaults, all controllers now start and reconcile correctly.

Client Registration Enhancements:
- Added support for centralized kagenti-operator-config in operator namespace
- First checks kagenti-system/kagenti-operator-config (preferred for waypoint mode)
- Falls back to per-namespace authbridge-config (backward compatibility for sidecar mode)
- Added OperatorNamespace field to ClientRegistrationReconciler
- Improved error messages to indicate which ConfigMap source is being used

Debug Logging:
- Added comprehensive debug logging to NamespaceWaypointReconciler
- Logs controller setup (success/failure) at startup
- Logs every reconcile invocation with namespace and enabled status
- Helps troubleshoot controller startup and reconciliation issues

Dependencies:
- Added sigs.k8s.io/gateway-api v1.2.1 for Gateway resource support

Testing Validated:
- Automatic waypoint gateway provisioning works end-to-end
- Created test-ns-alpha and test-ns-beta namespaces with agent pods
- Waypoint gateways auto-created within 19 seconds
- Istio labels automatically applied to namespaces
- Operator-managed client registration in Keycloak working
- OAuth 2.0 token exchange between agents validated
- All controllers starting and reconciling properly
- Single-container agent pods confirmed (waypoint mode active)

RBAC Requirements:
- Added ClusterRole permissions for gateways.gateway.networking.k8s.io
- Added permissions for namespaces (get, list, watch, update, patch)
- Required for waypoint gateway creation and namespace label management

This implements Phase 3 (Operator Modifications) of the waypoint implementation plan, enabling zero-touch namespace configuration for Istio ambient mesh with centralized L7 authentication via waypoint gateways.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Akram <akram.benaissi@gmail.com>
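The label and gateway settings listed in this commit message can be summarized in a small, dependency-free Go sketch. It is an illustration only: `WaypointSpec`, `waypointName`, `namespaceLabels`, and `desiredWaypoint` are hypothetical names, the plain struct stands in for the Gateway API types the controller presumably uses, and naming the gateway `<namespace>-waypoint` is an assumption inferred from the `istio.io/use-waypoint` value.

```go
package main

import "fmt"

// WaypointSpec is a hypothetical stand-in for the Gateway resource;
// the real controller would build a Gateway API object instead.
type WaypointSpec struct {
	Name             string
	Namespace        string
	GatewayClassName string
	ListenerProtocol string
	ListenerPort     int
	Labels           map[string]string
}

// waypointName follows the <namespace>-waypoint pattern from the commit message.
func waypointName(namespace string) string {
	return namespace + "-waypoint"
}

// namespaceLabels returns the ambient mesh labels applied to the namespace.
func namespaceLabels(namespace string) map[string]string {
	return map[string]string{
		"istio-discovery":         "enabled",
		"istio.io/dataplane-mode": "ambient",
		"istio.io/use-waypoint":   waypointName(namespace),
	}
}

// desiredWaypoint describes the waypoint gateway for one namespace.
// Assumption: the gateway is named after the use-waypoint label value.
func desiredWaypoint(namespace string) WaypointSpec {
	return WaypointSpec{
		Name:             waypointName(namespace),
		Namespace:        namespace,
		GatewayClassName: "istio-waypoint",
		ListenerProtocol: "HBONE",
		ListenerPort:     15008,
		Labels:           map[string]string{"istio.io/waypoint-for": "all"},
	}
}

func main() {
	ns := "test-ns-alpha"
	fmt.Println(namespaceLabels(ns)["istio.io/use-waypoint"])
	wp := desiredWaypoint(ns)
	fmt.Printf("%s class=%s listener=%s/%d\n",
		wp.Name, wp.GatewayClassName, wp.ListenerProtocol, wp.ListenerPort)
}
```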
54ec33f to 2e2b17d
…espace

Changed the ClientRegistrationReconciler to read the keycloak-admin-secret from the operator namespace (kagenti-system) instead of agent namespaces. This improves security by centralizing access to Keycloak admin credentials.

Security Benefits:
- Admin credentials only exist in operator namespace (kagenti-system)
- Agent namespaces never have access to Keycloak admin username/password
- Reduces attack surface - compromised agent namespace cannot access admin API
- Follows principle of least privilege
- Aligns with centralized configuration pattern (kagenti-operator-config)

Changes to ClientRegistrationReconciler:
- Read keycloak-admin-secret from r.OperatorNamespace instead of workload namespace
- Updated error messages to indicate operator namespace location
- Added comments explaining security model and secret location
- Updated APIReader comment to reflect new secret location

Documentation Updates (operator-managed-client-registration.md):
- Clarified that keycloak-admin-secret lives in operator namespace only
- Updated requirements section to specify operator namespace for admin secret
- Updated reconcile flow to show admin secret read from operator namespace
- Updated RBAC section to reflect split configuration placement
- Updated migration guide to specify operator namespace setup
- Added security note that agent namespaces should NOT have this secret

Installation Impact:
- Installation scripts must create keycloak-admin-secret in kagenti-system
- Agent namespaces do not need this secret (simplified namespace setup)
- Existing deployments: delete keycloak-admin-secret from agent namespaces
- Operator will automatically start using the centralized secret

Testing:
- Verified with test-ns-alpha and test-ns-beta namespaces
- Confirmed client registration works with centralized secret
- Confirmed operator logs show correct namespace lookup

This completes the centralized configuration pattern started with kagenti-operator-config, providing a consistent security model for waypoint mode deployments.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Akram <akram.benaissi@gmail.com>
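The lookup order described across these two commits (centralized ConfigMap first, per-namespace fallback, admin secret only in the operator namespace) might look roughly like the sketch below. This is not the controller's code: `objectKey`, `configMapExists`, `resolveConfig`, and `adminSecretKey` are hypothetical helpers, and the existence check is injected as a function so the example runs without a cluster.

```go
package main

import "fmt"

// objectKey is a hypothetical namespace/name pair identifying an object.
type objectKey struct {
	Namespace string
	Name      string
}

// configMapExists is a hypothetical lookup; a real controller would
// query the API server or its cache instead.
type configMapExists func(key objectKey) bool

const (
	centralConfigName = "kagenti-operator-config"
	legacyConfigName  = "authbridge-config"
	adminSecretName   = "keycloak-admin-secret"
)

// resolveConfig prefers the centralized ConfigMap in the operator namespace
// and falls back to the per-namespace ConfigMap used by sidecar mode.
func resolveConfig(operatorNS, workloadNS string, exists configMapExists) (objectKey, error) {
	central := objectKey{Namespace: operatorNS, Name: centralConfigName}
	if exists(central) {
		return central, nil
	}
	legacy := objectKey{Namespace: workloadNS, Name: legacyConfigName}
	if exists(legacy) {
		return legacy, nil
	}
	return objectKey{}, fmt.Errorf("no config found: tried %s/%s and %s/%s",
		central.Namespace, central.Name, legacy.Namespace, legacy.Name)
}

// adminSecretKey always points at the operator namespace, never at the
// workload namespace, matching the security model in the commit message.
func adminSecretKey(operatorNS string) objectKey {
	return objectKey{Namespace: operatorNS, Name: adminSecretName}
}

func main() {
	onlyLegacy := func(k objectKey) bool { return k.Name == legacyConfigName }
	key, err := resolveConfig("kagenti-system", "test-ns-alpha", onlyLegacy)
	fmt.Println(key, err)
	fmt.Println(adminSecretKey("kagenti-system"))
}
```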
Add comprehensive documentation for waypoint mode feature:
- docs/waypoint-mode.md: Complete design and user guide
  - Architecture and design principles
  - Key components (NamespaceWaypointReconciler, ClientRegistrationReconciler)
  - Step-by-step deployment instructions
  - Code examples (YAML, Python, Bash)
  - Security model and performance characteristics
  - Troubleshooting guide and FAQ
- docs/migration-sidecar-to-waypoint.md: Migration guide
  - Blue-green migration strategy
  - Phase-by-phase migration procedure
  - Rollback procedures
  - Validation checklist and automated script
  - Troubleshooting migration issues
  - Best practices

Total: 51KB of production-ready documentation covering design, deployment, migration, security, and operations.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Akram <akram.benaissi@gmail.com>
Created visual diagrams to complement waypoint-mode.md:
- waypoint-architecture.mmd: High-level system architecture
- agent-communication-flow.mmd: Token exchange sequence diagram
- operator-reconciliation.mmd: Controller reconciliation flows
- security-architecture.mmd: Centralized secret management model
- waypoint-vs-sidecar.mmd: Resource comparison between modes

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Akram <akram.benaissi@gmail.com>
2e2b17d to 5f88ed7
Alan-Cha left a comment
Summary
This PR implements Istio waypoint mode as the default deployment pattern for Kagenti, moving from per-agent sidecar envoys to centralized waypoint gateways. The implementation is solid, but there is a critical documentation inconsistency that must be fixed before merge.
Areas reviewed: Go code, Kubernetes controllers, Helm charts, documentation, security model, commit conventions
Commits: 4 commits, all include DCO sign-off ✅
CI status: All checks passing ✅
Critical Issue: Feature Flag Default Inconsistency
The feature flag --enable-waypoint-provisioning has conflicting defaults between code and documentation:
- ✅ Actual code (cmd/main.go:44): flag.BoolVar(&enableWaypointProvisioning, "enable-waypoint-provisioning", true, ...)
- ✅ PR description: "Enable automatic waypoint gateway creation (default: true)"
- ❌ Documentation (docs/waypoint-mode.md lines ~1515, ~1817): Shows examples with default: false
Impact: Operators reading the documentation will have incorrect expectations about the default behavior. Since waypoint mode is described as the "new default deployment pattern", defaulting to true is correct, but the documentation must match.
Required fix: Update the documentation examples to show default: true consistently.
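To make the expected default concrete, here is a minimal sketch of how a `true` default behaves with Go's standard `flag` package. The `options` struct and `parseOptions` helper are hypothetical; only the flag names, their defaults, and the help strings come from this PR.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a hypothetical container for the two feature flags.
type options struct {
	enableWaypointProvisioning       bool
	enableOperatorClientRegistration bool
}

// parseOptions registers both flags with a default of true and parses args.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.BoolVar(&o.enableWaypointProvisioning, "enable-waypoint-provisioning", true,
		"Enable automatic waypoint gateway creation")
	fs.BoolVar(&o.enableOperatorClientRegistration, "enable-operator-client-registration", true,
		"Enable centralized Keycloak client registration")
	err := fs.Parse(args)
	return o, err
}

func main() {
	// No arguments: both features are on by default.
	def, _ := parseOptions(nil)
	// Opting out requires passing the flag explicitly with =false.
	off, _ := parseOptions([]string{"--enable-waypoint-provisioning=false"})
	fmt.Println(def.enableWaypointProvisioning, off.enableWaypointProvisioning) // prints "true false"
}
```

With this default, an operator that is started without the flag provisions waypoints, which is why documentation showing `default: false` would mislead readers.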
Positive Findings
Security Hardening ✅
The keycloak-admin-secret read was moved from agent namespaces to operator namespace only (internal/controller/clientregistration_controller.go). This prevents credential exposure and follows the principle of least privilege. Excellent security improvement.
Optional enhancement: Consider adding a validation check that warns if keycloak-admin-secret exists in agent namespaces (legacy deployments), suggesting migration to the centralized model.
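One possible shape for such a check, sketched under assumptions: `legacyAdminSecretNamespaces` is a hypothetical helper, and the map of namespace to secret names stands in for whatever listing the operator would actually perform against the API server.

```go
package main

import (
	"fmt"
	"sort"
)

const adminSecretName = "keycloak-admin-secret"

// legacyAdminSecretNamespaces returns, in sorted order, every namespace other
// than the operator namespace that still contains the admin secret.
// The input map (namespace -> secret names) is a stand-in for a real listing.
func legacyAdminSecretNamespaces(secretsByNamespace map[string][]string, operatorNS string) []string {
	var found []string
	for ns, names := range secretsByNamespace {
		if ns == operatorNS {
			continue // the operator namespace is the only valid location
		}
		for _, name := range names {
			if name == adminSecretName {
				found = append(found, ns)
				break
			}
		}
	}
	sort.Strings(found)
	return found
}

func main() {
	secrets := map[string][]string{
		"kagenti-system": {"keycloak-admin-secret"},
		"test-ns-alpha":  {"keycloak-admin-secret", "other"},
		"test-ns-beta":   {"other"},
	}
	for _, ns := range legacyAdminSecretNamespaces(secrets, "kagenti-system") {
		fmt.Printf("warning: %s found in agent namespace %q; migrate to the centralized model\n",
			adminSecretName, ns)
	}
}
```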
Controller Implementation ✅
NamespaceWaypointReconciler has clean, idempotent reconciliation logic with:
- Proper status conditions
- Comprehensive error handling
- Waypoint label propagation to NetworkAttachmentDefinition resources (shows good understanding of Istio ambient mesh requirements)
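As an illustration of what idempotent reconciliation means for namespace labels, the sketch below merges desired labels into existing ones and reports whether an update is needed. `ensureLabels` is a hypothetical helper and not taken from the PR; a second pass over already-reconciled labels reports no change, so no write would be issued.

```go
package main

import "fmt"

// ensureLabels copies existing labels, applies the desired ones on top, and
// reports whether anything changed. Unrelated existing labels are preserved.
func ensureLabels(existing, desired map[string]string) (map[string]string, bool) {
	merged := make(map[string]string, len(existing)+len(desired))
	for k, v := range existing {
		merged[k] = v
	}
	changed := false
	for k, v := range desired {
		if cur, ok := merged[k]; !ok || cur != v {
			merged[k] = v
			changed = true
		}
	}
	return merged, changed
}

func main() {
	desired := map[string]string{"istio.io/dataplane-mode": "ambient"}
	first, changed1 := ensureLabels(map[string]string{"team": "a"}, desired)
	_, changed2 := ensureLabels(first, desired)
	fmt.Println(changed1, changed2) // prints "true false"
}
```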
Documentation ✅
The security architecture diagram (docs/diagrams/security-architecture.mmd) clearly illustrates the isolation boundary between operator namespace (admin credentials) and agent namespaces (per-client credentials).
Summary of Changes
- New controller: NamespaceWaypointReconciler automatically provisions waypoint gateways for agent namespaces
- Security: Centralized keycloak-admin-secret access (operator namespace only)
- Feature flags: Both --enable-waypoint-provisioning and --enable-operator-client-registration default to true
- Migration path: Blue-green namespace migration strategy documented
Verdict: REQUEST_CHANGES - Fix the documentation inconsistency, then this is ready to merge.
@akram wondering if we should wait to merge this PR until we make some decision on the approach (e.g. Istio Ambient dependency)
Summary
This PR implements waypoint mode using Istio ambient mesh as the new default deployment pattern for Kagenti agents, replacing the legacy sidecar mode.
Key changes:
- keycloak-admin-secret is now read exclusively from the operator namespace (kagenti-system), never from agent namespaces
Architecture:
Feature gates:
- --enable-waypoint-provisioning: Enable automatic waypoint gateway creation (default: true)
- --enable-operator-client-registration: Enable centralized Keycloak client registration (default: true)
Migration path:
Documentation
- docs/waypoint-mode.md: Architecture, design principles, deployment guide
- docs/migration-sidecar-to-waypoint.md: Step-by-step migration procedure
- docs/diagrams/: Mermaid.js diagrams (architecture, flows, security model)
Test plan
🤖 Generated with Claude Code