feat: implement ksail cluster update command with config diff detection and node scaling
#2041
Implements issue #1734 with initial delete + create flow.

Changes:
- Add NewUpdateCmd with --force flag for confirmation bypass
- Implement handleUpdateRunE with cluster existence check
- Add confirmation prompt matching deletion pattern
- Extract executeClusterCreation for reuse between create/update
- Wire update command into cluster parent command
- Add unit tests for update command and confirmation prompt

This initial version uses a delete + create flow with explicit user confirmation. Future iterations will support in-place updates and node scaling for supported distribution × provider combinations.
…s provisioners
- Added `Update` method to K3d, Kind, and Talos provisioners to handle configuration changes.
- Introduced `UpdateResult` and `Change` types to represent the outcome of update operations.
- Implemented diffing logic to determine in-place changes, reboot-required changes, and recreate-required changes.
- Enhanced error handling with specific error messages for unsupported operations.
- Updated tests to cover new update functionalities and ensure correctness.
- Refactored existing code to support new update interfaces and types.
Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
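The three impact categories described in this commit can be pictured with a small sketch. Only the type names `UpdateResult` and `Change` come from the commit message; the `Category` type, all field names, and the `Classify` function are assumptions for illustration and may not match the real definitions.

```go
package main

import "fmt"

// Category says how disruptive a change is to apply (illustrative names).
type Category int

const (
	CategoryInPlace Category = iota
	CategoryRebootRequired
	CategoryRecreateRequired
)

// Change describes one field that differs between the current and desired spec.
// The field layout is an assumption.
type Change struct {
	Field    string
	OldValue string
	NewValue string
	Category Category
}

// UpdateResult groups detected changes by impact. The field names are assumed.
type UpdateResult struct {
	InPlaceChanges   []Change
	RebootRequired   []Change
	RecreateRequired []Change
}

// Classify sorts changes into the three impact buckets.
func Classify(changes []Change) UpdateResult {
	var result UpdateResult

	for _, change := range changes {
		switch change.Category {
		case CategoryRecreateRequired:
			result.RecreateRequired = append(result.RecreateRequired, change)
		case CategoryRebootRequired:
			result.RebootRequired = append(result.RebootRequired, change)
		default:
			result.InPlaceChanges = append(result.InPlaceChanges, change)
		}
	}

	return result
}

func main() {
	result := Classify([]Change{
		{Field: "workers", OldValue: "1", NewValue: "3", Category: CategoryInPlace},
		{Field: "distribution", OldValue: "K3s", NewValue: "Talos", Category: CategoryRecreateRequired},
	})

	fmt.Printf("in-place=%d reboot=%d recreate=%d\n",
		len(result.InPlaceChanges), len(result.RebootRequired), len(result.RecreateRequired))
}
```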
…d-5472fb7c94207991
Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
…d-5472fb7c94207991
…d-5472fb7c94207991
…d-5472fb7c94207991 Signed-off-by: Nikolai Emil Damm <ned@devantler.tech>
…d-5472fb7c94207991
Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
…e command
- Wire componentReconciler into handleUpdateRunE for component-level changes (CNI, CSI, cert-manager, metrics-server, load-balancer, policy engine, GitOps)
- Implement full K3d worker scaling via k3d node create/delete commands
- Improve Talos stubs with ErrNotImplemented and detailed failure reporting
- Add system test steps for ksail cluster update --dry-run and --force
- Extract shared helpers (defaultClusterMutationFieldSelectors, registerMirrorRegistryFlag, registerNameFlag, loadAndValidateClusterConfig, runClusterCreationWorkflow)
- Add DiffEngine unit tests for all change categories
- Use NewEmptyUpdateResult/NewUpdateResultFromDiff factories to reduce duplication
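The worker scaling item boils down to deciding which agent nodes to create or delete for a desired count. A minimal planning sketch follows, under these assumptions: agent nodes are named `k3d-<cluster>-agent-<index>`, new nodes take the index after the current maximum (so gaps are never reused), and scale-down removes the highest indices first. The function names are hypothetical, and the real implementation shells out to `k3d node create` / `k3d node delete`, which is not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// agentIndex extracts the numeric suffix from a name such as "k3d-dev-agent-2".
// It returns -1 when the name has no numeric suffix.
func agentIndex(name string) int {
	pos := strings.LastIndex(name, "-")
	if pos < 0 {
		return -1
	}

	idx, err := strconv.Atoi(name[pos+1:])
	if err != nil {
		return -1
	}

	return idx
}

// planWorkerScaling returns the node names to create or remove so that the
// cluster ends up with the desired number of agents. Hypothetical helper.
func planWorkerScaling(cluster string, existing []string, desired int) (create, remove []string) {
	if desired < 0 {
		desired = 0
	}

	// Work on a sorted copy so the caller's slice is left untouched.
	sorted := append([]string(nil), existing...)
	sort.Slice(sorted, func(i, j int) bool {
		return agentIndex(sorted[i]) < agentIndex(sorted[j])
	})

	// Scale down: drop the highest-index nodes first (assumed policy).
	for i := len(sorted) - 1; i >= desired; i-- {
		remove = append(remove, sorted[i])
	}

	// Scale up: continue after the maximum index so gaps cannot collide.
	next := 0
	for _, name := range sorted {
		if idx := agentIndex(name); idx >= next {
			next = idx + 1
		}
	}

	for i := len(sorted); i < desired; i++ {
		create = append(create, fmt.Sprintf("k3d-%s-agent-%d", cluster, next))
		next++
	}

	return create, remove
}

func main() {
	existing := []string{"k3d-dev-agent-0", "k3d-dev-agent-2"}

	create, remove := planWorkerScaling("dev", existing, 3)
	fmt.Println("create:", create, "remove:", remove)

	create, remove = planWorkerScaling("dev", existing, 1)
	fmt.Println("create:", create, "remove:", remove)
}
```

Keeping the planning step pure (names in, names out) separates it from the command execution, so it can be unit-tested without a running cluster.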
Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
…cumentation Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
…sue reporting Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
…nagement Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
…anagement Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
- Check error return values for pipe Close calls
- Rename short variable 'r' to 'pipeReader' and 'w' to 'pipeWriter'
…plication
Move the shared default ClusterSpec construction into types.DefaultCurrentSpec(). Kind, K3d, and Talos GetCurrentConfig now delegate to this shared factory, removing ~10 lines of identical struct literal from each provisioner.
…ation
- Add types.PrepareUpdate() for shared dry-run and recreate-required checks
- K3d and Talos Update methods now use the shared helper
- Fix stale nolint:ireturn directive formatting in kind provisioner
…ctions
Replace three near-identical SetupK3d{MetricsServer,CSI,LoadBalancer} bodies
with a shared maybeDisableK3dFeature helper. The public wrappers remain for
backward compatibility and test accessibility.
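The consolidation described in this commit follows a common pattern: one private helper carries the logic and thin public wrappers supply the per-feature argument. The sketch below assumes that disabling a feature means adding a K3s `--disable=<component>` server argument and that the config is a simple struct with an `ExtraServerArgs` slice; the real config type and wrapper signatures are not shown here and are likely different.

```go
package main

import "fmt"

// k3dConfig is a reduced, hypothetical stand-in for the real K3d config type.
type k3dConfig struct {
	ExtraServerArgs []string
}

// maybeDisableK3dFeature appends a K3s --disable flag for the component when
// the feature is disabled, skipping the append if the flag is already present.
func maybeDisableK3dFeature(cfg *k3dConfig, disabled bool, component string) {
	if !disabled {
		return
	}

	flag := "--disable=" + component
	for _, arg := range cfg.ExtraServerArgs {
		if arg == flag {
			return
		}
	}

	cfg.ExtraServerArgs = append(cfg.ExtraServerArgs, flag)
}

// The public wrappers stay as thin delegations (signatures are assumed).
func SetupK3dMetricsServer(cfg *k3dConfig, disabled bool) {
	maybeDisableK3dFeature(cfg, disabled, "metrics-server")
}

func SetupK3dCSI(cfg *k3dConfig, disabled bool) {
	maybeDisableK3dFeature(cfg, disabled, "local-storage")
}

func SetupK3dLoadBalancer(cfg *k3dConfig, disabled bool) {
	maybeDisableK3dFeature(cfg, disabled, "servicelb")
}

func main() {
	cfg := &k3dConfig{}

	SetupK3dMetricsServer(cfg, true)
	SetupK3dCSI(cfg, false)
	SetupK3dLoadBalancer(cfg, true)

	fmt.Println(cfg.ExtraServerArgs)
}
```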
…tion and node scaling (#2041)
* feat: add 'ksail cluster update' command for updating clusters
  Implements issue #1734 with initial delete + create flow.
  Changes:
  - Add NewUpdateCmd with --force flag for confirmation bypass
  - Implement handleUpdateRunE with cluster existence check
  - Add confirmation prompt matching deletion pattern
  - Extract executeClusterCreation for reuse between create/update
  - Wire update command into cluster parent command
  - Add unit tests for update command and confirmation prompt
  This initial version uses a delete + create flow with explicit user confirmation. Future iterations will support in-place updates and node scaling for supported distribution × provider combinations.
* feat(cluster): implement update functionality for K3d, Kind, and Talos provisioners
  - Added `Update` method to K3d, Kind, and Talos provisioners to handle configuration changes.
  - Introduced `UpdateResult` and `Change` types to represent the outcome of update operations.
  - Implemented diffing logic to determine in-place changes, reboot-required changes, and recreate-required changes.
  - Enhanced error handling with specific error messages for unsupported operations.
  - Updated tests to cover new update functionalities and ensure correctness.
  - Refactored existing code to support new update interfaces and types.
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* chore: update generated CLI flags docs
* feat(cluster): add error handling for missing TalosConfig during updates
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* chore(docs): update workflow documentation for clarity and consistency
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* feat: wire component reconciliation and K3d worker scaling into update command
  - Wire componentReconciler into handleUpdateRunE for component-level changes (CNI, CSI, cert-manager, metrics-server, load-balancer, policy engine, GitOps)
  - Implement full K3d worker scaling via k3d node create/delete commands
  - Improve Talos stubs with ErrNotImplemented and detailed failure reporting
  - Add system test steps for ksail cluster update --dry-run and --force
  - Extract shared helpers (defaultClusterMutationFieldSelectors, registerMirrorRegistryFlag, registerNameFlag, loadAndValidateClusterConfig, runClusterCreationWorkflow)
  - Add DiffEngine unit tests for all change categories
  - Use NewEmptyUpdateResult/NewUpdateResultFromDiff factories to reduce duplication
* feat: add E2E test runner agent and prompt files for end-to-end testing
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* fix(docs): correct code block formatting in FAQ and support matrix documentation
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* fix(docs): update E2E test runner agent description for clarity on issue reporting
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* feat: enhance error handling and improve update process in cluster management
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* feat: add handoff details to E2E test runner agent documentation
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* feat: enhance cluster configuration handling and deduplication logic
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* feat: improve confirmation prompts and output formatting in cluster management
  Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com>
* chore: update generated CLI flags docs
* fix: resolve lint issues in k3d update.go
  - Check error return values for pipe Close calls
  - Rename short variable 'r' to 'pipeReader' and 'w' to 'pipeWriter'
* refactor: extract DefaultCurrentSpec to eliminate GetCurrentConfig duplication
  Move the shared default ClusterSpec construction into types.DefaultCurrentSpec(). Kind, K3d, and Talos GetCurrentConfig now delegate to this shared factory, removing ~10 lines of identical struct literal from each provisioner.
* refactor: extract PrepareUpdate helper to reduce Update method duplication
  - Add types.PrepareUpdate() for shared dry-run and recreate-required checks
  - K3d and Talos Update methods now use the shared helper
  - Fix stale nolint:ireturn directive formatting in kind provisioner
* refactor: extract maybeDisableK3dFeature to consolidate K3d setup functions
  Replace three near-identical SetupK3d{MetricsServer,CSI,LoadBalancer} bodies with a shared maybeDisableK3dFeature helper. The public wrappers remain for backward compatibility and test accessibility.
* chore: apply golangci-lint fixes
* refactor: replace repetitive DiffEngine check methods with table-driven approach
* refactor: extract installFromFactory to deduplicate Install*Silent functions
* refactor: apply table-driven diff rules to Talos and Hetzner options, remove funlen nolint
* refactor: decompose handleUpdateRunE to remove cyclop/funlen nolint
* refactor: extract buildK3dExtraArgs to remove funlen/cyclop nolint from CreateK3dConfig
* refactor: decompose runPushCommand and resolvePushParams to remove funlen nolint
* refactor: extract runParallelValidation to remove funlen nolint from validateDirectory
* refactor: extract collectExistingMirrorSpecs to remove funlen nolint from RunStage
* style: fix golines formatting in OptionsTalos struct tags
* refactor: absorb DiffConfig error into PrepareUpdate to reduce duplication
  - Add diffErr parameter to types.PrepareUpdate, eliminating per-caller error wrapping boilerplate
  - Simplify k3d and talos Update methods using the new signature
  - Resolves jscpd clone between k3d/update.go and talos/update.go
* refactor: extract NewDiffResult helper to eliminate DiffConfig duplication
  - Add types.NewDiffResult factory that initializes UpdateResult and validates specs, replacing inline NewEmptyUpdateResult + nil guard pattern
  - Update kind and talos DiffConfig methods to use the new helper
  - Add distribution-specific comments to break contiguous token match
  - Resolves jscpd clone between kind/update.go and talos/update.go
* chore: Apply megalinter fixes
* chore: apply golangci-lint fixes
* fix: apply PR review comments and resolve all golangci-lint issues
  - Fix Talos config: apply ControlPlane() config to CP nodes and Worker() config to worker nodes via getNodesByRole/getDockerNodesByRole
  - Add mutex to listClusterNodes for stdout protection
  - Sort listAgentNodes output with slices.Sort
  - Fix nextAgentIndex to compute max index avoiding gap collisions
  - Extract applyNodeConfig helper for role-aware config application
  - Extract parseClusterNodes helper to reduce funlen
  - Add NOTE(limitation) documenting distribution-change detection gap
  - Track applyInPlaceConfigChanges results in UpdateResult
  - Update docs: Talos node scaling 'In-place' → 'Planned'
  - Fix 'restart' → 'recreation' wording in PrepareUpdate
  - Strip excess whitespace from OptionsHetzner struct tags (golines)
  - Add ireturn exclusion for k3d/registry.go in .golangci.yml
  - Remove redundant //nolint:ireturn directives
* chore: apply golangci-lint fixes
* feat: implement Talos node scaling for Docker and Hetzner providers
  Add Docker-specific node scaling (scale_docker.go):
  - Create/remove containers with static IPs and Talos machine config
  - Best-effort etcd cleanup before control-plane removal
  - IP allocation, index management, and Talos SDK-compatible container spec
  Add Hetzner-specific node scaling (scale_hetzner.go):
  - Create/delete servers via Hetzner API with retry and location fallback
  - Wait for Talos API and apply config to new servers
  - Best-effort etcd cleanup before control-plane removal
  Add shared etcd membership cleanup (etcd.go):
  - Forfeit leadership then leave cluster before CP node removal
  - All errors are best-effort (logged, not returned)
  Update applyNodeScalingChanges with real dispatch logic:
  - Route to Docker or Hetzner based on provider configuration
  - Replace stub with working implementation
  - Add CP >= 1 guard in DiffConfig
  Add error sentinels, role constants, and Hetzner DeleteServer method.
  Update docs: support-matrix.mdx and faq.md reflect in-place scaling.
* fix(ci): pass --distribution and --provider to cluster update steps
  The ksail-system-test action runs 'ksail cluster update' without --distribution or --provider flags. When INIT=false (no ksail.yaml), the update command defaults to Vanilla (Kind) and looks for a cluster named 'kind' instead of the actual K3d/Talos cluster. Pass the distribution and provider from the action inputs to both update steps (--dry-run and --force).
* fix(setup): guard against nil timer in installCNIOnly The reconcileCNI handler in the update command passes nil as the timer to InstallCNI → installCNIOnly, which calls tmr.NewStage() on the nil interface causing a panic. Add a nil guard before calling tmr.NewStage(). The downstream MaybeTimer helper already handles nil timers correctly. * fix(docs): update formatting in README for consistency Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: Apply megalinter fixes * refactor: apply Copilot review suggestions for update command - Use RoleWorker constant instead of "worker" literal in scaleHetznerWorkers - Remove unused clusterName param from etcdCleanupBeforeRemoval - Fix potential pipe deadlock in k3d listClusterNodes using goroutine - Add user confirmation prompt for reboot-required changes - Return error instead of silent no-op when disabling metrics-server * style: fix golangci-lint issues - Use errors.New instead of fmt.Errorf for static error (err113/perfsprint) - Remove excessive struct tag padding in options.go (golines) * chore: apply golangci-lint fixes * refactor: apply Copilot review fixes for Hetzner scaling and UpdateResult - Parse max node index from server names to avoid naming collisions on scale-up - Sort listHetznerNodesByRole by name for deterministic highest-index-first removal - Initialize all slices in NewEmptyUpdateResult to match its contract * fix: retry Helm chart install on transient network errors - Extend isRetryableNetworkError to cover TCP-level errors: connection reset by peer, connection refused, i/o timeout, TLS handshake timeout, unexpected EOF, no such host - Add retry loop with exponential backoff around InstallOrUpgradeChart - Fix golines padding in OptionsTalos and OptionsHetzner struct tags * chore: apply golangci-lint fixes * fix: apply third-round Copilot review fixes and lint cleanup - Use "%s" format specifier in notify.Warningf calls to prevent accidental format string interpretation (update.go, 
confirm.go) - Use RoleControlPlane constant instead of string literal in hetznerConfigForRole (scale_hetzner.go) - Remove excessive struct tag padding in options.go (golines) * chore: apply golangci-lint fixes * feat: add maintainer workflow for upgrading gh-aw version Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * fix: pass $ARGS to cluster update steps in system test action The update steps were missing the $ARGS variable (which includes --name), causing 'cluster does not exist' errors when the cluster was created with a custom name via --name flag in $ARGS. * chore: update GitHub workflows for consistency and formatting - Standardized indentation and spacing in YAML files. - Updated comments for clarity and consistency across workflows. - Ensured proper masking of API keys in logs. - Adjusted environment variable handling for better readability. - Refactored handler configurations to use consistent JSON formatting. Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: Apply megalinter fixes * chore: apply golangci-lint fixes * fix: update FAQ to clarify cluster recreation process for distribution/provider changes Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * fix: adjust nolint comments for test functions in update_test.go Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat: replace client-side state with K8s API-based component detection Replace GetCurrentConfig() static defaults with live cluster detection by querying Helm releases, Kubernetes Deployments, and Docker containers. 
- Add ReleaseExists method to Helm client Interface - Create pkg/svc/detector with ComponentDetector for live detection - Update GetCurrentConfig(ctx) in Kind, K3d, and Talos provisioners - Wire ComponentDetector through DefaultFactory - Update CLI update command to build and inject detector - Remove client-side state persistence (pkg/io/state deleted previously) * fix: address PR review feedback for isRetryableNetworkError and metricsServer reason - Use regex word-boundary matching for HTTP 5xx status codes in isRetryableNetworkError to prevent false positives on port numbers like ':5000' (addresses review comment on client.go:604) - Update metricsServer diff reason to clarify that disabling requires cluster recreation (addresses review comment on diff.go:86) * feat(workflow): add daily refactor workflow for incremental code improvements Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat: enhance session management and state persistence - Introduced sentinel errors for session operations to improve error handling. - Refactored session ID validation to use descriptive error messages. - Improved session loading, saving, and deletion functions for better clarity and consistency. - Added cluster state management for K3s and Kind distributions, allowing for persistent state storage and retrieval. - Implemented tests for cluster state management to ensure reliability and correctness. - Enhanced user interface interactions in session picker and chat components for better user experience. - Updated styles and viewport rendering logic for improved visual consistency. 
Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * fix: update header rendering logic for improved layout and spacing Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: Apply megalinter fixes * feat: refactor chat model and UI components for improved readability and maintainability - Extracted textarea and viewport initialization into separate functions for clarity. - Enhanced message handling in the Update method to streamline event processing. - Introduced helper functions for session management and tool execution handling. - Improved styling constants for better consistency across the UI. - Refactored permission modal rendering and session picker navigation for better code organization. - Updated viewport content handling to accommodate new wrapping logic and styles. - Added detailed comments and improved function signatures for better understanding. Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * refactor: update keybinding functions for consistency and clarity Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * refactor: improve code readability by adjusting comment placement in session picker navigation function Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: apply golangci-lint fixes * fix: reorder struct field tags for consistency in options.go Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * fix: apply PR review fixes for helm error handling, state path security, and tag alignment - Only treat driver.ErrReleaseNotFound as 'not found' in ReleaseExists, propagate real errors - Replace os/user.Current() with os.UserHomeDir() for portability - Add path traversal validation for cluster names - Fix struct tag sort order in options.go * chore: apply golangci-lint fixes * fix: clean up nolint comments and improve local registry handling in provisioners Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: apply golangci-lint fixes * fix: apply PR 
review feedback from review 3770655493 - Fix WalkDir callback in docs_embedded.go to skip errors and continue walking - Remove excessive struct tag padding in options.go (OptionsTalos, OptionsHetzner) - Change update_test.go to use package cluster_test (black-box testing convention) - Guard --dry-run against triggering destructive recreate flow - Let handleRecreateRequired prompt interactively instead of returning error * chore: apply golangci-lint fixes * refactor: extract applyGitOpsLocalRegistryDefault to shared types package Deduplicate identical function from kind, k3d, and talos provisioners into types.ApplyGitOpsLocalRegistryDefault to fix jscpd violations. Also fix remaining golines struct tag padding in options.go. * chore: apply golangci-lint fixes * fix: resolve false-positive Default-vs-Disabled diffs in cluster update Add EffectiveValue methods to CSI, MetricsServer, and LoadBalancer enums that resolve Default to its concrete meaning (Enabled or Disabled) based on distribution × provider. The DiffEngine now uses effective values for comparison, preventing false-positive diffs when Default and Disabled are semantically equivalent (e.g. Vanilla/Docker where nothing is bundled). This fixes the CI failure where the Vanilla/Docker init test detected 3 spurious changes (CSI, MetricsServer, LoadBalancer: Default → Disabled) and then failed on reconcileMetricsServer with 'disabling metrics-server in-place is not yet supported'. Also fix recurring golines struct tag padding in options.go. * chore: apply golangci-lint fixes * fix: resolve Talos/Docker CI failure from resource exhaustion Remove hardcoded 2GB/2CPU resource limits from Talos Docker containers so they use all host resources (matching Kind/K3d behavior). Add post-CNI node readiness check to ensure networking is operational before installing components. Add dedicated install timeout constants for Gatekeeper, ArgoCD, CertManager, and Kyverno. 
Changes: - Remove defaultNodeMemory, defaultNodeCPUs, nanosPerCPU, bytesPerMiB constants and TALOSSKU env var from Talos Docker provisioner - Add WaitForNodeReady in pkg/k8s to poll until nodes reach Ready state - Call waitForCNIReadiness after CNI install in cni.go - Add GatekeeperInstallTimeout, ArgoCDInstallTimeout, CertManagerInstallTimeout, KyvernoInstallTimeout constants (7min each) - Refactor helmInstallerFactory to accept minTimeout parameter, eliminating code duplication for CertManager/ArgoCD factories * chore: apply golangci-lint fixes * chore: update GitHub workflows and dependencies - Updated the version of gh-aw actions in update-docs.lock.yml and weekly-research.lock.yml from v0.42.13 to v0.42.17. - Changed the Docker image version for gh-aw-mcpg from v0.0.103 to v0.0.113 in both workflows. - Added a new step to handle no-op messages in both workflows. - Updated the timeout constants in helpers.go by removing Gatekeeper and ArgoCD install timeouts. Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * fix: address PR review comments and lint issues - Handle WalkDir error parameter in docs_embedded.go to prevent nil dereference - Add missing GatekeeperInstallTimeout and ArgoCDInstallTimeout constants - Remove excessive padding in struct tags for OptionsTalos and OptionsHetzner * chore: Apply megalinter fixes * chore: apply golangci-lint fixes * refactor: streamline scaling functions and improve role handling in Talos provisioner Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat(diff): implement Engine for computing configuration differences - Added Engine struct to compute configuration differences and classify their impact. - Implemented ComputeDiff method to compare old and new ClusterSpec and categorize changes. - Introduced field rules for scalar fields and specific checks for distribution options. - Added support for merging provisioner-specific diff results into the main diff. 
test(diff): add comprehensive tests for Engine functionality - Created unit tests for Engine to validate diff computation for various scenarios. - Included tests for nil specs, no changes, distribution and provider changes, and component changes. - Added tests for local registry changes, Vanilla and Talos options changes, and Hetzner options changes. feat(merge): add functionality to merge provisioner diffs - Implemented MergeProvisionerDiff to merge provisioner-specific changes into the main diff. - Added utility functions for normalizing field names and collecting existing fields to avoid duplicates. fix(talos): improve Talos client creation logic - Enhanced createTalosClient to prefer saved Talos config for TLS credentials. - Updated logic to handle naming conventions for control-plane nodes. test(registry): refine tests for backend factory overrides - Adjusted tests for backend factory to ensure proper setup and cleanup of shared state. Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat: add Cobra command runner and timer package - Implemented a new Cobra command runner for executing commands with output capture. - Added a timer package for tracking command execution duration. - Created tests for the command runner to ensure stdout and stderr are captured correctly. - Updated documentation for the runner and timer packages. - Removed deprecated utils package and refactored code to use new runner and timer implementations. 
Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * refactor: reorganize imports across multiple files for consistency Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat(k8s): integrate k8s package for unique label value retrieval in provider and provisioner Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat: introduce cluster error handling and refactor provisioner structure - Added a new package `clustererr` to define common error types for cluster provisioners, improving error handling consistency across implementations (Kind, K3d, Talos). - Refactored existing code to utilize the new error types, enhancing clarity and maintainability. - Updated import paths for config managers to follow a consistent naming convention. - Implemented a `MultiProvisioner` to manage multiple distribution provisioners, allowing operations to be routed based on existing clusters. - Improved documentation and comments throughout the codebase for better understanding and usability. Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat(metrics-server): add metrics-server installer and related documentation - Implemented MetricsServerInstaller for installing and managing metrics-server on Kubernetes clusters. - Added Helm chart installation and upgrade functionality. - Created unit tests for the MetricsServerInstaller. - Introduced a new package `clusterupdate` for shared types and functions related to cluster updates. - Refactored existing provisioner code to utilize the new `clusterupdate` package for handling update operations. Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat(toolgen): implement tool generation and MCP integration - Added tool generation functionality in `generator.go` to traverse Cobra command tree and generate SDK-agnostic tool definitions. - Implemented exclusion logic for commands and their children based on annotations and prefixes. 
- Created `ToMCPTools` function in `mcp.go` to register tool definitions with the MCP server. - Developed tests for tool generation and MCP integration in `generator_test.go` and `mcp_test.go`. - Introduced options for tool generation in `options.go`, including command exclusion and logging. - Built JSON schema for command flags in `schema.go`, supporting various flag types and default values. Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * feat(localregistry): add WaitForK3dLocalRegistryReady function to ensure local registry readiness Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: Apply megalinter fixes * chore: update generated CLI flags docs * chore: apply golangci-lint fixes * refactor: use role constants and fix struct tag formatting - Replace string literal role values with RoleWorker/RoleControlPlane constants in provisioner_hetzner.go and update.go to avoid drift/typos - Fix golines formatting issue in options.go struct tags * chore: apply golangci-lint fixes * refactor: fix typos and use help separator constant * chore: apply golangci-lint fixes * refactor: use checkmarkSuffix constant for UI consistency * chore: apply golangci-lint fixes * refactor: use go.mod tool directive for golangci-lint version and fix PR review comments - Add golangci-lint as a tool dependency in go.mod instead of .golangci-lint-version - Update copilot-setup-steps.yml to read version from go.mod - Update daily-perf-improver build steps to extract version from go.sum - Fix modalPadding → scrollIndicatorLines in chat dimensions (PR review) * chore: tidy go modules * refactor: improve package cohesion and eliminate stuttering type names * chore: apply golangci-lint fixes * refactor: improve package cohesion and structure - Rename pkg/io/ to pkg/fsutil/ (better describes file system utilities) - Move registry helpers from cli/helpers/registry to pkg/svc/registryresolver - Move Docker factory functions to pkg/client/docker/resources.go - Merge 
kubeconfig low-level functions into pkg/k8s/rest_config.go - Inline trivial iostreams package into callers - Mark deprecated delegation wrappers in helpers/docker and helpers/kubeconfig * chore: remove unused version-file parameter from golangci-lint setup Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: apply golangci-lint fixes * refactor: extract cluster detection to pkg/svc/detector/cluster - Move cluster detection logic (DetectInfo, DetectDistributionFromContext, ResolveKubeconfigPath) from pkg/cli/lifecycle to pkg/svc/detector/cluster - Rename ClusterInfo to Info and DetectClusterInfo to DetectInfo to fix revive stuttering lint violation - Remove dead lifecycle detection wrapper (detection.go) and associated snapshot — all callers now import pkg/svc/detector/cluster directly - Remove duplicate lifecycle detection tests (already covered by detector package tests) - Remove unused error re-exports (ErrClusterNotFoundInDistributions, ErrCreateNotSupported, ErrUnknownDistribution) from lifecycle package - Remove dead code: unused kubeconfig functions and GetCurrentKubeContext - Fix gci import ordering in delete.go, apiserver.go, apiserver_test.go * refactor: apply PR review fixes for width clamping, timer drain, and dead code * refactor: fix kubeconfig doc comment and use go list for lint version resolution * refactor: flatten cli/helpers hierarchy into cli/flags, cli/editor, cli/kubeconfig, cli/dockerutil * refactor: split testing.go into deps.go and add missing doc.go files * refactor: eliminate installer duplication with helmutil.Base and shared helpers - Extract helmutil.Base type for standard repo-based Helm installers - Refactor certmanager, gatekeeper, kubeletcsrapprover, kyverno to embed helmutil.Base - Extract shared ImagesFromChart helper to helmutil package - Delegate cni.InstallerBase.ImagesFromChart to helmutil.ImagesFromChart - Remove stuttering from all installer type names (e.g. 
FluxInstaller → Installer) - Split cluster/testing.go into deps.go (production DI) + testing.go (test overrides) - Add doc.go to 12 packages missing documentation * refactor: remove stuttering from generator types and embed helmutil.Base in remaining installers - Rename generator types to remove package-name prefix stuttering: k3d.K3dGenerator → k3d.Generator kind.KindGenerator → kind.Generator kustomization.KustomizationGenerator → kustomization.Generator yaml.YAMLGenerator → yaml.Generator talos.TalosGenerator → talos.Generator talos.TalosConfig → talos.Config - Embed helmutil.Base in metricsserver and hetznercsi installers - Remove dead kubeconfig/context fields from metricsserver installer - Simplify helmClientSetup helper in components.go * refactor: remove stuttering from provisioner types - Rename cluster provisioner interfaces: cluster.ClusterProvisioner → cluster.Provisioner cluster.ClusterUpdater → cluster.Updater - Rename provisioner struct types: kind.KindClusterProvisioner → kind.Provisioner k3d.K3dClusterProvisioner → k3d.Provisioner talos.TalosProvisioner → talos.Provisioner - Rename kind provider types: kind.KindProvider → kind.Provider kind.DefaultKindProviderAdapter → kind.DefaultProviderAdapter - Update all mock types and external callers * refactor: remove stuttering from unexported type names - chat.chatFlags → chat.flags - chat.chatMessage → chat.message - cluster.clusterOperation → cluster.operation - cluster.clusterResult → cluster.result - flux.fluxInstanceManager → flux.instanceManager - flux.fluxSetupParams → flux.setupParams - helm.helmActionConfig → helm.actionConfig - k3d.k3dMirrorConfig → k3d.registryMirrors / k3d.mirrorConfig - k3d.k3dNodeInfo → k3d.nodeInfo * refactor: split registryresolver/registry.go into focused files * refactor: split registry lifecycle.go into network.go and naming.go * refactor: extract Docker container helpers from delete.go into delete_docker.go * fix: apply PR review fixes for mockery paths and enter 
symbol constant * Potential fix for code scanning alert no. 1014: Incorrect conversion between integer types Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: Nikolai Emil Damm <ned@devantler.tech> * fix: remove unused nolint directive after ParseInt migration * refactor: extract pkg/k8s/readiness from pkg/k8s for cohesion Split the grab-bag pkg/k8s package into two cohesive packages: - pkg/k8s: client configuration (REST config, kubeconfig, DNS, labels) - pkg/k8s/readiness: resource readiness polling (deployments, daemonsets, nodes, API server, multi-resource coordination) Rename ReadinessCheck to Check to avoid stutter (readiness.Check vs readiness.ReadinessCheck). * refactor: move TrimNonEmpty to registry package and update doc.go files - Move TrimNonEmpty from pkg/fsutil to pkg/svc/provisioner/registry where it is actually used, improving fsutil cohesion - Update outdated doc.go files for pkg/cli, pkg/svc, and pkg/client to reflect current subpackage structure * refactor: unexport internal functions and remove dead argocd code - Unexport SetupK3dCSI, SetupK3dLoadBalancer, AllDistributions, AllProviders (only used within their own package) - Remove unused Installer, Manager, StatusProvider interfaces and Status struct from argocd package - Delete argocd mocks.go (all 3 mocked interfaces were unused) - Remove argocd entry from .mockery.yml - Rename interfaces.go to options.go (now only contains option structs) * refactor: split chat.go into focused files by responsibility - chat.go (265 lines): command definition, flags, client setup, auth - nontui.go (217 lines): non-TUI signal handling, interactive loop - streaming.go (135 lines): streaming state, event handling, send functions - tools.go (174 lines): tool permission, mode wrapping, argument formatting - tui.go (104 lines): TUI chat setup, output forwarder, model filtering * refactor: remove dead code and unexport internal functions 
Dead code removed: - SetEnsureFluxResourcesForTests (never called) - UserError on fileAlreadyEncryptedError (never called, type is unexported) - WaitForResourceReadiness (entire file, never called) - DefaultUpdateOptions (never called) - ClassifyTalosPatch and helper functions (never called) - ExtractImagesFromBytes (never called) Functions unexported (only used within their package): - HandleRunE → handleRunE (lifecycle) - HandleConnectRunE → handleConnectRunE (cluster) - ResolveClusterNameFromContext → resolveClusterNameFromContext (setup) - NeedsCSIInstall → needsCSIInstall (setup) - InstallKubeletCSRApproverSilent → installKubeletCSRApproverSilent (setup) - ExecuteTool → executeTool (toolgen) * refactor: rename generic result type to listResult and fix gci formatting * refactor: split viewportPadding into width/height constants and fix dependabot paths * Update tools.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Nikolai Emil Damm <ned@devantler.tech> * chore: apply golangci-lint fixes * refactor: eliminate code duplication in chat tools argument formatting (#2127) * Initial plan * refactor: eliminate code duplication in formatArgsMap helper Extract shared argument formatting logic into formatArgsMap to eliminate jscpd duplication between getToolArgs and formatToolArguments. 
- Created formatArgsMap helper for map[string]any to key=value string conversion - Refactored getToolArgs to use formatArgsMap and add parentheses wrapper - Refactored formatToolArguments to use formatArgsMap for consistent behavior - Maintains sorted keys for deterministic output - Reduces code duplication from 8 lines to 0 per jscpd analysis Verified: ✅ jscpd reports 0 duplications (was 0.01% over threshold) ✅ golangci-lint reports 0 issues ✅ All chat package tests pass ✅ Code compiles successfully Co-authored-by: devantler <26203420+devantler@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: devantler <26203420+devantler@users.noreply.github.com> * refactor: rename Instance to FluxInstance across the codebase Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: update Go version to 1.26.0 across project files Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * chore: Apply megalinter fixes * fix: update Go version to 1.26.0 and refactor pointer usage in multiple files Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * fix: update mockery configuration and handle session abort in streaming Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * fix: update Kyverno install timeout to 10 minutes and enhance Helm install context management Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> * fix: abort in-flight Copilot request on context cancellation and standardize error message casing Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> --------- Signed-off-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> Signed-off-by: Nikolai Emil Damm <ned@devantler.tech> Co-authored-by: botantler[bot] <botantler[bot]@users.noreply.github.com> Co-authored-by: Nikolai Emil Damm <nikolaiemildamm@icloud.com> Co-authored-by: devantler <26203420+devantler@users.noreply.github.com> Co-authored-by: Nikolai Emil Damm 
<ned@devantler.tech> Co-authored-by: botantler[bot] <185060876+botantler[bot]@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
…d-5472fb7c94207991
Pull request overview
Copilot reviewed 181 out of 553 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Nikolai Emil Damm <ned@devantler.tech>
Pull request overview
Copilot reviewed 181 out of 553 changed files in this pull request and generated no new comments.
- Add 'ksail cluster update' to features.mdx command table - Add 'ksail cluster update' to copilot-instructions.md CLI reference - Ensures documentation accurately reflects the new update command from PR #2041 The cluster update command was recently merged but was missing from key documentation files where other cluster lifecycle commands are listed. Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Nikolai Emil Damm <ned@devantler.tech>
- Add ksail cluster update command to main usage workflow - Explain when to use it (after modifying ksail.yaml) - Ensures consistency with .github/copilot-instructions.md - References feature introduced in v5.30.0 (#2041) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Nikolai Emil Damm <ned@devantler.tech>
- Add ksail cluster update to main usage workflow in README.md - Add ksail cluster update to Quick Start section in docs/index.mdx - Ensures consistency with features.mdx and copilot-instructions.md - References feature introduced in v5.30.0 (#2041) The cluster update command was documented in features.mdx and copilot-instructions.md but was missing from the README and docs homepage, causing documentation inconsistency. Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Nikolai Emil Damm <ned@devantler.tech>

Implements the `ksail cluster update` command for in-place cluster configuration updates, along with extensive codebase refactoring and several bug fixes.

Fixes #1734
Fixes #2072
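
For reviewers who want a mental model of the update flow, here is a minimal sketch of how a config diff might be classified into in-place versus recreate-required changes. Every name in it (`ChangeCategory`, `Change`, `Diff`, `NeedsRecreate`, and the field keys `distribution`, `provider`, `workers`) is hypothetical and chosen for illustration; it is not the code in `pkg/svc/diff/`, and which fields actually force a recreate is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// ChangeCategory describes how disruptive a change is to apply.
// Hypothetical type for this sketch, not the PR's actual definition.
type ChangeCategory int

const (
	InPlace ChangeCategory = iota
	RebootRequired
	RecreateRequired
)

// Change records one field that differs between running and desired config.
type Change struct {
	Field    string
	OldValue string
	NewValue string
	Category ChangeCategory
}

// recreateFields lists the fields this sketch ASSUMES cannot be changed in place.
var recreateFields = map[string]bool{"distribution": true, "provider": true}

// Diff compares two flat configs and returns the changes sorted by field name.
// Keys present only in running are ignored to keep the sketch short.
func Diff(running, desired map[string]string) []Change {
	keys := make([]string, 0, len(desired))
	for k := range desired {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var changes []Change
	for _, k := range keys {
		if running[k] == desired[k] {
			continue
		}
		category := InPlace
		if recreateFields[k] {
			category = RecreateRequired
		}
		changes = append(changes, Change{k, running[k], desired[k], category})
	}
	return changes
}

// NeedsRecreate reports whether any change forces a delete + create cycle.
func NeedsRecreate(changes []Change) bool {
	for _, c := range changes {
		if c.Category == RecreateRequired {
			return true
		}
	}
	return false
}

func main() {
	running := map[string]string{"distribution": "K3s", "workers": "1"}
	desired := map[string]string{"distribution": "K3s", "workers": "3"}

	changes := Diff(running, desired)
	for _, c := range changes {
		fmt.Printf("%s: %s -> %s\n", c.Field, c.OldValue, c.NewValue)
	}
	fmt.Println("recreate needed:", NeedsRecreate(changes))
}
```

In this sketch a caller would prompt for confirmation only when `NeedsRecreate` returns true, which mirrors the "apply in place when supported, otherwise prompt for delete + recreate" behaviour described in this PR.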
What changed and why

ksail cluster update command

Added a new `ksail cluster update` command that detects configuration changes and applies them in-place when supported, or prompts for delete + recreate when necessary.

- […] (`pkg/svc/diff/`) compares running cluster state against `ksail.yaml` to determine what changed
- […] `--force`)

Bug fixes

- […] `Instance` to `FluxInstance` to match the upstream Flux Operator CRD kind, fixing a 5-minute timeout in K3s Flux system tests
- […] `WaitForK3dLocalRegistryReady` to ensure registry is ready before use

New features (beyond update command)

- […] (`pkg/toolgen/`)
- […] `pkg/runner/` and `pkg/timer/` utilities

Refactoring (across 40+ commits)

- […] `cli/helpers` into `cli/flags`, `cli/editor`, `cli/kubeconfig`, `cli/dockerutil`; extracted `pkg/k8s/readiness`, `pkg/svc/detector/cluster`; moved `pkg/ai/toolgen` to `pkg/toolgen`; moved `pkg/utils/timer` to `pkg/timer`
- […] `helmutil.Base` and shared helpers to eliminate duplicated Helm install logic

Type of change