Skip to content

Conversation

@arturshadnik
Copy link
Member

@arturshadnik arturshadnik commented Nov 6, 2025

  • create default topology resources during manager startup (hub only) instead of via helm chart
  • add a wait until ManagedCluster is deleted when cleaning up Spoke. There is a rare but possible race condition here, where if the ManagedCluster takes a few extra seconds to remove, hub cleanup may not fully finish, leaving an orphaned ClusterManager and controllers

Summary by CodeRabbit

  • New Features

    • Topology resources now initialized programmatically at controller startup with new --enable-topology-resources flag (enabled by default).
  • Bug Fixes

    • Improved managed cluster cleanup by adding deletion synchronization to ensure resources are fully removed before proceeding.
  • Documentation

    • Updated references for hub-as-spoke deployment contexts.
  • Chores

    • Container image version updated to v0.1.5.

… is gone

Signed-off-by: Artur Shad Nik <arturshadnik@gmail.com>
Signed-off-by: Artur Shad Nik <arturshadnik@gmail.com>
@coderabbitai
Copy link

coderabbitai bot commented Nov 6, 2025

Walkthrough

This PR adds seven new topology resource name constants to the public API and implements automatic creation of topology resources (namespaces, managed cluster sets, bindings, and placements) at controller startup via a new EnableTopologyResources flag. The manager reconciler setup is refactored to support the new initialization flow, and spoke cleanup logic is enhanced with deletion synchronization.

Changes

Cohort / File(s) Summary
API Constants
fleetconfig-controller/api/v1beta1/constants.go
Added seven exported topology resource name constants: NamespaceManagedClusterSetGlobal, NamespaceManagedClusterSetDefault, NamespaceManagedClusterSetSpokes, ManagedClusterSetGlobal, ManagedClusterSetDefault, ManagedClusterSetSpokes, PlacementSpokes
CLI Configuration and Options
fleetconfig-controller/cmd/main.go, fleetconfig-controller/cmd/manager/manager.go
Added cluster API v1beta1/v1beta2 imports and scheme registration in main.go; added EnableTopologyResources bool field to Options struct
Manager Initialization
fleetconfig-controller/cmd/manager/manager.go
Refactored Hub reconciler setup with explicit instantiation; introduced topology resource creation hook gated by EnableTopologyResources flag; reorganized legacy FleetConfig controller and webhook initialization path
Topology Resource Setup
fleetconfig-controller/cmd/manager/setup.go
Added private createTopologyResources function implementing idempotent creation of namespaces, ManagedClusterSets, ManagedClusterSetBindings, and Placement resources with error handling and logging
Helm Chart Documentation and Values
fleetconfig-controller/charts/fleetconfig-controller/README.md, fleetconfig-controller/charts/fleetconfig-controller/values.yaml
Updated topologyResources description to reference Hub creation with DefaultClusterSet feature gate; bumped image tag from v0.1.4 to v0.1.5
Helm Deployment Template
fleetconfig-controller/charts/fleetconfig-controller/templates/deployment.yaml
Added --enable-topology-resources flag to manager container args, wired to .Values.topologyResources.enabled with default true
Topology Resources Template
fleetconfig-controller/charts/fleetconfig-controller/templates/ocm/topology-resources.yaml
Removed entire conditional template block for rendering topology resources (namespaces, ManagedClusterSets, ManagedClusterSetBindings, Placement resources)
Development Script
fleetconfig-controller/devspace-start-hub.sh
Added --enable-topology-resources=true flag to RUN_CMD and DEBUG_CMD for controller and debugger execution
Spoke Controller Cleanup
fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
Added waitForManagedClusterDeleted method to synchronize deletion with polling; integrated wait logic into ManagedCluster removal flow

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Manager setup refactoring in cmd/manager/manager.go: Multiple reconciler initialization paths and topology resource startup logic require careful review to ensure error propagation and initialization ordering
  • Topology resource creation logic in cmd/manager/setup.go: New function creates multiple OCM resources with idempotent mutations; verify error handling and resource dependencies
  • Reconciler refactoring implications: Changes to Hub reconciler instantiation and legacy controller path may affect existing functionality; verify no regression in FleetConfig webhook setup and Spoke/Hub controller initialization

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • ahmad-ibra
  • TylerGillson

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the two main changes: moving topology resource creation from Helm templates to code startup, and adding deletion wait logic for Spoke cleanup.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
fleetconfig-controller/cmd/manager/setup.go (1)

24-141: Consider adding managed-by labels or annotations.

The created resources don't include labels or annotations indicating they're managed by the fleetconfig-controller. While the well-known names make ownership clear, adding labels like app.kubernetes.io/managed-by: fleetconfig-controller would:

  • Make it easier to query for controller-managed resources
  • Follow Kubernetes common labels conventions
  • Provide clear ownership signals to users and other tooling

Example for namespaces:

 globalNs := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: v1beta1.NamespaceManagedClusterSetGlobal}}
 result, err := controllerutil.CreateOrUpdate(ctx, cli, globalNs, func() error {
+	if globalNs.Labels == nil {
+		globalNs.Labels = make(map[string]string)
+	}
+	globalNs.Labels["app.kubernetes.io/managed-by"] = "fleetconfig-controller"
 	return nil // namespace has no spec to mutate
 })
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c4948d and b587da8.

📒 Files selected for processing (10)
  • fleetconfig-controller/api/v1beta1/constants.go (1 hunks)
  • fleetconfig-controller/charts/fleetconfig-controller/README.md (2 hunks)
  • fleetconfig-controller/charts/fleetconfig-controller/templates/deployment.yaml (1 hunks)
  • fleetconfig-controller/charts/fleetconfig-controller/templates/ocm/topology-resources.yaml (0 hunks)
  • fleetconfig-controller/charts/fleetconfig-controller/values.yaml (2 hunks)
  • fleetconfig-controller/cmd/main.go (3 hunks)
  • fleetconfig-controller/cmd/manager/manager.go (5 hunks)
  • fleetconfig-controller/cmd/manager/setup.go (1 hunks)
  • fleetconfig-controller/devspace-start-hub.sh (1 hunks)
  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go (3 hunks)
💤 Files with no reviewable changes (1)
  • fleetconfig-controller/charts/fleetconfig-controller/templates/ocm/topology-resources.yaml
🧰 Additional context used
🧠 Learnings (14)
📓 Common learnings
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/PROJECT:28-31
Timestamp: 2025-09-22T18:42:03.404Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller, the PROJECT file defines multiple API resources with different webhook configurations: FleetConfig v1alpha1 has defaulting: true (requiring MutatingWebhookConfiguration), while Hub and Spoke v1beta1 resources have defaulting: false. MutatingWebhookConfiguration resources in the manifests serve the v1alpha1 FleetConfig, not the v1beta1 Hub/Spoke resources.
📚 Learning: 2025-08-22T17:55:52.159Z
Learnt from: TylerGillson
Repo: open-cluster-management-io/lab PR: 51
File: fleetconfig-controller/charts/fleetconfig-controller/README.md:155-155
Timestamp: 2025-08-22T17:55:52.159Z
Learning: In the open-cluster-management-io/lab repository, chart versioning for fleetconfig-controller is handled automatically via GitHub release workflows, not through manual version bumps in Chart.yaml during regular PRs.

Applied to files:

  • fleetconfig-controller/charts/fleetconfig-controller/templates/deployment.yaml
  • fleetconfig-controller/charts/fleetconfig-controller/values.yaml
  • fleetconfig-controller/charts/fleetconfig-controller/README.md
📚 Learning: 2025-09-22T18:42:03.404Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/PROJECT:28-31
Timestamp: 2025-09-22T18:42:03.404Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller, the PROJECT file defines multiple API resources with different webhook configurations: FleetConfig v1alpha1 has defaulting: true (requiring MutatingWebhookConfiguration), while Hub and Spoke v1beta1 resources have defaulting: false. MutatingWebhookConfiguration resources in the manifests serve the v1alpha1 FleetConfig, not the v1beta1 Hub/Spoke resources.

Applied to files:

  • fleetconfig-controller/charts/fleetconfig-controller/templates/deployment.yaml
  • fleetconfig-controller/devspace-start-hub.sh
  • fleetconfig-controller/charts/fleetconfig-controller/values.yaml
  • fleetconfig-controller/api/v1beta1/constants.go
  • fleetconfig-controller/cmd/manager/setup.go
  • fleetconfig-controller/cmd/main.go
  • fleetconfig-controller/cmd/manager/manager.go
  • fleetconfig-controller/charts/fleetconfig-controller/README.md
  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
📚 Learning: 2025-08-27T21:58:32.141Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 58
File: fleetconfig-controller/charts/fleetconfig-controller/README.md:155-155
Timestamp: 2025-08-27T21:58:32.141Z
Learning: In the open-cluster-management-io/lab repository, the fleetconfig-controller follows a workflow where chart version bumps (in README.md and values.yaml) are included in PRs before the corresponding Docker image exists. The Docker image is built and pushed automatically via GitHub release workflows after the PR is merged and tagged, making the referenced version available.

Applied to files:

  • fleetconfig-controller/charts/fleetconfig-controller/values.yaml
  • fleetconfig-controller/charts/fleetconfig-controller/README.md
📚 Learning: 2025-08-22T19:38:49.769Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 52
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go:49-56
Timestamp: 2025-08-22T19:38:49.769Z
Learning: In the fleetconfig-controller project, the SpokeSpec and HubSpec structs in v1beta1 contain only optional fields (like Foo *string with omitempty tags), so creating these resources without populating the Spec field does not cause validation failures in tests.

Applied to files:

  • fleetconfig-controller/charts/fleetconfig-controller/values.yaml
  • fleetconfig-controller/cmd/main.go
  • fleetconfig-controller/cmd/manager/manager.go
  • fleetconfig-controller/charts/fleetconfig-controller/README.md
  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
📚 Learning: 2025-09-22T19:26:11.020Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/test/data/fleetconfig-v1alpha1.yaml:47-53
Timestamp: 2025-09-22T19:26:11.020Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller tests, the kubeconfigKey is intentionally set to "value" in test fixtures (fleetconfig-v1alpha1.yaml, fleetconfig-values.yaml) because that's how the test harness provisions the kubeconfig secret during test setup. This differs from the chart default of "kubeconfig" but is correct for the test environment.

Applied to files:

  • fleetconfig-controller/charts/fleetconfig-controller/values.yaml
  • fleetconfig-controller/charts/fleetconfig-controller/README.md
📚 Learning: 2025-09-25T23:31:11.630Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/charts/fleetconfig-controller/templates/ocm/fcc-addon/addon-template.yaml:110-112
Timestamp: 2025-09-25T23:31:11.630Z
Learning: The fleetconfig-controller-manager spoke agent requires create/update/patch/delete permissions on CustomResourceDefinitions because `clusteradm upgrade klusterlet` operations need create/update permissions and cleanup operations require delete permissions for proper lifecycle management.

Applied to files:

  • fleetconfig-controller/charts/fleetconfig-controller/values.yaml
  • fleetconfig-controller/cmd/manager/setup.go
  • fleetconfig-controller/charts/fleetconfig-controller/README.md
  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
📚 Learning: 2025-10-01T20:56:57.301Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/cmd/manager/manager.go:274-277
Timestamp: 2025-10-01T20:56:57.301Z
Learning: In fleetconfig-controller/cmd/manager/manager.go, the hub kubeconfig read by getHubRestConfig() is auto-generated and mounted with a consistent format, not user-supplied. The blanket string replacement for tls.crt and tls.key paths is suitable for this controlled environment.

Applied to files:

  • fleetconfig-controller/charts/fleetconfig-controller/values.yaml
  • fleetconfig-controller/cmd/manager/manager.go
  • fleetconfig-controller/charts/fleetconfig-controller/README.md
📚 Learning: 2025-09-22T19:16:34.109Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/internal/webhook/v1beta1/validation.go:103-121
Timestamp: 2025-09-22T19:16:34.109Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller v1beta1 API, the Klusterlet field in SpokeSpec is defined as a struct value (Klusterlet Klusterlet), not a pointer (*Klusterlet), so direct field access like Klusterlet.Annotations is safe without nil checks. The Klusterlet struct does not contain a Source field.

Applied to files:

  • fleetconfig-controller/cmd/main.go
  • fleetconfig-controller/charts/fleetconfig-controller/README.md
  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
📚 Learning: 2025-09-25T23:18:41.573Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:125-133
Timestamp: 2025-09-25T23:18:41.573Z
Learning: In the fleetconfig-controller spoke deletion flow, SpokeCleanupFinalizer is always removed before HubCleanupFinalizer. This means that checking for the existence of HubCleanupFinalizer in the deletion logic is sufficient regardless of cluster type, as any SpokeCleanupFinalizer would have already been removed by the time the hub cleanup runs.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
📚 Learning: 2025-09-12T22:46:57.106Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:328-331
Timestamp: 2025-09-12T22:46:57.106Z
Learning: In fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go, nil klusterletValues returned by mergeKlusterletValues is a valid state, not an error condition. The downstream prepareKlusterletValuesFile function properly handles nil values by returning early, making additional nil checks unnecessary and potentially harmful to the intended flow.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
📚 Learning: 2025-09-12T22:46:57.106Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:328-331
Timestamp: 2025-09-12T22:46:57.106Z
Learning: In fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go, nil klusterletValues returned by mergeKlusterletValues is a valid state, not an error condition. The downstream prepareKlusterletValuesFile function properly handles nil values by returning early with (nil, nil, nil), making additional nil checks unnecessary and potentially harmful to the intended flow.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
📚 Learning: 2025-09-12T22:46:57.106Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:328-331
Timestamp: 2025-09-12T22:46:57.106Z
Learning: In fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go, nil klusterletValues returned by mergeKlusterletValues is a valid state, not an error condition. The downstream functions (prepareKlusterletValuesFile and hash.ComputeHash) properly handle nil values - prepareKlusterletValuesFile returns early with (nil, nil, nil) and hash.ComputeHash uses ZeroNil: true option to handle nil as zero values.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
📚 Learning: 2025-09-12T22:46:57.106Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:328-331
Timestamp: 2025-09-12T22:46:57.106Z
Learning: In fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go, nil klusterletValues returned by mergeKlusterletValues is a valid state, not an error condition. The downstream functions handle nil properly: prepareKlusterletValuesFile returns early with (nil, nil, nil) for nil values, and hash.ComputeHash uses ZeroNil: true option to treat nil as zero values when computing hashes.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
🧬 Code graph analysis (4)
fleetconfig-controller/cmd/manager/setup.go (1)
fleetconfig-controller/api/v1beta1/constants.go (10)
  • NamespaceManagedClusterSetGlobal (229-229)
  • ManagedClusterSetGlobal (238-238)
  • NamespaceManagedClusterSetDefault (232-232)
  • ManagedClusterSetDefault (241-241)
  • NamespaceManagedClusterSetSpokes (235-235)
  • ManagedClusterSetSpokes (244-244)
  • LabelManagedClusterType (159-159)
  • ManagedClusterTypeHub (71-71)
  • ManagedClusterTypeHubAsSpoke (77-77)
  • PlacementSpokes (247-247)
fleetconfig-controller/cmd/main.go (1)
fleetconfig-controller/api/v1beta1/groupversion_info.go (1)
  • AddToScheme (44-44)
fleetconfig-controller/cmd/manager/manager.go (4)
fleetconfig-controller/internal/controller/v1beta1/hub_controller.go (1)
  • HubReconciler (55-59)
fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go (1)
  • SpokeReconciler (52-58)
fleetconfig-controller/internal/controller/v1alpha1/fleetconfig_controller.go (1)
  • FleetConfigReconciler (58-62)
fleetconfig-controller/api/v1alpha1/fleetconfig_webhook.go (1)
  • SetupFleetConfigWebhookWithManager (38-43)
fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go (1)
fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go (1)
  • SpokeReconciler (52-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: e2e (fleetconfig-controller) / e2e
  • GitHub Check: test (fleetconfig-controller) / Run Helm Chart Tests
  • GitHub Check: test (fleetconfig-controller) / Ensure PR is reviewable & run tests
  • GitHub Check: e2e (fleetconfig-controller) / e2e
  • GitHub Check: test (fleetconfig-controller) / Ensure PR is reviewable & run tests
🔇 Additional comments (5)
fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go (2)

536-545: LGTM! Wait mechanism addresses race condition.

The addition of waitForManagedClusterDeleted after the ManagedCluster deletion properly blocks until the resource is fully removed, preventing the race condition where a ManagedCluster taking extra time to delete could leave orphaned ClusterManager resources. This directly addresses the PR objectives.


700-731: LGTM! Well-implemented polling with appropriate timeouts.

The wait function correctly:

  • Uses a reasonable 30s timeout and 2s polling interval
  • Treats NotFound as successful deletion
  • Continues polling on transient errors rather than failing immediately
  • Provides clear error messages with cluster name for debugging
fleetconfig-controller/api/v1beta1/constants.go (1)

225-248: LGTM! Clear and consistent topology resource constants.

The new exported constants provide well-named, consistent identifiers for topology resources. The naming conventions align with OCM standards, and the documentation is clear.

Note: PlacementSpokes and ManagedClusterSetSpokes both have the value "spokes", but this is acceptable since they're used for different resource types (Placement vs ManagedClusterSet).

fleetconfig-controller/cmd/manager/setup.go (2)

19-140: LGTM! Solid implementation of topology resource creation.

The function follows a clean, consistent pattern for creating topology resources with:

  • Proper use of CreateOrUpdate for idempotent startup behavior
  • Appropriate error handling with descriptive context
  • Key business logic at lines 85-105 correctly excludes Hub and HubAsSpoke cluster types from the spokes ManagedClusterSet

The empty PlacementSpec at line 131 will select all clusters from bound ManagedClusterSets, which is appropriate for the default spokes placement.


25-27: Document CreateOrUpdate overwrite behavior.

CreateOrUpdate will overwrite the spec of existing resources if they differ from the controller's defaults. While appropriate for controller-managed default resources, users who have customized these topology resources (e.g., modified the spokes ManagedClusterSet selector) will have their changes reverted on controller restart.

Consider documenting this behavior in the controller's user documentation or README to set clear expectations about resource ownership.

Do you want me to help draft documentation explaining which resources are managed by the controller and will be reset to defaults?

Also applies to: 40-42, 51-53, 66-68, 77-79, 87-100, 114-116, 130-132

@openshift-ci
Copy link

openshift-ci bot commented Nov 6, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arturshadnik, TylerGillson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@arturshadnik arturshadnik changed the title feat: create default resources in code; block spoke delete until gone ✨ create default resources in code; block spoke delete until gone Nov 6, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 4036a73 into open-cluster-management-io:main Nov 6, 2025
21 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants