OCPCLOUD-3327: Rewrite manifests-gen to support upgrade safety #434
mdbooth wants to merge 11 commits into openshift:main
Conversation
📝 Walkthrough
Refactors manifest generation to an object-centric model ([]client.Object), adds a Go-based kustomize compile/write pipeline, and introduces provider-image discovery/extraction, profile metadata loading, and sorting and application of provider manifests during reconciliation; integrates provider images into operator setup and manifest outputs.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Reconciler as Reconciler
    participant Controller as CapiInstallerController
    participant Reader as providerimages.ReadProviderImages
    participant Fetcher as remoteImageFetcher
    participant Registry as ContainerRegistry
    participant FS as Filesystem
    Reconciler->>Controller: reconcile(ctx)
    Controller->>Reader: ReadProviderImages(ctx, k8sClient, containerImages, providerImageDir)
    Reader->>Fetcher: Fetch(imageRef) using keychain
    Fetcher->>Registry: HTTP request for image blob
    Registry-->>Fetcher: return v1.Image
    Fetcher->>FS: extract /capi-operator-manifests from layers
    FS->>FS: write files, discover profiles, parse metadata.yaml
    FS-->>Reader: []ProviderImageManifests
    Reader-->>Controller: provider list
    Controller->>Controller: sortProvidersByInstallOrder(...)
    loop per provider
        Controller->>FS: read manifests.yaml (reader)
        Controller->>Controller: extractManifests(reader) -> []docs
        Controller->>Controller: applyProviderComponents(ctx, docs)
    end
    Controller-->>Reconciler: reconciliation result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
@mdbooth: This pull request references OCPCLOUD-3327 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/test e2e-aws-ovn-techpreview

/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584
Force-pushed from 1ab7f33 to 5087e3b.
/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584

/test e2e-aws-ovn-techpreview
Force-pushed from dab2c52 to 7c0b761, then from f3ba06c to a8a93e3.
/test e2e-aws-ovn-techpreview

/test e2e-aws-ovn-techpreview

/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259
2 similar comments
The e2e-aws-ovn-techpreview pass shows that the modifications to the capiinstaller controller are backwards compatible with the current transport configmaps. openshift/cluster-api-provider-aws#584 and openshift/cluster-api#259 update CAPA and core CAPI to publish manifests only in the provider images; they do not publish transport configmaps. The multi-PR test pass with these two PRs shows that pulling manifests from images is working.
Actionable comments posted: 2
⚠️ Outside diff range comments (1)
manifests-gen/customizations.go (1)
164: ⚠️ Potential issue | 🟡 Minor
Typo in panic message.
The panic message has an errant "CustomResourceDefinition" suffix that appears to be a copy-paste artifact.
🔧 Proposed fix
- panic("can't find secret from cert:CustomResourceDefinition " + certNN)
+ panic("can't find secret from cert: " + certNN)
🤖 Fix all issues with AI agents
In `@cmd/capi-operator/main.go`:
- Around lines 147-151: setupControllers calls providerimages.ReadProviderImages with context.Background(), ignoring the signal-aware ctx it receives. Pass the received ctx instead so ReadProviderImages is cancellable on shutdown and shares the same signal-handling lifecycle as the rest of the controller initialization.
In `@manifests/0000_30_cluster-api_11_deployment.yaml`:
- Line 98: The Deployment references the webhook TLS secret as cluster-capi-operator-webhook-service-cert, but the webhook Service annotation still uses the old name capi-controllers-webhook-service-cert. Update the Service's serving-cert annotation (the metadata.annotations key that sets the webhook TLS secret) to cluster-capi-operator-webhook-service-cert so the secret name matches the Deployment's secretName value.
🧹 Nitpick comments (1)
pkg/providerimages/providerimages.go (1)
140-163: Consider propagating context cancellation errors.
The errgroup pattern here collects all errors via a channel but ignores context cancellation. If the context is cancelled (e.g., during shutdown), g.Wait() will return the context error, but it's discarded (_ = g.Wait()). This is likely acceptable for startup-time processing, but worth noting.
  secret:
    defaultMode: 420
-   secretName: capi-controllers-webhook-service-cert
+   secretName: cluster-capi-operator-webhook-service-cert
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Search for references to both old and new webhook secret names
echo "=== Searching for old secret name (should not exist) ==="
rg -n "capi-controllers-webhook-service-cert" --type yaml
echo ""
echo "=== Searching for new secret name ==="
rg -n "cluster-capi-operator-webhook-service-cert" --type yaml

Repository: openshift/cluster-capi-operator
Fix webhook certificate secret name mismatch across manifests.
The webhook certificate secret rename is incomplete. The deployment at line 98 references the new name cluster-capi-operator-webhook-service-cert, but the webhook service annotation in manifests/0000_30_cluster-api_02_webhook-service.yaml:9 still uses the old name capi-controllers-webhook-service-cert. This mismatch will cause the webhook to fail at runtime. Update the service annotation to use the new secret name consistently.
Force-pushed from f49c453 to 8f1aad3.
/testwith openshift/cluster-capi-operator/main/e2e-aws-capi-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259

1 similar comment
@mdbooth: you cannot LGTM your own PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/providerimages/providerimages.go`:
- Around lines 140-194: readProviderImages currently discards all successful manifests when any image fetch fails (it returns nil, err), which contradicts the concurrent, fault-tolerant design. Change it to return the partial results alongside the aggregated error (i.e., return providerImages, err), ensure callers handle a non-nil slice accompanied by a non-nil error, and update the final return path, tests, and doc comments to reflect returning partial results on partial failures.
🧹 Nitpick comments (1)
pkg/providerimages/providerimages.go (1)
95-118: Documentation still mentions "map of provider names" but the parameter is []string.
The comment at line 98 says "containerImages is a map of provider names to provider image references" but the actual parameter type is []string. This appears to be a documentation inconsistency from an earlier revision.
📝 Suggested doc fix
 // ReadProviderImages returns a list of ProviderImageManifests read directly
 // from operand container images.
 //
-// containerImages is a map of provider names to provider image references
+// containerImages is a list of provider image references
 //
/testwith openshift/cluster-capi-operator/main/e2e-aws-capi-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259
Adds a flag, profile-name, allowing specification of a profile (and therefore a subdirectory) to which manifests and metadata are output.
When parsing images, the logic now checks for multiple 'profiles' (subdirectories) under /capi-operator-manifests. If multiple are found, we return them all.
| Old Field | New Location |
|-----------|--------------|
| `ProviderName` | `Name` |
| `ProviderImageRef` | `SelfImageRef` |
| `ProviderType` | `Attributes["type"]` |
| `ProviderVersion` | `Attributes["version"]` |
| `OCPPlatform` | `OCPPlatform` (unchanged) |
| (new) | `InstallOrder` |

Now installs based on `InstallOrder` rather than sorting on type/version.
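Sorting by `InstallOrder` rather than type/version can be as simple as `slices.SortFunc`. A sketch with an illustrative stand-in type (not the controller's actual `ProviderImageManifests` definition), with a name tie-break for deterministic ordering:

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// provider is a stand-in for the real ProviderImageManifests type.
type provider struct {
	Name         string
	InstallOrder int
}

// sortProvidersByInstallOrder orders providers by InstallOrder,
// breaking ties by name so the install sequence is deterministic.
func sortProvidersByInstallOrder(providers []provider) {
	slices.SortFunc(providers, func(a, b provider) int {
		if c := cmp.Compare(a.InstallOrder, b.InstallOrder); c != 0 {
			return c
		}
		return cmp.Compare(a.Name, b.Name)
	})
}

func main() {
	providers := []provider{
		{Name: "aws", InstallOrder: 20},
		{Name: "cluster-api", InstallOrder: 10},
	}
	sortProvidersByInstallOrder(providers)
	fmt.Println(providers[0].Name) // cluster-api
}
```

An explicit order field means core CAPI (CRDs first) can always install before infrastructure providers without encoding knowledge of provider types into the sort.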
- Adds static related objects so we can debug when we misconfigure something
- Fixes names in related objects in operatorstatus to reflect new names in manifests
Debug Logging
- Added detailed logging when reconciling provider images (manifestsPath, type, version, profile, platform)
- Added logging for manifest extraction count
- Added component count summary before applying (CRDs, deployments, static components)
- Added per-resource logging when applying CRDs and deployments
Error Handling Improvements
- Fixed applyProviderImage to use named return so reader.Close() errors are not silently lost
- Fixed type assertion error messages to show actual type received (expected X but got %T)
- Added errUnexpectedResourceType static error for linter compliance
- Updated setDegradedCondition to accept and surface the actual error in ClusterOperator status conditions
- Changed degraded condition logging from Info to Error level
Provider Images Library
- Added error logging when provider image processing fails
- Improved error wrapping with context for:
- Manifest extraction failures (includes output directory)
- Profile discovery failures (includes directory path)
- Manifest write failures (includes profile name)
Force-pushed from 80d3621 to ab78f98.
Actionable comments posted: 0
⚠️ Outside diff range comments (3)
manifests/0000_30_cluster-api_11_deployment.yaml (1)
29-83: ⚠️ Potential issue | 🟠 Major
Add container securityContext to enforce non-root execution and match SCC requirements.
The pod annotation declares openshift.io/required-scc: restricted-v2 (line 23), which mandates runAsNonRoot: true, allowPrivilegeEscalation: false, capability dropping, and seccomp. However, the containers lack explicit securityContext fields, creating a security posture gap. Adding explicit container-level hardening ensures fail-fast validation at deployment time rather than relying solely on runtime SCC enforcement. The Dockerfile does not include a USER directive, meaning the image defaults to root; an explicit securityContext is essential to enforce the non-root intent declared by the SCC annotation.
🔒 Proposed hardening (apply to both containers)
 - name: capi-controllers
   image: registry.ci.openshift.org/openshift:cluster-capi-operator
+  securityContext:
+    allowPrivilegeEscalation: false
+    runAsNonRoot: true
+    capabilities:
+      drop: ["ALL"]
+    seccompProfile:
+      type: RuntimeDefault
@@
 - name: machine-api-migration
   image: registry.ci.openshift.org/openshift:cluster-capi-operator
+  securityContext:
+    allowPrivilegeEscalation: false
+    runAsNonRoot: true
+    capabilities:
+      drop: ["ALL"]
+    seccompProfile:
+      type: RuntimeDefault

Verify that both binaries (capi-controllers, machine-api-migration) and the provider-images emptyDir mount function correctly when run as non-root before enforcing runAsNonRoot: true.

pkg/conversion/capi2mapi/aws.go (2)
548-553: ⚠️ Potential issue | 🟠 Major
Guard the PlacementGroupPartition int64→int32 cast against overflow.
The function performs an unchecked cast from int64 to int32, which triggers the G115 linter warning (unsafe type conversion with potential data loss). Add bounds checking to prevent silent overflow for values outside the int32 range.
Suggested fix
 func convertAWSPlacementGroupPartition(in int64) *int32 {
 	if in == 0 {
 		return nil
 	}
+	if in < math.MinInt32 || in > math.MaxInt32 {
+		return nil // consider returning an error to the caller instead
+	}
 	return ptr.To(int32(in))
 }
525-531: ⚠️ Potential issue | 🟠 Major
Add a lower-bound check before casting Throughput to int32.
The current validation only checks the upper bound (math.MaxInt32). To satisfy gosec's G115 rule for unchecked int64→int32 conversions and defend against negative values, add a lower-bound guard.
🔧 Suggested fix
 if volume.Throughput != nil {
-	if *volume.Throughput > math.MaxInt32 {
-		return mapiv1beta1.BlockDeviceMappingSpec{}, field.Invalid(fldPath.Child("throughput"), *volume.Throughput, "throughput exceeds maximum int32 value")
-	}
+	if *volume.Throughput < 0 || *volume.Throughput > math.MaxInt32 {
+		return mapiv1beta1.BlockDeviceMappingSpec{}, field.Invalid(
+			fldPath.Child("throughput"),
+			*volume.Throughput,
+			"throughput must be non-negative and fit in int32",
+		)
+	}
 	bdm.EBS.ThroughputMib = ptr.To(int32(*volume.Throughput))
 }
🧹 Nitpick comments (3)
manifests-gen/generate.go (1)
103: Unnecessary O_APPEND flag with O_TRUNC.
When using O_TRUNC, the file is truncated to zero length, so O_APPEND has no practical effect since writes will always be at the end of an empty/growing file. This applies to both writeManifests (line 103) and writeMetadata (line 138).
🧹 Suggested simplification
- manifestsFile, err := os.OpenFile(manifestsPathname, os.O_CREATE|os.O_TRUNC|os.O_WRONLY|os.O_APPEND, 0600)
+ manifestsFile, err := os.OpenFile(manifestsPathname, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0600)

cmd/capi-operator/main.go (1)
154-166: Make provider image refs deterministic and deduplicated.
maps.Values is randomized and can include duplicates, which may lead to non-deterministic processing or redundant pulls. Sorting + compacting keeps behavior stable and avoids extra work.
♻️ Proposed fix
 containerImageRefs := slices.Collect(maps.Values(containerImages))
+slices.Sort(containerImageRefs)
+containerImageRefs = slices.Compact(containerImageRefs)

manifests/0000_30_cluster-api_11_deployment.yaml (1)
95-100: Consider adding an emptyDir.sizeLimit for the cache.
This prevents unbounded growth of provider image caches on the node.
♻️ Example (adjust size as appropriate)
 - name: provider-images
-  emptyDir: {}
+  emptyDir:
+    sizeLimit: 1Gi
Actionable comments posted: 0
⚠️ Outside diff range comments (2)
e2e/go.mod (1)
80-121: ⚠️ Potential issue | 🔴 Critical
Pin github.com/klauspost/compress to a stable version; v1.18.1 is marked retracted due to invalid encoding.
The klauspost/compress v1.18.1 indirect dependency is a retracted version with critical flate/zip/gzip encoding bugs. Replace it with a stable, non-retracted version (e.g., v1.17.x or newer stable release).
golang.org/x/oauth2 v0.33.0 and golang.org/x/tools v0.39.0 are compatible with Go 1.24 and have no breaking changes affecting this codebase.
go.mod (1)
12-33: ⚠️ Potential issue | 🔴 Critical
Do not use klauspost/compress v1.18.1; it was retracted due to invalid flate/zip/gzip encoding.
Use v1.18.2 or later instead. This version introduced bugs affecting compression correctness, making it unsuitable for production use.
Additionally, flag go-containerregistry v0.20.7 for daemon client usage: there is a known compatibility issue with Docker v29 stemming from the github.com/docker/docker module deprecation (migrating to github.com/moby/moby/...). If the provider-image workflow relies on daemon client functionality, verify that this does not impact your integration.
The remaining dependencies (docker/cli v29.2.1, golang.org/x/sync v0.18.0, golang.org/x/tools v0.39.0) have no breaking changes relative to their prior versions and are Go 1.24 compatible.
Force-pushed from e80de4b to b868525.
We need to pull in the cluster-api image, as that's where the core CAPI CRDs live.
/testwith openshift/cluster-capi-operator/main/e2e-aws-capi-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259
@mdbooth: The following tests failed, say

Full PR test history. Your PR dashboard.
Rewrite manifests-gen to support embedding CAPI installer manifests in the provider image instead of in a transport configmap.
Update the CAPI installer controller to support the new embedded manifests in addition to the existing transport configmaps. This allows us to have a smooth transition period while we update all providers.