Skip to content

Conversation

salasberryfin
Copy link
Contributor

@salasberryfin salasberryfin commented Jul 23, 2025

What this PR does / why we need it:

This PR adds logic to patch objects defined in a provider's manifest leveraging a new CAPI Operator functionality that allows passing a custom alter components function which is executed during the Fetch phase.

There are two core changes. A new feature gate no-cert-manager is added, and this logic is only applied if set to true (it is an opt-in action, defaults to false):

Let wrangler manage certificates

This is implemented in function patchProviderManifestFn, which retrieves the name of the secret from the Certificate resource and annotates the webhook service (e.g. capi-webhook-service with:
need-a-cert.cattle.io/secret-name: <secretName>

Remove cert-manager

This is implemented in function removeCertManagerFn:

  • Removes annotation cert-manager.io/inject-ca-from from any resource in the manifest.
  • Deletes resources Certificate and Issuer from the list of objects.

This step runs on provider installation (or update), as modifying the object manifest after is has already been deployed does not have effect on the existing resources. This is because of the reconciliation phases and the manifest being stored in memory.

Which issue(s) this PR fixes:
Fixes #1513

Special notes for your reviewer:

Checklist:

  • squashed commits into logical changes
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

@salasberryfin salasberryfin added kind/refactor Indicates a PR with large refactoring changes e.g. removes files or moves content area/operator labels Jul 23, 2025
@salasberryfin salasberryfin force-pushed the certficates-managed-by-wrangler branch from 02e887b to 86512ac Compare July 30, 2025 06:54
@salasberryfin salasberryfin force-pushed the certficates-managed-by-wrangler branch 9 times, most recently from ce918fd to 5a1ab33 Compare August 20, 2025 15:32
Signed-off-by: Carlos Salas <carlos.salas@suse.com>
Signed-off-by: Carlos Salas <carlos.salas@suse.com>
@salasberryfin salasberryfin force-pushed the certficates-managed-by-wrangler branch 3 times, most recently from 5b4a819 to 1a24f25 Compare August 21, 2025 10:16
patchProviderManifestFn,
}

if feature.Gates.Enabled(feature.UninstallCertManager) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the uninstall cert-manager feature to be misleading. This is not uninstalling cert-manager, it's just removing the dependency from the provider manifest. So:

  1. Both new phases (conversion to wrangler && cert-manager dependency removal) have to be gated behind the same feature. It's not going to work if we just add the wrangler notation, it will conflict with the existing cert-manager Certificate most likely.

  2. uninstall-cert-manager feature should be renamed imho, I think something like cert-manager-conversion could be more appropriate.

  3. I do miss a phase where the provider deployment is restarted, otherwise the change won't be effective as the controller pod is still using the old/replaced certificate.

Copy link
Contributor

@anmazzotti anmazzotti Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding actually uninstalling cert-manager, I think this simply needs to be left to the user.
Turtles does not know how cert-manager was installed, since it's currently a prerequisite, so the user has to take care of that, once all provider manifests have been converted.

Edit: For this reason it would be nice to have a new CAPIProvider condition to mark that conversion happened successfully, so the user knows when cert-manager is no longer in-use and needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename uninstall-cert-manager to no-cert-manager?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also agree that both functions have to be under the same feature gate, the choice is either or another, no options in between.

Copy link
Contributor Author

@salasberryfin salasberryfin Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On point 3 on @anmazzotti's list, I'm not sure the deployment needs to be restarted, since the provider manifest is updated during Fetch, and before Install, which means that there's no controller running before any of the changes are applied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an update and not a first install, then the provider controller is running and (assuming we didn't bump the provider version), there's nothing updated that will make the Deployment rollout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I've seen is using helm upgrade to only update the value of the new feature gate (from false to true), triggers a re-creation of provider's controllers and resources. Shouldn't this suffice to consider it re-deployed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the new feature a Turtles feature? It has nothing to do with CAPIProviders, or is the new feature meant to be per-provider? Sorry I'm confused now.

Just to clarify, the "converted" CAPI provider Deployment needs a rollout. This is what I'm saying. Not Turtles (which does not have webhooks and need no conversion as far I know). I mean the CAPI providers like caprke2, capa, etc. Their respective pods need a rollout, since they will still use the cert-manager Certificate.

I mean they will still work as it's a self signed cert anyway, but it would be best to make the conversion effective immediately with a rollout, rather than waiting for troubles to come unexpectedly at the next rollout, like provider update.

@salasberryfin salasberryfin force-pushed the certficates-managed-by-wrangler branch from 1a24f25 to 748db18 Compare August 26, 2025 14:02
Signed-off-by: Carlos Salas <carlos.salas@suse.com>
@salasberryfin salasberryfin force-pushed the certficates-managed-by-wrangler branch from 748db18 to 5f2f5ae Compare August 28, 2025 06:54
Signed-off-by: Carlos Salas <carlos.salas@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operator kind/refactor Indicates a PR with large refactoring changes e.g. removes files or moves content
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Patch upstream manifests to replace cert-manager with wrangler
3 participants