Removal of old apiVersions in CRDs #11894

Closed
@sbueringer

Description

Problem statement

As we evolve our CRDs over time, we regularly bump our apiVersions. While we have an established process for adding new apiVersions in Cluster API, removing apiVersions is still problematic.

Before we can remove an apiVersion from a CRD we have to go through the following steps:

1. Ensure all custom resources can still be read from etcd after the apiVersion is removed ("Storage version migration")

This is typically done by writing all custom resources with the current storage version. More information can be found in the Kubernetes documentation on Versions in CRDs.

In Cluster API today, this is only implemented as part of the clusterctl upgrade command. If clusterctl is not used, folks have to write their own custom implementation or build on top of the built-in storage version migration in Kubernetes (alpha since v1.30).

Note: Storage version migration should be run as soon as a new apiVersion becomes the storage version, to minimize conversion webhook calls.
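
For illustration, here is a minimal sketch of such a migration, assuming a controller-runtime client and an unstructured list; the helper name migrateStorageVersion is hypothetical and this is not the clusterctl implementation. The idea is simply to write every object back with a no-op patch so the API server re-encodes it with the current storage version.

```go
package crdmigration

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// migrateStorageVersion is an illustrative helper (hypothetical name): it lists
// all objects of the given list GroupVersionKind and issues an empty merge
// patch for each one. The no-op patch makes the API server rewrite the object,
// so it is re-encoded in etcd with the CRD's current storage version.
func migrateStorageVersion(ctx context.Context, c client.Client, listGVK schema.GroupVersionKind) error {
	objs := &unstructured.UnstructuredList{}
	objs.SetGroupVersionKind(listGVK) // e.g. cluster.x-k8s.io/v1beta2, Kind=MachineList

	// A production implementation would paginate (client.Limit / Continue)
	// and tolerate individual conflicts instead of failing the whole run.
	if err := c.List(ctx, objs); err != nil {
		return err
	}
	for i := range objs.Items {
		if err := c.Patch(ctx, &objs.Items[i], client.RawPatch(types.MergePatchType, []byte("{}"))); err != nil {
			return err
		}
	}
	return nil
}
```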

2. Remove the old apiVersion from managedFields of all custom resources ("ManagedField cleanup")

Kubernetes stores field ownership information per apiVersion in managedFields. Unfortunately, there is no builtin logic that removes the managedFields of an apiVersion when that apiVersion is removed from the CRD. If there are still managedFields with a removed apiVersion, any subsequent apply request will fail.

Note: managedField cleanup should be run as soon as an apiVersion is no longer served, to minimize conversion webhook calls. As long as an apiVersion is still served, users can still apply with that apiVersion, and the corresponding managedFields are then needed to execute the apply properly.
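
For illustration, a hedged sketch of what managedField cleanup for a single object could look like, assuming a controller-runtime client; the helper name cleanupManagedFields is hypothetical, and the empty-entry guard follows the Kubernetes Server-Side Apply documentation on clearing managedFields rather than the implementation proposed in #11889.

```go
package crdmigration

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupManagedFields is an illustrative helper (hypothetical name): it drops
// all managedFields entries of obj that still reference removedAPIVersion,
// e.g. "cluster.x-k8s.io/v1alpha3", and writes the filtered list back.
func cleanupManagedFields(ctx context.Context, c client.Client, obj client.Object, removedAPIVersion string) error {
	current := obj.GetManagedFields()
	kept := make([]metav1.ManagedFieldsEntry, 0, len(current))
	for _, entry := range current {
		if entry.APIVersion != removedAPIVersion {
			kept = append(kept, entry)
		}
	}
	if len(kept) == len(current) {
		return nil // Nothing references the removed apiVersion.
	}
	// Setting managedFields to an empty list is ignored by the API server;
	// a list with a single empty entry strips managedFields entirely instead.
	if len(kept) == 0 {
		kept = []metav1.ManagedFieldsEntry{{}}
	}
	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{"managedFields": kept},
	})
	if err != nil {
		return err
	}
	return c.Patch(ctx, obj, client.RawPatch(types.MergePatchType, patch))
}
```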

Why is this important now

For the following reasons we want to remove old apiVersions as soon as possible:

Maintenance effort

We have to keep the Go types of the old apiVersions around. We also have to adjust the conversion implementation whenever we add a new field to our current API.

Increased resource usage through conversion requests

As long as the old apiVersions are part of our CRDs, we will get a significant number of requests to the conversion webhooks.

Implementation

A few notes:

  • The solution should also be available for folks that don't use clusterctl
  • It should be easy for providers to re-use the implementation for their own CRDs
  • It should be possible to disable storage version migration and/or managedField cleanup for cases where folks want to take care of these themselves

Idea:

  • Implement a controller / reconciler that can be embedded in core CAPI / providers to run storage version migration and managedField cleanup for the providers' CRDs

For more implementation details, see: #11889
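
To make the idea a bit more concrete, here is a rough sketch of how such a reconciler could be embedded in a provider's manager. CRDMigrator, its options, and the commented placeholders are hypothetical and not the API from #11889; they only illustrate the shape of a controller-runtime reconciler keyed on CRDs, with both steps individually switchable.

```go
package crdmigration

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// CRDMigrator is a hypothetical reconciler that runs storage version migration
// and managedField cleanup for the CRDs of the provider that embeds it.
type CRDMigrator struct {
	Client client.Client

	// Both steps can be disabled individually for users who handle them
	// themselves.
	SkipStorageVersionMigration bool
	SkipManagedFieldCleanup     bool
}

// SetupWithManager wires the reconciler into the provider's manager.
// Note: apiextensionsv1 has to be added to the manager's scheme, and a real
// implementation would filter events down to the provider's own CRDs.
func (r *CRDMigrator) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&apiextensionsv1.CustomResourceDefinition{}).
		Complete(r)
}

func (r *CRDMigrator) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	crd := &apiextensionsv1.CustomResourceDefinition{}
	if err := r.Client.Get(ctx, req.NamespacedName, crd); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if !r.SkipStorageVersionMigration {
		// Rewrite all custom resources of this CRD so they are stored with the
		// current storage version (see the migrateStorageVersion sketch above).
	}
	if !r.SkipManagedFieldCleanup {
		// Drop managedFields entries of apiVersions that are no longer served
		// (see the cleanupManagedFields sketch above).
	}
	return ctrl.Result{}, nil
}
```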

Removal of v1alpha3 & v1alpha4 and v1beta1 apiVersions

Context:

  • CAPI supports 3 minor versions at a time: for 2 of them we regularly release patch releases, for the 3rd one we create emergency patches on demand
  • CAPI currently tests up to n-3 => n upgrades.
  • We want to make sure that removal of old apiVersions does not break the n-3 => n upgrade path. This means that we have to keep apiVersions around long enough that "storage version migration" and "managedField cleanup" have been run.

v1alpha3 & v1alpha4

| CAPI  | Release date | v1alpha3 + v1alpha4 | Notes                       |
|-------|--------------|---------------------|-----------------------------|
| v1.9  | Dec 24       | Served: false       |                             |
| v1.10 | April 25     | Served: false       | CRD migrator added          |
| v1.11 | August 25    | Served: false       |                             |
| v1.12 | December 25  | Served: false       |                             |
| v1.13 | April 26     |                     | v1alpha3 + v1alpha4 removed |
| v1.14 | August 26    |                     |                             |
| v1.15 | December 26  |                     |                             |

Notes:

  • v1.10-v1.12: We have to keep v1alpha3 + v1alpha4 around for 3 versions after the CRD migrator has been added, to ensure that managedField cleanup is run even if someone upgrades from n-3 => n

v1beta1

| CAPI  | Release date | v1beta1               | v1beta2               | Notes            |
|-------|--------------|-----------------------|-----------------------|------------------|
| v1.9  | Dec 24       | Served: true, Storage |                       |                  |
| v1.10 | April 25     | Served: true, Storage |                       |                  |
| v1.11 | August 25    | Served: true          | Served: true, Storage | v1beta2 added    |
| v1.12 | December 25  | Served: true          | Served: true, Storage |                  |
| v1.13 | April 26     | Served: true          | Served: true, Storage |                  |
| v1.14 | August 26    | Served: false         | Served: true, Storage | v1beta1 unserved |
| v1.15 | December 26  | Served: false         | Served: true, Storage |                  |
| v1.16 | April 27     | Served: false         | Served: true, Storage |                  |
| v1.17 | August 27    | Served: false         | Served: true, Storage |                  |
| v1.18 | December 27  |                       | Served: true, Storage | v1beta1 removed  |

Notes:

  • v1.11-v1.13: We have to keep v1beta1 served for 3 versions after introduction of v1beta2 according to the Kubernetes deprecation policy
  • v1.14-v1.17: We have to keep v1beta1 around for 3 versions after it was unserved, to ensure that managedField cleanup is run even if someone upgrades from n-3 => n.
    • Note: We want to keep v1beta1 around for one additional release so that folks have 1 buffer release where they can revert v1beta1 back to served if they need more time to pick up v1beta2 (we did the same for v1alpha3 + v1alpha4 in the past).

Tasks:

Labels: kind/feature, priority/important-soon, triage/accepted
