
Dynamic schema rfc #368

Merged — 8 commits merged into release/v0.5 on Jun 29, 2024
Conversation

jakefhyde (Contributor) commented May 6, 2024

Issue: rancher/rancher#44618

Problem

An issue was identified (rancher/rancher#40609) where updating a node driver and changing its flags would cause CAPI to detect that the supplied fields did not match the existing fields on the machine infrastructure custom resource. The fix for that issue was to persist the dynamicSchema onto the cluster object under the machinePools section, and to strip fields not present at the time of initial provisioning from the related CR, since the alternative of properly versioning CRDs for each node driver was not, and continues to not be, a viable solution.

While that change fixed unintended cluster reprovisioning on node driver update, other cluster update tools such as Terraform and kubectl are still susceptible: an update that unintentionally removes the dynamicSchemaSpec fields will still trigger reprovisioning if new fields (whether used or not) have been added to a node driver. Because the dynamicSchemaSpec is added by a controller within Rancher, a cluster create request, followed by a node driver update and then an update request identical to the original create request, will cause reprovisioning, since the dynamicSchemaSpec is unpopulated within the request.

Solution

Add an annotation that the webhook uses to determine whether removal of the dynamicSchemaSpec is intentional. If the annotation is not present or is "false", the dynamicSchemaSpec is reinserted into the cluster object as-is, preventing an unintentional cluster-wide machine rollout. If it is "true", the fields stay removed from the request and provisioning proceeds with the previous behavior, letting users opt in to a potential rollout if the node driver was indeed changed.

Checklist

  • Test
  • Docs

There isn't currently an existing markdown file for provisioning clusters. I'm happy to create one in a follow up PR if necessary.

@jakefhyde jakefhyde changed the title Dynamic schema rfc data-dir rfc Jun 19, 2024
@jakefhyde jakefhyde changed the title data-dir rfc Dynamic schema rfc Jun 19, 2024
@jakefhyde jakefhyde force-pushed the dynamic-schema-rfc branch 2 times, most recently from b0d9d0a to 21692b9 on June 19, 2024 03:03
@jakefhyde jakefhyde requested review from Oats87, snasovich and a team June 19, 2024 03:09
@jakefhyde jakefhyde marked this pull request as ready for review June 19, 2024 03:09
@jakefhyde jakefhyde requested a review from a team as a code owner June 19, 2024 03:09
@jakefhyde jakefhyde force-pushed the dynamic-schema-rfc branch 3 times, most recently from 98ec3e7 to 014357f on June 21, 2024 15:47
go.mod Outdated
Comment on lines 21 to 22
k8s.io/dynamic-resource-allocation => k8s.io/dynamic-resource-allocation v0.29.3
k8s.io/endpointslice => k8s.io/endpointslice v0.29.3
jakefhyde (Contributor, Author):

Intellisense fails without doing this.

Contributor:
Just as a heads up, we recently merged a PR for k8s 1.30 support - #412. If you could bump these to the version used in that PR (v0.30.1), I would appreciate it.

Oats87 previously approved these changes Jun 21, 2024
// dynamicSchemaSpec field for a machine pool if the "provisioning.cattle.io/allow-dynamic-schema-drop" annotation is
// not present or not "true" on the cluster. If the value of the annotation is "true", no mutation is performed.
func (m *ProvisioningClusterMutator) handleDynamicSchemaDrop(request *admission.Request, cluster *v1.Cluster) (*admissionv1.AdmissionResponse, error) {
if cluster.Name == "local" || cluster.Spec.RKEConfig == nil {

we should make sure the harvester team isn't somehow relying on dynamic schema for the local cluster in rancherd


I don't think we are leveraging dynamic schema on the local cluster for anything during the harvester bootstrap / lifecycle management. @bk201 @Vicente-Cheng

jakefhyde (Contributor, Author) Jun 26, 2024:

Sounds good then, as long as harvester isn't worried about the dynamic schema changing with new flags and causing reprovisioning for the local cluster (not sure if that's something that's been encountered previously, or if it's even possible for rancherd to bootstrap the local cluster using harvester).

Member:
@FrankYang0529 Please help take a look too.


We don't use the MachinePools field in RKEConfig, but we do use RotateCertificates. Not sure whether that's related? Thanks.

jakefhyde (Contributor, Author):

@FrankYang0529 Yeah, it's not related to any of the day 2 ops, only when new fields are added to a driver.

pkg/resources/provisioning.cattle.io/v1/cluster/mutator.go (Outdated, resolved)
Oats87 previously approved these changes Jun 27, 2024
MbolotSuse (Contributor) left a comment:

One significant bug, and a few nits.

MbolotSuse (Contributor) left a comment:
Two small comments: one on an incorrect order of the old/new object, one on log level.

MbolotSuse (Contributor) left a comment:

LGTM, please squash when merging.

@jakefhyde jakefhyde merged commit 276b2b9 into rancher:release/v0.5 Jun 29, 2024
1 check passed
6 participants