
Cluster Feature Gate in etcd #4662

Draft
wants to merge 1 commit into base: master from cluster-fg

Conversation

siyuanfoundation
Contributor

  • One-line PR description: Cluster Feature Gate in etcd
  • Other comments:

@siyuanfoundation siyuanfoundation marked this pull request as draft May 24, 2024 17:40
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: siyuanfoundation
Once this PR has been reviewed and has the lgtm label, please assign wenjiaswe for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 24, 2024
@k8s-ci-robot k8s-ci-robot requested a review from ahrtr May 24, 2024 17:40
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 24, 2024
@k8s-ci-robot k8s-ci-robot requested a review from jmhbnz May 24, 2024 17:40
@k8s-ci-robot k8s-ci-robot added sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 24, 2024

#### Backport Risks

To support feature gates as early as 3.6, we are backporting some proto changes into 3.5. This carries the potential risk of changing a stable release, but we do not consider it a real risk: the change does nothing beyond adding a new field to an existing proto, and the 3.5 server neither writes nor uses this new field. Protos are inherently backward compatible. This change should also not affect upgrade/downgrade of 3.5.


I think it's also worthwhile to highlight the benefit of backporting. Since proto3 fields are optional (and unknown fields are ignored), we want to provide rationale for backporting this change.

Contributor Author

Actually, because we are introducing a new raft message type, we have to backport. Changed the paragraph.

@siyuanfoundation siyuanfoundation force-pushed the cluster-fg branch 2 times, most recently from 6933592 to ebe1a2e Compare May 28, 2024 21:55
@siyuanfoundation siyuanfoundation force-pushed the cluster-fg branch 3 times, most recently from 123689b to 16aea9b Compare June 5, 2024 22:35
@siyuanfoundation siyuanfoundation force-pushed the cluster-fg branch 4 times, most recently from 2d0a993 to 17e18ee Compare June 20, 2024 20:49
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 20, 2024
@siyuanfoundation siyuanfoundation force-pushed the cluster-fg branch 8 times, most recently from 8e8816e to 6a055b5 Compare August 7, 2024 18:49
@tengqm

tengqm commented Sep 6, 2024

Just took a quick glance over this ... wondering what the advantages of using feature gates are compared to command-line arguments?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2024
@siyuanfoundation siyuanfoundation force-pushed the cluster-fg branch 3 times, most recently from 595624d to 85119e0 Compare December 12, 2024 18:39
Signed-off-by: Siyuan Zhang <sizhang@google.com>

Co-authored-by: Marek Siarkowicz <marek.siarkowicz@protonmail.com>
@serathius
Contributor

@siyuanfoundation Wanna mark it ready for review?


### Set the Feature Gates
To set the correct user expectation, we will add a new flag `--cluster-feature-gates` (plus `config-file` configuration support), similar to the server `--feature-gates` flag, to set the values of the cluster feature gates as expected by the local server. The server will use this flag value to set the `proposed_cluster_params` sent to the cluster. The final values of the cluster feature gates are decided through raft consensus.
Contributor

I would not distinguish cluster feature flags on the command line. It's still a feature that is gated; the cluster part only changes its enablement moment.
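To make the flag behavior concrete, here is a minimal, self-contained sketch (the parsing helper and feature names are hypothetical; this is not etcd's actual flag handling) of how a `--cluster-feature-gates` value could be parsed into the per-feature booleans a member would then publish as its `proposed_cluster_params`:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseClusterFeatureGates turns a flag value such as
// "FeatureA=true,FeatureB=false" into a map of feature name -> desired
// enablement. The feature names are placeholders; real gates would come
// from the feature registry, where unknown names would be rejected.
func parseClusterFeatureGates(value string) (map[string]bool, error) {
	gates := map[string]bool{}
	if value == "" {
		return gates, nil
	}
	for _, pair := range strings.Split(value, ",") {
		name, val, found := strings.Cut(pair, "=")
		if !found {
			return nil, fmt.Errorf("missing '=' in %q", pair)
		}
		enabled, err := strconv.ParseBool(val)
		if err != nil {
			return nil, fmt.Errorf("invalid value for feature %q: %v", name, err)
		}
		gates[strings.TrimSpace(name)] = enabled
	}
	return gates, nil
}

func main() {
	// e.g. etcd --cluster-feature-gates=FeatureA=true,FeatureB=false
	gates, err := parseClusterFeatureGates("FeatureA=true,FeatureB=false")
	if err != nil {
		panic(err)
	}
	fmt.Println(gates) // map[FeatureA:true FeatureB:false]
}
```

These parsed values would only seed the local member's `proposed_cluster_params`; as the quoted section states, the final cluster-wide values are still decided through raft consensus.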


### Register New Feature Gates

A feature can be registered as a server-level feature or a cluster-level feature, but not both.
Contributor

@serathius serathius Dec 16, 2024

I would recommend not thinking of cluster features and server features as two different creations. Cluster features should just change how the feature is enabled.

For data compatibility, we do not support changing a feature gate from one type to another.
If a developer finds out that their feature needs to be a cluster level feature instead of server level after a release, they should add a new feature and use a new feature name.

NOTE: even though we have `PreRelease` information in the map, the purpose of the cluster feature gate is not lifecycle management; it is mainly cluster-level feature enablement. So a cluster feature can be `false` even after `GA`, and may not be removed if we want to keep it toggleable.
Contributor

I would remove the mention of lifecycle management; cluster feature enablement should not care about the stage of the feature, just about making sure it's enabled consistently across the cluster.

One thing that might be relevant is what the process of removing a cluster feature will look like. Can we never remove them and maybe solve the problem in the future?

{Version: mustParseVersion("3.7"), Default: false, PreRelease: Alpha},
},
}
DefaultEtcdClusterFeatureGates := map[Feature]VersionedSpecs {
Contributor

I would register both cluster and server feature gates in the map to track lifecycle and keep the enablement mechanism separate.
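For illustration, a self-contained sketch of what a versioned registry like the quoted `DefaultEtcdClusterFeatureGates` could look like. The types below (`Feature`, `FeatureSpec`, `VersionedSpecs`) and the feature name are hypothetical stand-ins that simplify the version to a plain string; they are not etcd's actual definitions:

```go
package features

// Feature identifies a feature gate by name.
type Feature string

// PreRelease marks the maturity of a feature at a given version.
type PreRelease string

const (
	Alpha PreRelease = "ALPHA"
	Beta  PreRelease = "BETA"
	GA    PreRelease = "GA"
)

// FeatureSpec describes a feature's default value and maturity
// starting at a given etcd version.
type FeatureSpec struct {
	Version    string
	Default    bool
	PreRelease PreRelease
}

// VersionedSpecs lists the specs of a feature across versions.
type VersionedSpecs []FeatureSpec

// DefaultEtcdClusterFeatureGates is a hypothetical registry entry per
// feature. Whether cluster and server features live in one map or two
// is exactly the question raised in the comment above.
var DefaultEtcdClusterFeatureGates = map[Feature]VersionedSpecs{
	"ExampleClusterFeature": {
		{Version: "3.7", Default: false, PreRelease: Alpha},
		{Version: "3.8", Default: true, PreRelease: Beta},
	},
}
```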


1. When a cluster starts, similar to `ClusterVersion`, the `ClusterParams` would be set to nil, which means all cluster level features are disabled, and other parameters are unknown.

1. When there is an update of `ClusterVersion`, the `ClusterParams` would be reset to the default values at the new `ClusterVersion`, because the `ClusterParams` values at the previous `ClusterVersion` may no longer be applicable to the new version and thus would be invalid. The new default `ClusterParams` is set as part of the `ClusterVersionSetRequest`. At the same time, when a server applies an update of the `ClusterVersion`, it will set the `ClusterParams` to the new default `ClusterParams` sent by the leader, and all the old `proposed_cluster_params` of all members stored on the server will be reset to nil to wait for the new values to be refreshed. Each field in `ClusterParams` will have a `versionpb.etcd_version_field` annotation to indicate when it was introduced, and the default should be empty if `ClusterVersion` is smaller than the version in which the field was introduced.
Contributor

We should respect flags provided during initial cluster bootstrap. If a user explicitly disabled a feature because it's bugged, we should not enable it at all, even if it is enabled by default. Based on that, I don't think we should initially set ClusterParams to defaults based on ClusterVersion.
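A rough sketch of the apply-side behavior described in the step above, assuming hypothetical types and field names (the real `ClusterParams` would be a proto message):

```go
package clusterparams

import "sync"

// ClusterParams stands in for the proto message of the same name in the
// design; only the feature gates are sketched here.
type ClusterParams struct {
	FeatureGates map[string]bool
}

// cluster holds the state a member keeps about cluster-wide parameters.
type cluster struct {
	mu             sync.Mutex
	version        string
	clusterParams  *ClusterParams
	proposedParams map[uint64]*ClusterParams // member ID -> last proposal
}

// applyClusterVersionUpdate mirrors the step above: when the cluster
// version changes, ClusterParams is replaced by the defaults carried in
// the ClusterVersionSetRequest, and every member's stored proposal is
// cleared so it can be re-proposed against the new version.
func (c *cluster) applyClusterVersionUpdate(newVersion string, defaults *ClusterParams) {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.version = newVersion
	c.clusterParams = defaults
	for id := range c.proposedParams {
		c.proposedParams[id] = nil
	}
}
```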


1. Whenever there is an update in `ClusterVersion` or the server itself, the server would propose the values of `ClusterParams` by publishing a new `Attributes.proposed_cluster_params` field [through raft](https://github.com/etcd-io/etcd/blob/e37a67e40b3f5ff8ef81f9de6e7f475f17fda32b/server/etcdserver/server.go#L1745). (see the [discussion](#push-vs-poll-when-leader-decides-cluster-feature-gate) about why we choose to push the information through raft)

1. Whenever the leader receives a member attribute update from raft, it will `UpdateClusterParamsIfNeeded`: decide the values of the `ClusterParams` based on the proposed values of all members, provided none of the `proposed_cluster_params` is missing. The leader waits for updates from all members, rather than acting on the `proposed_cluster_params` it has received so far, because we do not want the `ClusterParams` to change up to N times when an N-node cluster bootstraps.
Contributor

I'm not sure how much we should be worried about a cluster bootstrapping with N different feature flag configurations. On bootstrap we could assume, and recommend, that users configure members consistently. I think waiting for a quorum of members publishing their config should be enough.
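A minimal sketch of the leader-side reconciliation described above. The resolution rule here, enabling a feature cluster-wide only when every member proposes it enabled, is an assumption for illustration (the design text only says the leader decides based on all members' proposals), and the names are hypothetical:

```go
package clusterparams

// decideClusterParams sketches the leader-side UpdateClusterParamsIfNeeded.
// proposals maps member ID -> that member's proposed feature gates
// (nil until the member has published proposed_cluster_params).
// It returns (nil, false) until every member has proposed, so that an
// N-node bootstrap does not flip the result up to N times; the comment
// above suggests a quorum could be enough instead.
func decideClusterParams(proposals map[uint64]map[string]bool) (map[string]bool, bool) {
	for _, p := range proposals {
		if p == nil {
			return nil, false
		}
	}
	// Resolution rule (an assumption): a feature is enabled cluster-wide
	// only if every member that mentions it proposes it enabled.
	decided := map[string]bool{}
	for _, p := range proposals {
		for name, enabled := range p {
			if agreed, seen := decided[name]; seen {
				decided[name] = agreed && enabled
			} else {
				decided[name] = enabled
			}
		}
	}
	return decided, true
}
```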


A few other alternatives we have evaluated:
1. Is it better to initialize the `ClusterParams` with nil or with the cluster version defaults when we have not received the `proposed_cluster_params` updates from all members?
In either case, there would be a state change from the initial state to the final state. If we choose to use the version defaults, even though different members might have different default values for the cluster parameters, the change would still be smaller than if the initial state were nil, because defaults rarely change between patch versions, and most users run with parameters close to the default values.
Contributor

I think we should make sure to respect user-provided configuration and not explicitly ignore it during cluster bootstrap. As for bootstrap, currently a member doesn't serve traffic until it is able to publish its information. In reality that already requires a quorum of members to be available, so we could squeeze in feature flag negotiation if we delayed it until the cluster attributes are set.

1. After downgrading the `ClusterVersion`, each member will send the new `proposed_cluster_params` based on local flags and the new lower `ClusterVersion`.
1. The leader will decide and update the values of the `ClusterParams` after it receives the new `proposed_cluster_params` from all members.
1. Now the `ClusterParams` and any new WAL entries should be compatible with the lower `ClusterVersion`.
1. The server will `UpdateStorageVersion`, and become ready to be downgraded.
Contributor

This adds another stage to the downgrade process, reconciliation of feature flags, that needs to happen before members are allowed to downgrade. How do you plan for UpdateStorageVersion to recognize that cluster flags were reconciled?
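One possible way to answer this, sketched purely as an assumption (the names and bookkeeping below are hypothetical, not part of the design): record the `ClusterVersion` at which the current `ClusterParams` was decided, and let `UpdateStorageVersion` proceed only once that matches the downgrade target:

```go
package clusterparams

// clusterFlagsReconciled is a purely hypothetical check for the open
// question above: the storage version is only updated, and the member
// considered ready to downgrade, once the leader has re-decided
// ClusterParams at the new, lower ClusterVersion, i.e. feature flags
// have been reconciled for the target version.
func clusterFlagsReconciled(targetVersion, decidedAtVersion string, decided bool) bool {
	return decided && decidedAtVersion == targetVersion
}
```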

```
return false
}
```
2. Make the cluster not ready to serve until the actual `ClusterVersion` is established. This would ensure there is no version jump from `3.0` to `3.6`, for example, and that the cluster is not serving traffic when `ClusterParams == nil`. But this would break the HA assumption (meaning the cluster should be able to serve when a majority of servers are online) during bootstrapping.
Contributor

Would like to revisit this: is there any reason we can't set clusterVersion based on a quorum of available members instead of all needing to be up? It creates a risk that the cluster version will be set higher before older members join; however, I don't know why we would support hybrid-version cluster bootstrap.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 15, 2025