Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add features dev and support guidelines doc #14428

Merged
merged 3 commits into from
Sep 15, 2022
Merged

Conversation

spzala
Copy link
Member

@spzala spzala commented Sep 6, 2022

Related #13775

Signed-off-by: Sahdev Zala spzala@us.ibm.com

@spzala
Copy link
Member Author

spzala commented Sep 6, 2022

cc @serathius @ahrtr

### Graduation to Stable

It is important that experimental features don't get stuck in that stage. An experimental feature should move to the stable stage by using the criteria provided below:
- The feature was enabled for at least one major release or one year, whatever comes first. This time frame should be enough to find any potential bugs in the feature.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably one minor release makes more sense here. For example, assuming we add an experimental feature --experimental-abc in 3.5, and it's disabled by default in 3.5. It may be enabled by default in 3.6. Eventually we can graduate it in 3.7 if no any major issue is opened against the feature.

If it's major release, then it means we can't graduate the feature until 5.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahrtr thanks and yes agree!! That's how I had in mind too :) but mixed up. I will update it.

### Feature Depreciation Process

As the project evolves, sometimes a stable feature may need to be deprecated and removed. Such a situation should be handled using the steps below:
- Provide a depreciation message in the feature usage document before a planned major release for feature depreciation. e.g. `To be deprecated in <release>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, Probably minor release makes more sense here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks @ahrtr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahrtr I think this can be both minor or minor+major, let me update changes and see if that make more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@codecov-commenter
Copy link

codecov-commenter commented Sep 6, 2022

Codecov Report

Merging #14428 (a74cb15) into main (73029fe) will decrease coverage by 0.24%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main   #14428      +/-   ##
==========================================
- Coverage   75.45%   75.20%   -0.25%     
==========================================
  Files         457      457              
  Lines       37195    37195              
==========================================
- Hits        28064    27973      -91     
- Misses       7377     7447      +70     
- Partials     1754     1775      +21     
Flag Coverage Δ
all 75.20% <ø> (-0.25%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
server/proxy/grpcproxy/register.go 69.76% <0.00%> (-9.31%) ⬇️
server/auth/simple_token.go 80.00% <0.00%> (-8.47%) ⬇️
client/pkg/v3/fileutil/lock_linux.go 72.22% <0.00%> (-8.34%) ⬇️
server/lease/lease.go 94.87% <0.00%> (-5.13%) ⬇️
server/storage/wal/file_pipeline.go 90.69% <0.00%> (-4.66%) ⬇️
client/v3/leasing/cache.go 87.77% <0.00%> (-3.89%) ⬇️
server/etcdserver/api/rafthttp/msgappv2_codec.go 69.56% <0.00%> (-3.48%) ⬇️
client/v3/leasing/txn.go 88.09% <0.00%> (-3.18%) ⬇️
server/etcdserver/api/v3rpc/interceptor.go 74.47% <0.00%> (-3.13%) ⬇️
server/proxy/grpcproxy/watch.go 93.64% <0.00%> (-2.90%) ⬇️
... and 16 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

- A new feature added in the latest release and disabled by default e.g. etcd v3.5 or v4.0
- Given there are no open issues against the feature,
- It is enabled by default in the next release e.g. etcd v3.6 or v4.1
- It move to stable stage in the following release e.g. etcd v3.7 or v4.2. If release cycle takes longer, one year time frame can be used to move to the stable stage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"move" --> "moves"?

- A new feature added in the latest release and disabled by default e.g. etcd v3.5 or v4.0
- Given there are no open issues against the feature,
- It is enabled by default in the next release e.g. etcd v3.6 or v4.1
- It move to stable stage in the following release e.g. etcd v3.7 or v4.2. If release cycle takes longer, one year time frame can be used to move to the stable stage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I agree with the two points here. But I think two minor releases is just a minimum requirement. It means it isn't necessarily to graduate an experimental feature after two minor releases (e.g., added in 3.5, and graduate in 3.7), although usually it does.

The point is we need to have some flexibility on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahrtr I agree that flexibility is needed. How about re-write as below for clarity that the next action is based on the review?:

  • Given there are no open issues against the feature, it is ready for review to,
    • Enable by default in the next release e.g. etcd v3.6 or v4.1
    • Move to the stable stage in the following release e.g. etcd v3.7 or v4.2. If release cycle takes longer, one year time frame can be used for the review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is ready for review makes sense, since we have flexibility to approve or reject the PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, thanks @ahrtr

Related etcd-io#13775

Signed-off-by: Sahdev Zala <spzala@us.ibm.com>

It is important that experimental features don't get stuck in that stage. An experimental feature should move to the stable stage by following the lifecycle steps below:
- A new feature added in the latest release and disabled by default e.g. etcd v3.5 or v4.0
- Given there are no open issues against the feature, it is ready for review to,
Copy link
Member

@serathius serathius Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have more strict requirements for graduating feature than just no open issues.

Things that should be considered when graduating (not all are required depending on feature) :

  • Logs allowing for proper debugging
  • Documentation
  • Metrics
  • Benchmarks
  • Proper robust E2e tests
  • upgrade/downgrade tests if there are changes in api, raft, wal protos

Graduation issue should have those requirements as a checklist and be approved by at least 2 maintainers that are willing to support this feature.

- Any variable names related to the implementation of the feature are prefixed with `Experimental` e.g. `ExperimentalFeatureName`.
- Might be buggy. Enabling the feature may not work as expected.
- Support for such a feature may be dropped at any time without notice.
- Disabled by default when added initially.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we describe minimal requirements for feature to be accepted into etcd as experimental. Fact that issue is experimental doesn't mean that we accept any low quality code.

Proposed

  • Unit tested
  • One E2e test for basic scenario

Even though etcd can reserve right to stop supporting the feature and remove it in next release, we should still ensure that we don't unneseserly bloat codebase. I think accepting features should also require support from at least 2 maintainers .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, I will make them more explicit. Thanks!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another proposal:
Limit number of experimental features. If limit is reached, author of a new feature has to deprecate/graduate existing feature.
It might sound too much, but otherwise this work will fall onto maintainers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lavacat but I think we cannot limit the numbers as adding a new feature is need based. I will be opening a separate issue for future approach and we can discuss it there as needed.

- Support for such a feature may be dropped at any time without notice.
- Disabled by default when added initially.

### Graduation to Stable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that during graduation we will remove experimental prefix from feature flag. This would be a breaking change in flags that could be easily mitigated by not renaming old experimental flag, but by marking it as hidden and adding new one. There should not be a difference in behavior but it should bring big improvements to user experience during upgrade.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking can we follow the same approach as Kubernetes to manage the feature?

Copy link
Member

@serathius serathius Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feature gates are good when you don't need to do any configuration for feature to work. On multiple occasions I have added a new feature to Kubernetes and decided against feature gating.

What I am proposing is more inline with https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-flag-or-cli. From CLI perspective graduating a feature is a breaking change without any deprecation. We remove experimental flag and add a new one. During upgrade admins need to take a immediate action and update all graduated flags.

I'm proposing that when feature graduates we still leave experimental flags for one release, giving users time to adapt. Main benefit is that we vastly improve user experience without doing any work.

Alternative approaches:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I am thinking.

  • Adding/removing the prefix experimental is indeed a breaking change;
  • Keeping experimental flags for one more release complicates the feature graduation process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good points both about breaking changes related to prefix, thanks. May be simple way would be to treat experimental features naming/usage in a similar way to unsafe features (I missed it in the stages earlier by we have stable, experimental and unsafe features currently). I will be pinging you for quick discussion around it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are 3 separate things to address:

  • Have documentation reflect the current state (usage of experimental prefix)
  • Avoiding breaking changes in flags (not immidietly removing experimental flags upon graduation)
  • Long term approach for handling feature flags that avoid problem of flag compatibility altogether.

In this PR we should address first two points. @spzala feel free to open an issue with proposal for the third one.

Addressed feedback with some added thoughts. Also, added
Unsafe features.

Related etcd-io#13775

Signed-off-by: Sahdev Zala <spzala@us.ibm.com>
- Any configuration flags related to the implementation of the feature are prefixed with `experimental` e.g. `--experimental-feature-name`
- Any variable names related to the implementation of the feature are prefixed with `Experimental` e.g. `ExperimentalFeatureName`
- Provided with unit tests
- Provided with e2e test coverage for at least the basic scenario
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e2e or integration test

Copy link
Member

@serathius serathius Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For new features that expose new user facing elements (command line flags), I would always prefer e2e tests over integration. I would say that integration tests should be recommended but never replace e2e tests.


It is important that experimental features don't get stuck in that stage. After a release of an intiial enablement (e.g. etcd v3.5), an experimental feature should be revisited in the following release (e.g. etcd v3.6 or after six months if release takes longer) and see if it's ready for graduation to the stable stage in the future release (e.g. etcd v3.7 or v4.0). If it's ready per discussions with the project maintainers, then depending on the scope of the feature, at least the following enhancements should be consiered to be worked on by opening a PR(s) and new issue(s) for tracking as needed,
- Fix any open issues against the feature
- Add robust e2e tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e2e or integration tests

- Modify the feature documentation and add a CHANGELOG entry to reflect that `--experimental` prefix in the feature name is expected to be dropped in the next release e.g. `Expected to be graduated and renamed to --feature-name in the future release`
- Create a new issue for future graduation with reference to the PR discussed here. Add the release label of the upcoming release for tracking.

One release with above enhancements should give a good idea on the feature's stability. Unless there are open issues that can prevent feature gradution, the feature can move to the stable stage in the next release with a PR on feature rename, doc updates, and an entry in the CHANGELOG. All the related PRs and discussions should be approved by at least two project maintainers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: graduation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah :) thanks @ahrtr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as you and @serathius will notice and as I mentioned in the commit message, I have addressed feedback with some added thoughts. I think by the time, we graduate a feature to stable it should be treated as a stable feature for a release (something that I added here). And also a warning message on possible name change (i.e. removing prefix) for a release should serve our purpose on giving user heads up (something that's general done with deprecation). Thanks!

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @spzala

Although removing the --experimental may have some impact on the client application when graduating a feature, but it is consistent with the existing behavior, and it explicitly requires users' awareness and action so as to avoid any unexpected surprise.

We can think about and enhance it later if needed.

- Update metrics and benchmarks if needed
- Upgrade/downgrade tests specially if there are changes in API, Raft, WAL protos.
- Enable by default if appropriate
- Modify the feature documentation and add a CHANGELOG entry to reflect that `--experimental` prefix in the feature name is expected to be dropped in the next release e.g. `Expected to be graduated and renamed to --feature-name in the future release`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we'd better to clearly state that we will make both the experimental flag and graduated flag co-exist for at least one minor release. For example, when we decide to graduate a experimental feature in 3.6, we add a new flag --feature-name, and do not remove the --experimental-feature-name until next minor, e.g. 3.7 (or major 4.0 if no next minor release) release.

Experimental feature related development work is expected to be high quality, similar to any development work for the project. A related PR should meet the following criteria,
- Associated with an issue providing a clear need for such a feature
- Any configuration flags related to the implementation of the feature are prefixed with `experimental` e.g. `--experimental-feature-name`
- Any variable names related to the implementation of the feature are prefixed with `Experimental` e.g. `ExperimentalFeatureName`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only holds true for external configuration in embed.Server and is not propagated deeper. I think it is a bit of overkill.


### Graduation to Stable

It is important that experimental features don't get stuck in that stage. After a release of an intiial enablement (e.g. etcd v3.5), an experimental feature should be revisited in the following release (e.g. etcd v3.6 or after six months if release takes longer) and see if it's ready for graduation to the stable stage in the future release (e.g. etcd v3.7 or v4.0). If it's ready per discussions with the project maintainers, then depending on the scope of the feature, at least the following enhancements should be consiered to be worked on by opening a PR(s) and new issue(s) for tracking as needed,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It is important that experimental features don't get stuck in that stage. After a release of an intiial enablement (e.g. etcd v3.5), an experimental feature should be revisited in the following release (e.g. etcd v3.6 or after six months if release takes longer) and see if it's ready for graduation to the stable stage in the future release (e.g. etcd v3.7 or v4.0). If it's ready per discussions with the project maintainers, then depending on the scope of the feature, at least the following enhancements should be consiered to be worked on by opening a PR(s) and new issue(s) for tracking as needed,
It is important that experimental features don't get stuck in that stage. After a release of an initial enablement (e.g. etcd v3.5), an experimental feature should be revisited in the following release (e.g. etcd v3.6 or after six months if release takes longer) and see if it's ready for graduation to the stable stage in the future release (e.g. etcd v3.7 or v4.0). If it's ready per discussions with the project maintainers, then depending on the scope of the feature, at least the following enhancements should be consiered to be worked on by opening a PR(s) and new issue(s) for tracking as needed,

- Modify the feature documentation and add a CHANGELOG entry to reflect that `--experimental` prefix in the feature name is expected to be dropped in the next release e.g. `Expected to be graduated and renamed to --feature-name in the future release`
- Create a new issue for future graduation with reference to the PR discussed here. Add the release label of the upcoming release for tracking.

One release with above enhancements should give a good idea on the feature's stability. Unless there are open issues that can prevent feature gradution, the feature can move to the stable stage in the next release with a PR on feature rename, doc updates, and an entry in the CHANGELOG. All the related PRs and discussions should be approved by at least two project maintainers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note on Unless there are open issues that can prevent feature graduation: We don't have a good way to track issues for each feature. Instead of scattering graduation requirements into multiple issues that we need to handle, we should keep single issue for feature graduation. In the top comment there should be a checklist for each requirement with link to PR/issue that implements this.

I would change this point to say that graduating a feature should require creating an issue with list of requirements that tracks the proces. When enough requirements are fulfilled and issue get approval by 2 maintainers, then the feature can be graduated by making proper rename/documentation/changelog updates.

@serathius
Copy link
Member

I think that adding this would really help to clarify some of unwritten processes, however it's unclear for me who is the audience for this document. It describes both internal developer graduation proces and user facing support guarantees. I think it would be much clearer if we split it into two separate documents based on audience:

  • User facing documentation describing feature maturity levels (experimental, graduated, deprecated).
  • Developer facing description of process to move features between those levels (how to introduce new feature to experimental, what is needed to graduate or deprecate a feature).

Most important for now is to maintain and improve quality. All features should have proper test, documentation and metrics. I think we can postpone the detailing the support and deprecation policies for user facing documentation. I would skip description of experimental/mature features and instead describe the process we expect developers should go through.

@lavacat
Copy link

lavacat commented Sep 13, 2022

To provide some context, I've listed all experimental features across 3.3, 3.4, 3.5 and main branches.
https://docs.google.com/spreadsheets/d/1_HleY81w5AHqIA857O6SlSzzWtPjCvHT1ZkwWUOr88Q/edit#gid=0

There was only 1 experimental feature that was graduated experimental-backend-bbolt-freelist-type and 1 removed experimental-enable-v2v3

- Fix any open issues against the feature
- Add robust e2e tests
- Enhance logs for proper debugging
- Update metrics and benchmarks if needed
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe robust tests, logs, metrics and benchmarks should be part of minimum dev requirements. Otherwise this creates an expectation that somebody will pick up this work during 'graduation' phase. This will slow down graduation process. Also original author probably has the best context to add those tests/logs.

For example experimental-compaction-batch-limit oss-study/etcd-3.4.9@1e213b7#diff-0abf7361537ed66a5b8d85cdc518a4ab2f693a1e8c929c93de3746851a5ddfbe

Technically it's just exposing hardcoded value. But to graduate it, do we need more tests/benchmarks?

@spzala
Copy link
Member Author

spzala commented Sep 14, 2022

To provide some context, I've listed all experimental features across 3.3, 3.4, 3.5 and main branches. https://docs.google.com/spreadsheets/d/1_HleY81w5AHqIA857O6SlSzzWtPjCvHT1ZkwWUOr88Q/edit#gid=0

There was only 1 experimental feature that was graduated experimental-backend-bbolt-freelist-type and 1 removed experimental-enable-v2v3

Thanks for the doc! Going forward our goal would be not to keep experimental features in that stage forever but revisit them and see if they can be graduated to stable stage or be removed as needed.

Add an overview and initial development guidelines. Restructured
the doc for a better readabiltiy and easier review, and per the
previous review feedback. The TODOs will be addressed iteratively.

Related etcd-io#13775

Signed-off-by: Sahdev Zala <spzala@us.ibm.com>

### Graduating an Experimental feature to Stable (TODO - spzala)

### Deprecating a feature (TODO - spzala)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serathius I have restructured the doc for a better readability and easier review, and also tried addressing the review feedback. I will be addressing two TODOs separately/iteratively building on top of this initial overview and guidelines. I think that way it will be easier to review and speed up finalizing the content. Also, once developer doc is done, I will be adding user doc. Please let me know your thoughts. Thanks! cc @ahrtr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, short and precise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @serathius

@spzala spzala merged commit 2fa7302 into etcd-io:main Sep 15, 2022
spzala added a commit to spzala/etcd that referenced this pull request Sep 17, 2022
Related etcd-io#14428

Signed-off-by: Sahdev Zala <spzala@us.ibm.com>
spzala added a commit to spzala/etcd that referenced this pull request Sep 19, 2022
Related etcd-io#14428

Signed-off-by: Sahdev Zala <spzala@us.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants