prr: start of pilot policy doc #4181
@@ -0,0 +1,51 @@ | |||
# Production Readiness Review Process |
nit: I would move it to sig-architecture/
[there is already api-review process doc there]
it's already there?
oh, i see
I meant not contributors/devel/sig-architecture, but simply sig-architecture.
So basically here:
https://github.com/kubernetes/community/tree/master/sig-architecture
# Production Readiness Review Process | ||
|
||
Production readiness reviews are intended to ensure that features merging into | ||
Kubernetes are observable and supportable, can be safely operated in production |
and scalable?
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnbelamaric The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@johnbelamaric - please squash the commits and I will LGTM. I would like to merge quickly and iterate - the doc already makes it clear that it's "under development" and not fully figured out. |
b867d56 to f79dd3e
squashed |
Let's merge and iterate. /lgtm |
## Questionnaire | ||
|
||
* Feature enablement and rollback | ||
- How can this feature be enabled / disabled in a live cluster? |
I think this should be clarified to be a live-HA cluster.
of a node? | ||
- What happens if a cluster with this feature enabled is rolled back? What | ||
happens if it is subsequently upgraded again? | ||
- Are there tests for this? |
Clarify "this". I suspect you mean, "are there tests for a disable, enable, disable, enable cycle", but you could also mean "upgrade, downgrade, upgrade" which seems pretty onerous at the moment.
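One way to read the "disable, enable, disable, enable cycle" is as a test that toggles the feature and checks what a reader sees at every step. The sketch below models that with a hypothetical in-memory `ToyCluster`. The class, the `newField` name, and the hide-on-read behavior are all assumptions for illustration, not Kubernetes APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster with one gated feature."""
    feature_enabled: bool = False
    objects: dict = field(default_factory=dict)

    def write(self, name, spec):
        # Assumption: with the gate off, the gated field is dropped on write.
        stored = dict(spec)
        if not self.feature_enabled:
            stored.pop("newField", None)
        self.objects[name] = stored

    def read(self, name):
        # Assumption: with the gate off, data persisted earlier stays in
        # storage but is hidden from readers, so re-enabling restores it.
        obj = dict(self.objects[name])
        if not self.feature_enabled:
            obj.pop("newField", None)
        return obj


def run_toggle_cycle(cluster, states=(False, True, False, True)):
    """Toggle the gate through `states`; return what a reader sees at each step."""
    seen = []
    for enabled in states:
        cluster.feature_enabled = enabled
        # Create the test object the first time the feature is on.
        if enabled and "demo" not in cluster.objects:
            cluster.write("demo", {"replicas": 1, "newField": "x"})
        seen.append(cluster.read("demo") if "demo" in cluster.objects else None)
    return seen
```

A test for the cycle would then assert that the object written while the feature was on is still readable (minus the gated field) after disabling, and fully visible again after re-enabling.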
* Dependencies | ||
- Does this feature depend on any specific services running in the cluster | ||
(e.g., a metrics service)? | ||
- How does this feature respond to complete failures of the services on |
I would be slightly more prescriptive here: "How would a cluster-admin know that this feature is failing because a particular service is degraded?" It could be two questions, but when I'm deploying, I want to know how to tell it's failing.
- How does this feature respond to degraded performance or high error rates | ||
from services on which it depends? | ||
* Monitoring requirements | ||
- How can an operator determine if the feature is in use by workloads? |
do we specifically care about workloads or just "in use"?
sigh, github.
which it depends? | ||
- How does this feature respond to degraded performance or high error rates | ||
from services on which it depends? | ||
* Monitoring requirements |
I'd like to be slightly more prescriptive here. I want to ensure that any new binary comes with a secured health, ready, and metrics endpoint.
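As a rough illustration of what "a secured health, ready, and metrics endpoint" could look like, here is a minimal sketch using only the Python standard library. The `/healthz`, `/readyz`, and `/metrics` paths follow a common convention, but the bearer-token check, the `demo_ready` metric name, and the `respond` / `make_handler` helpers are assumptions made for this example, not a requirement stated in the doc.

```python
import hmac
from http.server import BaseHTTPRequestHandler


def respond(path, auth_header, ready, token):
    """Return (status, body) for a probe or metrics request."""
    expected = f"Bearer {token}".encode()
    supplied = (auth_header or "").encode()
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(supplied, expected):
        return 401, "unauthorized\n"
    if path == "/healthz":
        return 200, "ok\n"
    if path == "/readyz":
        return (200, "ok\n") if ready else (503, "not ready\n")
    if path == "/metrics":
        # Prometheus-style text line; the metric name is made up.
        return 200, f"demo_ready {int(ready)}\n"
    return 404, "not found\n"


def make_handler(state, token):
    """Build an http.server handler class bound to shared state and a token."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = respond(
                self.path, self.headers.get("Authorization"),
                state["ready"], token,
            )
            payload = body.encode()
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
    return Handler
```

The handler can be served with `http.server.HTTPServer(("127.0.0.1", 8080), make_handler({"ready": True}, "example-token"))`. A real binary would also need TLS and a proper authentication and authorization story, which this sketch leaves out.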
|
||
* Feature enablement and rollback | ||
- How can this feature be enabled / disabled in a live cluster? | ||
- Can the feature be disabled once it has been enabled (i.e., can we roll |
It might be good to include impact on workloads as well, distinct from the control-plane/cluster-level considerations.
Like, some workload considerations might be:
- Does this feature change the behavior or performance characteristics of workloads running on a cluster?
- Will some workloads that could run successfully on the cluster before, stop working or no longer be admissible once this feature is enabled?
- Do workloads need to be restarted to take advantage of this feature?
- How can workloads be migrated over to take advantage of this feature? Can it be selectively enabled (e.g. per-node/per-namespace, only to new workloads/objects, in a report-only or dry-run mode)? Will enabling/disabling the feature require downtime or make certain features temporarily unavailable for workloads running on the cluster?
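To make the "selectively enabled" and "report-only or dry-run mode" ideas concrete, here is a small sketch of how such a rollout decision might be modeled. The mode names (`off`, `report-only`, `enforce`), the per-namespace scoping, and the `evaluate` function are hypothetical and exist only to illustrate the question.

```python
def evaluate(namespace, violates, mode, enabled_namespaces=None):
    """Decide whether a workload is admitted under a gated check.

    mode: "off", "report-only", or "enforce" (hypothetical names).
    enabled_namespaces: optional set limiting the check to those namespaces;
        None means the check applies cluster-wide.
    Returns (admitted, warning_or_None).
    """
    if mode not in ("off", "report-only", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    in_scope = enabled_namespaces is None or namespace in enabled_namespaces
    if mode == "off" or not in_scope or not violates:
        return True, None
    message = f"workload in {namespace} violates the new check"
    if mode == "report-only":
        # Admitted, but the violation is surfaced to the operator.
        return True, message
    # Enforce: the workload is rejected.
    return False, message
```

With a shape like this, an operator could start in `report-only` for a single namespace, watch the warnings, and only then move to `enforce`, which speaks to the question of whether workloads that ran before would stop being admissible.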