prr: start of pilot policy doc #4181

Merged (1 commit) on Oct 19, 2019

johnbelamaric
Member

No description provided.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 17, 2019
@k8s-ci-robot k8s-ci-robot added area/developer-guide Issues or PRs related to the developer guide sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. labels Oct 17, 2019
@@ -0,0 +1,51 @@
# Production Readiness Review Process
Member

nit: I would move it to sig-architecture/

[there is already api-review process doc there]

Member Author

it's already there?

Member Author

oh, i see

Member

I meant not contributors/devel/sig-architecture, but simply sig-architecture.

So basically here:
https://github.com/kubernetes/community/tree/master/sig-architecture

# Production Readiness Review Process

Production readiness reviews are intended to ensure that features merging into
Kubernetes are observable and supportable, can be safely operated in production
Member

and scalable?

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wojtek-t
Member

@johnbelamaric - please squash the commits and I will LGTM. I would like to merge quickly and iterate - the doc already makes it clear that it's "under development" and not fully figured out.

@johnbelamaric
Member Author

squashed

@wojtek-t
Member

Let's merge and iterate.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 19, 2019
@k8s-ci-robot k8s-ci-robot merged commit eeec091 into kubernetes:master Oct 19, 2019
## Questionnaire

* Feature enablement and rollback
- How can this feature be enabled / disabled in a live cluster?
Contributor

I think this should be clarified to be a live-HA cluster.

of a node?
- What happens if a cluster with this feature enabled is rolled back? What
happens if it is subsequently upgraded again?
- Are there tests for this?
Contributor

Clarify "this". I suspect you mean, "are there tests for a disable, enable, disable, enable cycle", but you could also mean "upgrade, downgrade, upgrade" which seems pretty onerous at the moment.
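
To make the "disable, enable, disable, enable" reading concrete, here is a minimal sketch of what such a test could look like. `FeatureGate`, `Controller`, and `run_toggle_cycle` are hypothetical names invented for illustration; they are not Kubernetes APIs, and the rule that disabled data is hidden rather than deleted is an assumption of this sketch.

```python
# Hypothetical sketch: FeatureGate and Controller are illustrative names,
# not Kubernetes APIs.
class FeatureGate:
    """In-memory on/off switch standing in for a real feature gate."""

    def __init__(self, enabled=False):
        self.enabled = enabled


class Controller:
    """Toy component whose behavior depends on the gate."""

    def __init__(self, gate):
        self.gate = gate
        self.objects = {}  # name -> dict of fields

    def create(self, name, extra=None):
        obj = {"name": name}
        # The new field is only persisted while the feature is enabled.
        if self.gate.enabled and extra is not None:
            obj["extra"] = extra
        self.objects[name] = obj
        return obj

    def read(self, name):
        obj = dict(self.objects[name])
        # Assumption: when disabled, data written by the feature is hidden,
        # not deleted, so that re-enabling does not lose it.
        if not self.gate.enabled:
            obj.pop("extra", None)
        return obj


def run_toggle_cycle():
    """Disable, enable, disable, enable; return what a reader sees each step."""
    gate = FeatureGate(enabled=False)
    ctrl = Controller(gate)
    seen = []

    ctrl.create("a", extra="x")   # disabled: extra is dropped on write
    seen.append(ctrl.read("a"))

    gate.enabled = True
    ctrl.create("b", extra="y")   # enabled: extra is kept
    seen.append(ctrl.read("b"))

    gate.enabled = False
    seen.append(ctrl.read("b"))   # disabled again: extra is hidden

    gate.enabled = True
    seen.append(ctrl.read("b"))   # re-enabled: extra is visible again
    return seen
```

The "upgrade, downgrade, upgrade" reading would need real binaries at two versions, which this in-process sketch does not attempt.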

* Dependencies
- Does this feature depend on any specific services running in the cluster
(e.g., a metrics service)?
- How does this feature respond to complete failures of the services on
Contributor

I would be slightly more prescriptive here: "How would a cluster-admin know that this feature is failing because a particular service is degraded?" It could be two questions, but when I'm deploying, I want to know how to tell it's failing.
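
One way a feature could answer that question is to track the outcome of recent calls to each dependency and surface a condition the cluster-admin can read. The sketch below is only an illustration: `DependencyMonitor`, the window size, the error-rate threshold, and the `Degraded`/`Available` condition names are all assumptions, not something the doc under review specifies.

```python
from collections import deque


class DependencyMonitor:
    """Tracks recent call outcomes for one dependency (hypothetical helper)."""

    def __init__(self, name, window=10, max_error_rate=0.5):
        self.name = name
        self.max_error_rate = max_error_rate
        # Sliding window of outcomes: True = success, False = failure.
        self.results = deque(maxlen=window)

    def record(self, ok):
        self.results.append(bool(ok))

    def error_rate(self):
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def status(self):
        """Return a condition an admin could read from a status field or metric."""
        rate = self.error_rate()
        if rate > self.max_error_rate:
            return {"type": "Degraded",
                    "reason": f"{self.name} error rate {rate:.0%}"}
        return {"type": "Available", "reason": ""}
```

The point of the sketch is that the reason names the failing dependency, so an admin can tell "this feature is failing" apart from "this feature is failing because service X is degraded".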

- How does this feature respond to degraded performance or high error rates
from services on which it depends?
* Monitoring requirements
- How can an operator determine if the feature is in use by workloads?
Contributor

do we specifically care about workloads or just "in use"?

Contributor

@deads2k left a comment

sigh, github.

which it depends?
- How does this feature respond to degraded performance or high error rates
from services on which it depends?
* Monitoring requirements
Contributor

I'd like to be slightly more prescriptive here. I want to ensure that any new binary comes with a secured health, ready, and metrics endpoint.
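
As a rough picture of what "a secured health, ready, and metrics endpoint" might mean for a new binary, here is a minimal sketch using only the Python standard library. The paths `/healthz`, `/readyz`, and `/metrics`, the static bearer token, and the `route` helper are assumptions made for illustration; a real component would more likely serve TLS and delegate authentication and authorization rather than compare a fixed token.

```python
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: a static bearer token, used only to keep the sketch short.
TOKEN = "change-me"
STATE = {"ready": False, "requests_total": 0}


def route(path, auth_header):
    """Pure routing logic: return (status_code, body) for a GET request."""
    expected = f"Bearer {TOKEN}".encode()
    supplied = (auth_header or "").encode()
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(supplied, expected):
        return 401, "unauthorized\n"
    STATE["requests_total"] += 1
    if path == "/healthz":
        return 200, "ok\n"
    if path == "/readyz":
        return (200, "ok\n") if STATE["ready"] else (503, "not ready\n")
    if path == "/metrics":
        return 200, f"requests_total {STATE['requests_total']}\n"
    return 404, "not found\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path, self.headers.get("Authorization"))
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Serve until interrupted; plain HTTP here only to keep the sketch short."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Keeping the routing in a plain function separate from the HTTP handler makes the three endpoints easy to test without opening a socket.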


* Feature enablement and rollback
- How can this feature be enabled / disabled in a live cluster?
- Can the feature be disabled once it has been enabled (i.e., can we roll

It might be good to include impact on workloads as well, distinct from the control-plane/cluster-level considerations.
Like, some workload considerations might be:

  • Does this feature change the behavior or performance characteristics of workloads running on a cluster?
  • Will some workloads that could run successfully on the cluster before, stop working or no longer be admissible once this feature is enabled?
  • Do workloads need to be restarted to take advantage of this feature?
  • How can workloads be migrated over to take advantage of this feature? Can it be selectively enabled (e.g. per-node/per-namespace, only to new workloads/objects, in a report-only or dry-run mode)? Will enabling/disabling the feature require downtime or make certain features temporarily unavailable for workloads running on the cluster?
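
The selective-enablement options in the last bullet (per-namespace, report-only or dry-run) can be sketched as a small decision function. `Rollout`, `decide`, and the `allow`/`warn`/`deny` outcomes are hypothetical names chosen for this illustration, not an existing Kubernetes mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    """Hypothetical rollout settings for a feature; not a Kubernetes API."""
    enabled: bool = False
    # Namespaces the feature applies to; an empty set means all namespaces.
    namespaces: set = field(default_factory=set)
    # Report-only mode: violations are surfaced but never enforced.
    dry_run: bool = False


def decide(rollout, namespace, violates):
    """Return 'allow', 'warn', or 'deny' for a workload in `namespace`."""
    if not rollout.enabled:
        return "allow"
    if rollout.namespaces and namespace not in rollout.namespaces:
        return "allow"
    if not violates:
        return "allow"
    return "warn" if rollout.dry_run else "deny"
```

For example, `decide(Rollout(enabled=True, namespaces={"team-a"}, dry_run=True), "team-a", violates=True)` returns `"warn"`, which lets an operator see which workloads would stop being admissible before enforcement is turned on.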

5 participants