Add KEP 1659 - standard topology labels
thockin committed Apr 6, 2020
1 parent 1bad2ec commit b21c6a2
Showing 2 changed files with 314 additions and 0 deletions.
298 changes: 298 additions & 0 deletions keps/sig-architecture/1659-standard-topology-labels/README.md
@@ -0,0 +1,298 @@
<!--
**Note:** When your KEP is complete, all of these comment blocks should be removed.
To get started with this template:
- [X] **Pick a hosting SIG.**
Make sure that the problem space is something the SIG is interested in taking
up. KEPs should not be checked in without a sponsoring SIG.
- [ ] **Create an issue in kubernetes/enhancements**
When filing an enhancement tracking issue, please ensure to complete all
fields in that template. One of the fields asks for a link to the KEP. You
can leave that blank until this KEP is filed, and then go back to the
enhancement and add the link.
- [X] **Make a copy of this template directory.**
Copy this template into the owning SIG's directory and name it
`NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
leading-zero padding) assigned to your enhancement above.
- [X] **Fill out as much of the kep.yaml file as you can.**
At minimum, you should fill in the "title", "authors", "owning-sig",
"status", and date-related fields.
- [ ] **Fill out this file as best you can.**
At minimum, you should fill in the "Summary", and "Motivation" sections.
These should be easy if you've preflighted the idea of the KEP with the
appropriate SIG(s).
- [ ] **Create a PR for this KEP.**
Assign it to people in the SIG that are sponsoring this process.
- [ ] **Merge early and iterate.**
Avoid getting hung up on specific details and instead aim to get the goals of
the KEP clarified and merged quickly. The best way to do this is to just
start with the high-level sections and fill out details incrementally in
subsequent PRs.
-->
# KEP-NNNN: Standard Topology Labels

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Reserve a label prefix](#reserve-a-label-prefix)
- [Defining the meaning of existing labels](#defining-the-meaning-of-existing-labels)
- [Defining a third key (or not)](#defining-a-third-key-or-not)
- [Followup work (or optionally part of this)](#followup-work-or-optionally-part-of-this)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

- [ ] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] KEP approvers have approved the KEP status as `implementable`
- [ ] Design details are appropriately documented
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] Graduation criteria is in place
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Kubernetes has always taken the position that "topology is arbitrary", and
designs dealing with topology have had to take that into account. Even so, the
project has two commonly assumed labels - `topology.kubernetes.io/region` and
`topology.kubernetes.io/zone` - which are used in many components, generally
hard-coded and not extensible. Those labels have relatively well understood
meanings, and (so far) have been sufficient to represent what most people need.

This KEP proposes to declare those labels, and possibly one more, as "standard"
and give them more well-defined meanings and semantics. APIs that handle
topology can still handle arbitrary topology keys, but these common ones may be
handled automatically.

## Motivation

As we considered problems like cross-zone network traffic being a chargeable
resource in most public clouds, we started to build an API for topology in
Services. We tried to think through how that API would map to existing
load-balancer implementations which may already understand topology, and we
realized three things:

1) Cloud-ish load-balancers do not have arbitrary topology APIs and can't
easily adapt to that.
2) Other systems have standardized on two or three levels of topology (e.g. the [Envoy locality API]).
3) Nobody is really complaining about this.

In trying to simplify the way Service topology might work, we are proposing
that standardizing on a small set of well-defined topology concepts will be a
net win for the project at almost no cost to what users are actually doing with
Kubernetes.

[Envoy locality API]: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/core/v3/base.proto#envoy-v3-api-msg-config-core-v3-locality

### Goals

The goals of this KEP are to:
* build consensus that the two topology labels that ALREADY EXIST in Kubernetes are enough for most users
* determine whether a third level of topology is required or not
* produce short, descriptive, canonical documentation for these labels

### Non-Goals

This KEP does NOT seek to:
* add new functionality that uses topology
* change existing functionality that uses topology
* solve the service topology problem

## Proposal

Kubernetes has always taken the position that "topology is arbitrary", and
designs dealing with topology have had to take that into account. Even so, the
project has two commonly assumed labels - `topology.kubernetes.io/region` and
`topology.kubernetes.io/zone` - which are used in many components, generally
hard-coded and not extensible. Those labels have relatively well understood
meanings, and (so far) have been sufficient to represent what most people need.

This KEP proposes to document those labels as "standard" and give them more
rigorous definitions. This also proposes that we discuss and decide whether a
third level of topology is needed and if so, define it in the same manner as
the existing labels.

The resulting definitions should be specific enough that users and implementors
understand what they mean, but not so rigid that they cannot map them to the
nearest constructs available in most environments.

### Risks and Mitigations

The primary risks here are:

1) That we define these too loosely, such that users cannot derive sufficient
value from their use.

2) That we define these too specifically, such that implementors cannot use
them to represent natural concepts in their environments.

3) That we define these in a way that is incompatible with the ways they are
already being used.

4) That we preclude or design-out other uses of topology that users are using
today.

## Design Details

### Reserve a label prefix

Label prefixes allow us to group labels on common origin and meaning. We
propose to document somewhere (TBD) that the prefix "topology.kubernetes.io" is
explicitly reserved for use in defining metadata about the physical or logical
connectivity and grouping of Kubernetes nodes, and the associated behavioral
and failure properties of those groups.

This prefix is already in use. This KEP just aims to formalize it.
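
As a rough illustration of what "reserved prefix" means for tooling, the sketch below splits a label key into prefix and name and checks whether the key falls under the topology prefix. The helper names `label_prefix` and `is_topology_label` are hypothetical and are not part of any Kubernetes API.

```python
RESERVED_TOPOLOGY_PREFIX = "topology.kubernetes.io"  # prefix named in this KEP


def label_prefix(key: str) -> str:
    """Return the part of a label key before the "/", or "" if there is none."""
    prefix, sep, _name = key.partition("/")
    return prefix if sep else ""


def is_topology_label(key: str) -> bool:
    """True if the key lives under the reserved topology prefix."""
    return label_prefix(key) == RESERVED_TOPOLOGY_PREFIX
```

A linter or admission check could use a test like this to flag keys that use the prefix without being one of the documented labels.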

### Defining the meaning of existing labels

This KEP proposes to define the meaning and semantics of the following labels:

* topology.kubernetes.io/region
* topology.kubernetes.io/zone

The exact wording is TBD, but it must be specific enough to be useful to users
and loose enough to allow implementors sufficient freedom.

This will also include defining that "region" and "zone" are strictly
hierarchical ("zones" are subsets of "regions") and that zone names are unique
across regions. For example, AWS documents "us-east-1a" as a zone under region
"us-east-1".
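
The hierarchy rule can be stated as a simple invariant: every zone name maps to exactly one region. The sketch below checks that invariant over a set of node label maps. The function name and the plain-dict input shape are assumptions made for illustration only.

```python
from collections import defaultdict

REGION_KEY = "topology.kubernetes.io/region"
ZONE_KEY = "topology.kubernetes.io/zone"


def zones_spanning_regions(node_labels):
    """Map each zone seen under more than one region to those regions, sorted.

    node_labels is an iterable of label dicts, one per node. Nodes missing
    either label are skipped. An empty result means the rule holds.
    """
    regions_by_zone = defaultdict(set)
    for labels in node_labels:
        region = labels.get(REGION_KEY)
        zone = labels.get(ZONE_KEY)
        if region is None or zone is None:
            continue
        regions_by_zone[zone].add(region)
    return {z: sorted(r) for z, r in regions_by_zone.items() if len(r) > 1}
```

A zone name reused under a second region shows up in the result together with every region it was seen in.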

This will also define that, while labels are generally mutable, the topology
labels should be assumed immutable and that any changes to them may be ignored
by downstream consumers of topology.
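
One way a downstream consumer might act on the "assumed immutable" rule is to record a node's topology the first time it is seen and ignore later changes. This is only a sketch of that reading; `TopologyCache` is a hypothetical name, not an existing component.

```python
REGION_KEY = "topology.kubernetes.io/region"
ZONE_KEY = "topology.kubernetes.io/zone"


class TopologyCache:
    """Remembers the first (region, zone) pair seen for each node name."""

    def __init__(self):
        self._by_node = {}

    def observe(self, node_name, labels):
        """Record topology on first sight; later label changes are ignored."""
        topology = (labels.get(REGION_KEY), labels.get(ZONE_KEY))
        # setdefault stores the value only if the node is not yet known,
        # and always returns the stored (first-seen) value.
        return self._by_node.setdefault(node_name, topology)
```

A consumer written this way never has to handle a node moving between zones, which is the simplification the immutability assumption is meant to allow.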

<<[UNRESOLVED]>>
Should we also try to standardize "kubernetes.io/hostname" as "topology.kubernetes.io/node" ?
<<[/UNRESOLVED]>>

### Defining a third key (or not)

Some systems define topology in two levels (e.g. public clouds) and others use three
levels (e.g. Envoy adds "sub-zone"). This KEP proposes that we standardize on
two levels for now, while reserving the right to expand that to three (or more)
if and when we have strong demand.
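
To show how a two-level scheme can coexist with a consumer that models three levels, the sketch below maps the two labels onto a three-field locality and leaves the third level empty. The `Locality` type and its field names are assumptions for illustration, loosely following the region / zone / sub-zone split mentioned above.

```python
from dataclasses import dataclass

REGION_KEY = "topology.kubernetes.io/region"
ZONE_KEY = "topology.kubernetes.io/zone"


@dataclass(frozen=True)
class Locality:
    region: str = ""
    zone: str = ""
    sub_zone: str = ""  # third level; stays empty while only two labels are standard


def locality_from_labels(labels):
    """Build a Locality from node labels, leaving sub_zone unset."""
    return Locality(
        region=labels.get(REGION_KEY, ""),
        zone=labels.get(ZONE_KEY, ""),
    )
```

If a third key were standardized later, only `locality_from_labels` would need to change, and existing two-level data would remain valid.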

### Followup work (or optionally part of this)

For a Pod to know its own topology today, it must be authorized to look at
Nodes. This is somewhat tedious, given that we already have downward-API
support for labels and we know that these topology labels are not likely to
change at run-time.

If we standardize topology keys, it would be reasonable to copy those
well-known keys from the Node to the Pod at startup, so Pods could extract that
information without bouncing through a Node object.

As long as "topology is arbitrary", we need more information about which keys to
copy, which makes this feature request less feasible.
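
With a fixed set of standard keys, the copy step described above becomes a small, well-defined operation. The sketch below is a minimal illustration that assumes labels are plain dicts; `pod_labels_with_topology` is a hypothetical helper and does not describe how the kubelet or API server would implement this.

```python
REGION_KEY = "topology.kubernetes.io/region"
ZONE_KEY = "topology.kubernetes.io/zone"

# The fixed key set is what makes the copy feasible: no per-cluster
# configuration is needed to know which labels to propagate.
STANDARD_TOPOLOGY_KEYS = (REGION_KEY, ZONE_KEY)


def pod_labels_with_topology(pod_labels, node_labels):
    """Return a copy of pod_labels with the standard topology keys from the node."""
    merged = dict(pod_labels)
    for key in STANDARD_TOPOLOGY_KEYS:
        if key in node_labels:
            merged[key] = node_labels[key]
    return merged
```

Other node labels, such as `kubernetes.io/hostname`, are deliberately left out because they are not in the standard set.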

### Test Plan

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Graduation Criteria

<!--
**Note:** *Not required until targeted at a release.*
Define graduation milestones.
These may be defined in terms of API maturity, or as something else. The KEP
should keep this high-level with a focus on what signals will be looked at to
determine graduation.
Consider the following in developing the graduation criteria for this enhancement:
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
- [Deprecation policy][deprecation-policy]
Clearly define what graduation means by either linking to the [API doc
definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning),
or by redefining what graduation means.
In general, we try to use the same stages (alpha, beta, GA), regardless how the
functionality is accessed.
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
#### Alpha -> Beta Graduation
- Gather feedback from developers and surveys
- Complete features A, B, C
- Tests are in Testgrid and linked in KEP
#### Beta -> GA Graduation
- N examples of real world usage
- N installs
- More rigorous forms of testing e.g., downgrade tests and scalability tests
- Allowing time for feedback
**Note:** Generally we also wait at least 2 releases between beta and
GA/stable, since there's no opportunity for user feedback, or even bug reports,
in back-to-back releases.
#### Removing a deprecated flag
- Announce deprecation and support policy of the existing flag
- Two versions passed since introducing the functionality which deprecates the flag (to address version skew)
- Address feedback on usage/changed behavior, provided on GitHub issues
- Deprecate the flag
**For non-optional features moving to GA, the graduation criteria must include [conformance tests].**
[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
-->

### Upgrade / Downgrade Strategy

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Version Skew Strategy

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

## Implementation History

* 2020-03-31: First draft

## Drawbacks

Topology being arbitrary has a certain abstract elegance to it, and it forces
consumers of topology to be flexible in their designs. Moving away from that
brings risks of over-specifying and missing the mark for some users.

## Alternatives

The main alternative is the status quo: topology is arbitrary. The main drivers
for abandoning this are described above under "Motivation".
16 changes: 16 additions & 0 deletions keps/sig-architecture/1659-standard-topology-labels/kep.yaml
@@ -0,0 +1,16 @@
title: Standard Topology Labels
kep-number: NNNN #FIXME
authors:
- "@thockin"
owning-sig: sig-architecture
participating-sigs:
- sig-network
- sig-cloud-provider
status: provisional
creation-date: 2020-03-31
reviewers:
- TBD
approvers:
- TBD
see-also: []
replaces: []
