add new heuristic to topology routing

aojea · aojea · commit 3ff6294aabfe · 2023-05-15T13:16:07.000Z
diff --git a/keps/sig-network/2433-topology-aware-hints/README.md b/keps/sig-network/2433-topology-aware-hints/README.md
@@ -16,13 +16,16 @@
   - [Kube-Proxy](#kube-proxy)
   - [EndpointSlice Controller](#endpointslice-controller)
 - [Heuristics](#heuristics)
-  - [Proportional CPU Heuristic](#proportional-cpu-heuristic)
-    - [Assumptions](#assumptions)
-    - [Identifying Zones](#identifying-zones)
+  - [Identifying Zones](#identifying-zones)
     - [Excluding Control Plane Nodes](#excluding-control-plane-nodes)
-    - [Example](#example)
     - [Overload](#overload)
     - [Handling Node Updates](#handling-node-updates)
+  - [Proportional CPU Heuristic](#proportional-cpu-heuristic)
+    - [Assumptions](#assumptions)
+    - [Example](#example)
+  - [Proportional Pod Heuristic](#proportional-pod-heuristic)
+    - [Assumptions](#assumptions-1)
+    - [Example](#example-1)
   - [Additional Heuristics](#additional-heuristics)
   - [Future Expansion](#future-expansion)
   - [Test Plan](#test-plan)
@@ -307,26 +310,14 @@ This KEP starts with the following heuristics:
 |-|-|
 | Auto | EndpointSlice controller and/or underlying dataplane can choose the heuristic used. |
 | ProportionalByCore | Endpoints will be allocated to each zone proportionally, based on the allocatable Node CPU cores in each zone. |
+| ProportionalByPod | Endpoints will be allocated to each zone proportionally, based on the destination Pods in each zone. |
 
 In the future, additional heuristics may be added. Until that point, "Auto" will
 be the only configurable value. In most clusters, that will translate to
 `ProportionalByCore` unless the underlying dataplane has a better approach
 available.
 
-### Proportional CPU Heuristic
-#### Assumptions
-
-- Incoming traffic is proportional to the number of allocatable CPU cores in a
-  zone. Although this is an imperfect metric, it is the best available way of
-  predicting how much traffic will be received in a zone. If we are unable to
-  derive the number of allocatable cores in a zone we will fall back to the
-  number of nodes in that zone.
-- Service capacity is proportional to the number of endpoints in a zone. This
-  assumes that each endpoint has equivalent capacity. Although this is not
-  always true, it usually is. We can explore ways to deal with variable capacity
-  endpoints in the future.
-
-#### Identifying Zones
+### Identifying Zones
 
 The EndpointSlice controller reads the standard `topology.kubernetes.io/zone`
 label on Nodes to determine which zone a Pod is running in. Kube-Proxy would be
@@ -340,23 +331,6 @@ calculating allocatable cores in a zone:
 * `node-role.kubernetes.io/control-plane`
 * `node-role.kubernetes.io/master`
 
-#### Example
-
-zone-a: 20 CPU cores
-zone-b: 16 CPU cores
-zone-c: 14 CPU cores
-
-In this scenario, the following proportion of endpoints would be allocated for
-each Service:
-
-zone-a: 40%
-zone-b: 32%
-zone-c: 28%
-
-When allocating endpoints to meet this distribution, keeping endpoints in the
-same zone will be prioritized. When same-zone endpoints are exhausted, endpoints
-will be taken from zones that have excess capacity.
-
 #### Overload
 
 Overload is a key concept for this proposal. This occurs when there are less
@@ -393,6 +367,61 @@ of the following scenarios:
 2. A new Node results in a Service that is able to achieve an endpoint
    distribution below 20% for the first time.
 
+### Proportional CPU Heuristic
+
+#### Assumptions
+
+- Incoming traffic is proportional to the number of allocatable CPU cores in a
+  zone. Although this is an imperfect metric, it is the best available way of
+  predicting how much traffic will be received in a zone. If we are unable to
+  derive the number of allocatable cores in a zone we will fall back to the
+  number of nodes in that zone.
+- Service capacity is proportional to the number of endpoints in a zone. This
+  assumes that each endpoint has equivalent capacity. Although this is not
+  always true, it usually is. We can explore ways to deal with variable capacity
+  endpoints in the future.
+#### Example
+
+zone-a: 20 CPU cores
+zone-b: 16 CPU cores
+zone-c: 14 CPU cores
+
+In this scenario, the following proportion of endpoints would be allocated for
+each Service:
+
+zone-a: 40%
+zone-b: 32%
+zone-c: 28%
+
+When allocating endpoints to meet this distribution, keeping endpoints in the
+same zone will be prioritized. When same-zone endpoints are exhausted, endpoints
+will be taken from zones that have excess capacity.
+
+### Proportional Pod Heuristic
+
+#### Assumptions
+
+- Incoming traffic is the same for all zones.
+- Service capacity is proportional to the number of endpoints in a zone. This
+  assumes that each endpoint has equivalent capacity.
+
+This avoids blackholing traffic when a zone runs out of endpoints, but it is
+still an imperfect way to distribute traffic evenly across zones.
+
+#### Example
+
+zone-a: 2 endpoints
+zone-b: 1 endpoint
+zone-c: 3 endpoints
+
+In this scenario, each zone will be served by the same number of endpoints:
+
+Total endpoints / number of zones = 6 / 3 = 2 endpoints per zone
+
+When allocating endpoints to meet this distribution, keeping endpoints in the
+same zone will be prioritized. When same-zone endpoints are exhausted, endpoints
+will be taken from zones that have excess capacity.
+
 ### Additional Heuristics
 To enable additional heuristics to be added in the future, we will: