add networking slis for services #8142

aojea · 2024-11-07T22:43:11Z

It is important to define SLIs for Kubernetes services because network performance is complex to measure, especially in virtualized environments like Kubernetes. Without properly defined SLIs, it is difficult to understand how well your services are performing.

Establishing clear SLIs enables users to "compare apples with apples" and understand the true performance of their services.

k8s-ci-robot · 2024-11-07T22:43:21Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~sig-scalability/slos/OWNERS~~ [aojea]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

aojea · 2024-11-07T22:43:26Z

/assign @wojtek-t @npinaeva

Change-Id: Ic650446ab8d7e508c7fcf95a0db0c0b310db04a0

ArvindParekh

Just a nit :)

ArvindParekh · 2024-11-09T14:00:10Z

sig-scalability/slos/slos.md

@@ -122,6 +122,9 @@ __TODO: Cluster churn should be moved to scalability thresholds.__
 | __WIP__ | Latency of programming dns instance, measured from when service spec or list of its `Ready` pods change to when it is reflected in that dns instance, measured as 99th percentile over last 5 minutes aggregated across all dns instances | In default Kubernetes installation, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./dns_programming_latency.md) |
 | __WIP__ | In-cluster network latency from a single prober pod, measured as latency of per second ping from that pod to "null service", measured as 99th percentile over last 5 minutes. | In default Kubernetes installataion with RTT between nodes <= Y, 99th percentile of (99th percentile over all prober pods) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./network_latency.md) |


Suggested change

| __WIP__ | In-cluster network latency from a single prober pod, measured as latency of per second ping from that pod to "null service", measured as 99th percentile over last 5 minutes. | In default Kubernetes installataion with RTT between nodes <= Y, 99th percentile of (99th percentile over all prober pods) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./network_latency.md) |

| __WIP__ | In-cluster network latency from a single prober pod, measured as latency of per second ping from that pod to "null service", measured as 99th percentile over last 5 minutes. | In default Kubernetes installation with RTT between nodes <= Y, 99th percentile of (99th percentile over all prober pods) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./network_latency.md) |

ArvindParekh · 2024-11-09T14:00:23Z

sig-scalability/slos/slos.md

@@ -122,6 +122,9 @@ __TODO: Cluster churn should be moved to scalability thresholds.__
 | __WIP__ | Latency of programming dns instance, measured from when service spec or list of its `Ready` pods change to when it is reflected in that dns instance, measured as 99th percentile over last 5 minutes aggregated across all dns instances | In default Kubernetes installation, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./dns_programming_latency.md) |
 | __WIP__ | In-cluster network latency from a single prober pod, measured as latency of per second ping from that pod to "null service", measured as 99th percentile over last 5 minutes. | In default Kubernetes installataion with RTT between nodes <= Y, 99th percentile of (99th percentile over all prober pods) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./network_latency.md) |
 | __WIP__ | In-cluster dns latency from a single prober pod, measured as latency of per second DNS lookup for "null service" from that pod, measured as 99th percentile over last 5 minutes. | In default Kubernetes installataion with RTT between nodes <= Y, 99th percentile of (99th percentile over all prober pods) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./dns_latency.md) |


Suggested change

| __WIP__ | In-cluster dns latency from a single prober pod, measured as latency of per second DNS lookup for "null service" from that pod, measured as 99th percentile over last 5 minutes. | In default Kubernetes installataion with RTT between nodes <= Y, 99th percentile of (99th percentile over all prober pods) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./dns_latency.md) |

| __WIP__ | In-cluster dns latency from a single prober pod, measured as latency of per second DNS lookup for "null service" from that pod, measured as 99th percentile over last 5 minutes. | In default Kubernetes installation with RTT between nodes <= Y, 99th percentile of (99th percentile over all prober pods) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./dns_latency.md) |
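
For readers following the SLO wording in these rows, here is a rough sketch of one way to read "99th percentile of (99th percentile over all prober pods) per cluster-day". It is not taken from the PR or from any Kubernetes tooling. The function names, the input layout (a list of 5-minute windows, each mapping a prober pod name to its latency samples), and the nearest-rank percentile method are all assumptions made for illustration.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (an assumed method; the SLO text does not specify one)."""
    if not values:
        raise ValueError("percentile of empty sample set")
    ordered = sorted(values)
    # Smallest rank such that at least q percent of the samples are at or below it.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def cluster_day_value(windows, q=99):
    """Aggregate one cluster-day of prober measurements into a single number.

    windows: hypothetical input, a list of 5-minute windows, where each window
    is a dict mapping prober pod name -> list of latency samples in seconds.
    """
    per_window = []
    for window in windows:
        # SLI: 99th percentile over the last 5 minutes, for each prober pod.
        per_pod = [percentile(samples, q) for samples in window.values()]
        # Inner aggregation: 99th percentile over all prober pods.
        per_window.append(percentile(per_pod, q))
    # Outer aggregation: 99th percentile over the cluster-day.
    return percentile(per_window, q)


if __name__ == "__main__":
    day = [
        {"prober-a": [0.001, 0.002, 0.003], "prober-b": [0.004, 0.005]},
        {"prober-a": [0.001], "prober-b": [0.002]},
    ]
    # The SLO would then compare this value against the threshold X.
    print(cluster_day_value(day))
```

Under this reading, the SLO holds for a cluster-day when the returned value is <= X, given that the RTT between nodes is <= Y.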

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 7, 2024

k8s-ci-robot requested review from marseel and wojtek-t November 7, 2024 22:43

k8s-ci-robot added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 7, 2024

k8s-ci-robot assigned npinaeva and wojtek-t Nov 7, 2024

add networking slis for services

ba2b62d

Change-Id: Ic650446ab8d7e508c7fcf95a0db0c0b310db04a0

aojea force-pushed the services_slis branch from 2efa1a3 to ba2b62d Compare November 7, 2024 22:44

ArvindParekh reviewed Nov 9, 2024
