Move "Understand PSI Metrics" into a new reference doc
The new reference doc explains how to generate CPU, memory, and I/O pressure with test workloads, and how to interpret PSI metrics through both the Summary API and the Prometheus metrics.
This feature is enabled by default via the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/). The information is exposed in both the Summary API and the kubelet's Prometheus metrics.
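If you ever need to toggle the gate explicitly, it can be set through the kubelet configuration; a minimal sketch, assuming you manage the kubelet with a `KubeletConfiguration` file:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # KubeletPSI is enabled by default; this entry only matters if you want to
  # set it explicitly (for example, to false to opt out).
  KubeletPSI: true
```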
Pressure Stall Information (PSI) metrics are provided for three resources: CPU, memory, and I/O. They are categorized into two main types of pressure: `some` and `full`.
* **`some`**: This value indicates that some tasks (one or more) are stalled on a resource. For example, if some tasks are waiting for I/O, this metric will increase. This can be an early indicator of resource contention.
* **`full`**: This value indicates that *all* non-idle tasks are stalled on a resource simultaneously. This signifies a more severe resource shortage, where the entire system is unable to make progress.
Each pressure type provides four metrics: `avg10`, `avg60`, `avg300`, and `total`. The `avg` values represent the percentage of wall-clock time that tasks were stalled over 10-second, 60-second, and 300-second moving averages. The `total` value is a cumulative counter, in microseconds, showing the total time tasks have been stalled.
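These fields mirror what the Linux kernel itself reports. For reference, this is the raw node-level format (values are illustrative); the kubelet surfaces the equivalent `some`/`full` data for nodes, Pods, and containers:

```shell
# Inspect the kernel's memory pressure file on a node
cat /proc/pressure/memory
# Example output (illustrative numbers):
#   some avg10=0.23 avg60=0.11 avg300=0.04 total=8723451
#   full avg10=0.00 avg60=0.01 avg300=0.00 total=912345
```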
#### Example Scenarios
You can use a simple Pod with a stress-testing tool to generate resource pressure and observe the PSI metrics. The following examples use the `agnhost` container image, which includes the `stress` tool.
The examples show how to query the kubelet's `/metrics/cadvisor` endpoint to observe the Prometheus metrics.
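For example, you can fetch those metrics through the API server's node proxy and filter for the pressure-related series (the exact metric names depend on your kubelet and cAdvisor versions, so the filter below is deliberately broad):

```shell
# Replace <node-name> with the name of a node in your cluster
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | grep -i pressure
```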
**Example 1: Generating CPU Pressure**
Create a Pod that generates CPU pressure using the `stress` utility. This workload will put a heavy load on one CPU core.
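One way to build such a Pod is sketched below. It substitutes a plain shell busy loop for the `stress` tool, and the image, container name, and resource values are illustrative placeholders you can adjust:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-pressure-pod
spec:
  containers:
  - name: cpu-stress          # placeholder container name
    image: busybox:1.36       # placeholder image that provides a shell
    command: ["/bin/sh", "-c"]
    # Two busy loops compete for a 500m CPU limit, so at least one task is
    # always runnable but not actually running, which registers as CPU pressure.
    args:
    - "while true; do :; done & while true; do :; done"
    resources:
      requests:
        cpu: "500m"
      limits:
        cpu: "500m"
```

Apply it with `kubectl apply -f cpu-pressure-pod.yaml`, then query the pressure series for the Pod, for example:

```shell
# Replace <node-name> with the node where cpu-pressure-pod is running
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | grep -i pressure | grep cpu-pressure-pod
```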
The output should show an increasing value, indicating that the container is spending time stalled waiting for CPU resources.
Clean up the Pod when you are finished:

```shell
kubectl delete pod cpu-pressure-pod
```
**Example 2: Generating Memory Pressure**
This example creates a Pod that continuously writes to files in the container's writable layer, causing the kernel's page cache to grow and forcing memory reclamation, which generates pressure.
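A minimal sketch of such a Pod is shown below. The `dd` write loop and the 200M memory limit come from the original example (the loop's output redirection is written in POSIX form here); the image and container name are illustrative placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-pressure-pod
spec:
  containers:
  - name: memory-stress       # placeholder container name
    image: busybox:1.36       # placeholder image that provides a shell and dd
    command: ["/bin/sh", "-c"]
    # Rotate writes across five 50M files so the page cache keeps growing and
    # the kernel must keep reclaiming memory inside the 200M limit.
    args:
    - "i=0; while true; do dd if=/dev/zero of=testfile.$i bs=1M count=50 >/dev/null 2>&1; i=$(((i+1)%5)); sleep 0.1; done"
    resources:
      limits:
        memory: "200M"
      requests:
        memory: "200M"
```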
`content/en/docs/reference/instrumentation/node-metrics.md` (1 addition, 107 deletions)
@@ -54,113 +54,7 @@ See [Summary API](/docs/reference/config-api/kubelet-stats.v1alpha1/) for detail
This feature is enabled by default via the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/). The information is exposed in both the Summary API and the kubelet's Prometheus metrics.
Pressure Stall Information (PSI) metrics are provided for three resources: CPU, memory, and I/O. They are categorized into two main types of pressure: `some` and `full`.
* **`some`**: This value indicates that some tasks (one or more) are stalled on a resource. For example, if some tasks are waiting for I/O, this metric will increase. This can be an early indicator of resource contention.
* **`full`**: This value indicates that *all* non-idle tasks are stalled on a resource simultaneously. This signifies a more severe resource shortage, where the entire system is unable to make progress.
Each pressure type provides four metrics: `avg10`, `avg60`, `avg300`, and `total`. The `avg` values represent the percentage of wall-clock time that tasks were stalled over 10-second, 60-second, and 300-second moving averages. The `total` value is a cumulative counter, in microseconds, showing the total time tasks have been stalled.
#### Example Scenarios
You can use a simple Pod with a stress-testing tool to generate resource pressure and observe the PSI metrics. The following examples use the `agnhost` container image, which includes the `stress` tool.
First, watch the summary stats for your node. In a separate terminal, run:
```shell
# Replace <node-name> with the name of a node in your cluster
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" | jq '.pods[] | select(.podRef.name | contains("pressure-pod"))'
```
**Example 1: Generating CPU Pressure**
Create a Pod that generates CPU pressure using the `stress` utility. This workload will put a heavy load on one CPU core.
Apply it to your cluster: `kubectl apply -f cpu-pressure-pod.yaml`
After the Pod is running, you will see the `some` PSI metrics for CPU increase in the summary API output. The `avg10` value for `some` pressure should rise above zero, indicating that tasks are spending time stalled on the CPU.
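For instance, assuming the PSI data appears under a `psi` field on each resource's stats (check the kubelet-stats.v1alpha1 reference for the exact schema), you can narrow the watch to just the CPU pressure of this Pod:

```shell
# Replace <node-name> with the node where cpu-pressure-pod is running.
# The .cpu.psi path is an assumption based on the kubelet-stats.v1alpha1 structure.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" | jq '.pods[] | select(.podRef.name == "cpu-pressure-pod") | .containers[].cpu.psi'
```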
Clean up the Pod when you are finished:
```shell
kubectl delete pod cpu-pressure-pod
```
**Example 2: Generating Memory Pressure**
This example creates a Pod that continuously writes to files in the container's writable layer, causing the kernel's page cache to grow and forcing memory reclamation, which generates pressure.
- "i=0; while true; do dd if=/dev/zero of=testfile.$i bs=1M count=50 &>/dev/null; i=$(((i+1)%5)); sleep 0.1; done"
124
-
resources:
125
-
limits:
126
-
memory: "200M"
127
-
requests:
128
-
memory: "200M"
129
-
```
Apply it to your cluster. In the summary output, you will observe an increase in the `full` PSI metrics for memory, indicating that the system is under significant memory pressure.
Clean up the Pod when you are finished:
```shell
kubectl delete pod memory-pressure-pod
```
**Example 3: Generating I/O Pressure**
This Pod generates I/O pressure by repeatedly writing a file to disk and using `sync` to flush the data from memory, which creates I/O stalls.
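A minimal illustrative sketch that matches this description (the image, names, and write size are placeholders; a `dd` write followed by `sync` keeps flushing data to disk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: io-pressure-pod       # placeholder name, following the other examples
spec:
  containers:
  - name: io-stress           # placeholder container name
    image: busybox:1.36       # placeholder image that provides a shell, dd, and sync
    command: ["/bin/sh", "-c"]
    # Repeatedly write a file, flush it to disk with sync, and delete it,
    # so tasks keep stalling on I/O.
    args:
    - "while true; do dd if=/dev/zero of=/tmp/testfile bs=1M count=128 >/dev/null 2>&1; sync; rm -f /tmp/testfile; done"
```

As with the other examples, delete the Pod (`kubectl delete pod io-pressure-pod`) when you are finished.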