Support for exposing PSI metrics #3083
base: master
Conversation
Hi @dqminh. Thanks for your PR. I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from a870125 to 0a22793.
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
This adds two new sets of metrics:
- `psi_total`: the total number of seconds a resource is under pressure
- `psi_avg`: the ratio of time a resource is under pressure over a sliding time window

For more details about these definitions, see:
- https://www.kernel.org/doc/html/latest/accounting/psi.html
- https://facebookmicrosites.github.io/psi/docs/overview

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
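For reference (illustrative values, not part of this PR), both metrics come from the kernel's PSI files, where `total` is the cumulative stall time in microseconds and `avg10`/`avg60`/`avg300` are pre-computed averages over 10/60/300-second windows:
```
$ cat /proc/pressure/memory
some avg10=0.00 avg60=0.12 avg300=0.06 total=123456
full avg10=0.00 avg60=0.05 avg300=0.02 total=45678
```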
Force-pushed from 4c5c036 to 57ac945.
This adds support for reading PSI metrics via Prometheus. We expose the following for `psi_total`:
```
container_cpu_psi_total_seconds
container_memory_psi_total_seconds
container_io_psi_total_seconds
```
And for `psi_avg`:
```
container_cpu_psi_avg10_ratio
container_cpu_psi_avg60_ratio
container_cpu_psi_avg300_ratio
container_memory_psi_avg10_ratio
container_memory_psi_avg60_ratio
container_memory_psi_avg300_ratio
container_io_psi_avg10_ratio
container_io_psi_avg60_ratio
container_io_psi_avg300_ratio
```
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
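For illustration, a scrape of the `psi_total` counters might look roughly like this (label values and numbers are made up; the `kind` label distinguishes the PSI `some`/`full` lines, as in the code reviewed below):
```
container_cpu_psi_total_seconds{kind="some",name="mycontainer"} 12.5
container_cpu_psi_total_seconds{kind="full",name="mycontainer"} 3.2
container_memory_psi_total_seconds{kind="some",name="mycontainer"} 0.7
```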
@bobbypage does this need to wait longer? Is anything missing? @dqminh it likely needs a rebase, maybe you can do it :)
We are still waiting for opencontainers/runc#3358 to be merged in runc...
@bobbypage not sure, should I take over the PR?
Signed-off-by: liuqiming.lqm <liuqiming.lqm@alibaba-inc.com>
FYI, it's unnecessary and not a best practice to expose the pre-computed averages in Prometheus. Prometheus can compute arbitrary averages from the total data.
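For example (an illustrative PromQL sketch, not part of this PR), the pressure ratio over any window can be derived from the counter alone, since `rate()` of a seconds counter yields seconds of stall per second of wall time:
```
# fraction of time the container was under CPU pressure over the last 5 minutes
rate(container_cpu_psi_total_seconds{kind="some"}[5m])
```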
Any news here? It seems like the runc PSI PR got merged.
```
if includedMetrics.Has(container.PSITotalMetrics) {
	c.containerMetrics = append(c.containerMetrics, []containerMetric{
		{
			name: "container_cpu_psi_total_seconds",
```
name: "container_cpu_psi_total_seconds", | |
name: "container_pressure_cpu_seconds_total", |
return getPSIValues(s, &s.Cpu.PSI, "total") | ||
}, | ||
}, { | ||
name: "container_memory_psi_total_seconds", |
name: "container_memory_psi_total_seconds", | |
name: "container_pressure_memory_seconds_total", |
return getPSIValues(s, &s.Memory.PSI, "total") | ||
}, | ||
}, { | ||
name: "container_io_psi_total_seconds", |
name: "container_io_psi_total_seconds", | |
name: "container_pressure_io_seconds_total", |
```
if includedMetrics.Has(container.PSIAvgMetrics) {
	makePSIAvgMetric := func(controller, window string) containerMetric {
		return containerMetric{
			name:        fmt.Sprintf("container_%s_psi_avg%s_ratio", controller, window),
			help:        fmt.Sprintf("Ratio of time spent under %s pressure over time window of %s seconds", controller, window),
			valueType:   prometheus.GaugeValue,
			extraLabels: []string{"kind"},
			getValues: func(s *info.ContainerStats) metricValues {
				switch controller {
				case "cpu":
					return getPSIValues(s, &s.Cpu.PSI, "avg"+window)
				case "memory":
					return getPSIValues(s, &s.Memory.PSI, "avg"+window)
				case "io":
					return getPSIValues(s, &s.DiskIo.PSI, "avg"+window)
				default:
					return nil
				}
			},
		}
	}
	for _, controller := range []string{"cpu", "memory", "io"} {
		for _, window := range []string{"10", "60", "300"} {
			c.containerMetrics = append(c.containerMetrics, makePSIAvgMetric(controller, window))
		}
	}
}
```
These metrics are unnecessary in Prometheus as we can compute averages from the counters. Please remove them to avoid excess metric cardinality.
Suggested change: remove the entire `PSIAvgMetrics` block shown above.
I believe this may be resolved, since the 10/60/300 averages are inherent to the exposed PSI data?
case "total": | ||
// total is measured as microseconds | ||
v = append(v, metricValue{value: float64(time.Duration(psi.Some.Total)*time.Microsecond) / float64(time.Second), timestamp: s.Timestamp, labels: []string{"some"}}) | ||
v = append(v, metricValue{value: float64(time.Duration(psi.Full.Total)*time.Microsecond) / float64(time.Second), timestamp: s.Timestamp, labels: []string{"full"}}) |
Note for CPU, we don't need to expose "full". In practice, I've only found the "some" metrics to be useful. The "some" value is a superset of "full". IMO we should just include that to reduce the cardinality.
See the PSI docs:
CPU full is undefined at the system level, but has been reported since 5.13, so it is set to zero for backward compatibility.
I would like to make the case for exposing "CPU full" metrics at the container (cgroup) level.
While "CPU full" is undefined at the system level, the kernel does report it for individual cgroups:
```
$ cat /sys/fs/cgroup/user.slice/cpu.pressure
some avg10=0.00 avg60=0.05 avg300=0.20 total=68139509
full avg10=0.00 avg60=0.00 avg300=0.00 total=40148380
```
Exposing both might be useful to distinguish cgroups that try to execute more work than the compute time available to them (indicated by "CPU some") from cgroups that are fully blocked/stalled, for example because of other cgroups, process priority, etc. (indicated by "CPU full").
Interesting, thanks for this info. Are there kernel docs that document the per-cgroup behavior?

IMO this should be a separate metric name rather than a label: since `some` is inclusive of `full`, doing something like `sum(rate(container_pressure_cpu_seconds_total[1m]))` would be confusing.

I would suggest these two metric names:

container_pressure_cpu_seconds_total
container_pressure_cpu_full_seconds_total
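To illustrate the concern (a hypothetical PromQL sketch using the names above):
```
# If "some" and "full" share one metric name and differ only by a label,
# a naive aggregation double-counts fully stalled time:
sum by (name) (rate(container_pressure_cpu_seconds_total[1m]))

# With two separate metric names, each query stays unambiguous:
rate(container_pressure_cpu_seconds_total[1m])       # some (superset, includes full)
rate(container_pressure_cpu_full_seconds_total[1m])  # full
```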
@bobbypage can you set ok-to-test and do a review?
Waiting for a new runc release that includes opencontainers/runc@1aa7ca8.
Can we have an update on this? I would like to access the PSI interface. Thanks guys.
@MathieuCesbron did you see the previous comment (the one before yours)?
```
	google.golang.org/grpc v1.33.2
	k8s.io/klog/v2 v2.4.0
	k8s.io/utils v0.0.0-20211116205334-6203023598ed
)

replace github.com/opencontainers/runc => github.com/dqminh/runc v0.0.0-20220513155811-6414629ada8a
```
Please remove this and use the upstream `runc` as-is.
@szuecs take over?
I would be happy to take over this change; I would like to make sure it aligns well with best practices. As it is, it does not.
There is only a release candidate, so I would wait until there's a proper release.
@szuecs any update on this? I'm asking because this is a heavily requested feature internally. :)
@akgoel18 please check the upstream project's release cycle, thanks.
opencontainers/runc#3900 was merged and will be released with runc 1.2.0.
https://github.com/opencontainers/runc/releases/tag/v1.2.0 runc v1.2.0 was released today. Do you have time to bump runc?
@pacoxu when containerd 2.0 comes out, the runc used with it will be 1.2.0; until then, with only containerd 1.6/1.7 in use with k8s, we should stick to the older runc(s), both as a binary and for vendoring.
@dims why do cAdvisor/k8s and containerd need to have an in-sync runc version?
Bad things have happened before, @haircommander; for example, see opencontainers/runc#3849.
@dims @haircommander to address and prevent the issue with "out of sync" runc versions, we could add a CI check. NB: I wrote a tool (usage) in kubernetes-sigs/container-runtime to verify and ensure that specified Go modules stay in sync with some upstream modules. I'd be glad to open a PR here or in k/k to add a check.
I might go ahead and make a fork/alternative to this specific PR.
Fix #3052
This depends on opencontainers/runc#3358, so it should not be merged as-is, but we can review the structure and how metrics are exposed.