
Support for exposing PSI metrics #3083

Open · wants to merge 3 commits into master from psi
Conversation

@dqminh (Contributor) commented Mar 23, 2022

Fix #3052

This depends on opencontainers/runc#3358, so it should not be merged as-is, but we can review the structure and how metrics are exposed.

@k8s-ci-robot (Collaborator) commented:

Hi @dqminh. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
This adds two new sets of metrics:
- `psi_total`: the total number of seconds a resource is under pressure
- `psi_avg`: the ratio of time a resource is under pressure over a
  sliding time window

For more details about these definitions, see:
- https://www.kernel.org/doc/html/latest/accounting/psi.html
- https://facebookmicrosites.github.io/psi/docs/overview
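
For reference, the raw pressure files these are read from look like this (values illustrative; the avg fields are percentages over the named window, and total is cumulative stall time in microseconds):

```
some avg10=0.12 avg60=0.08 avg300=0.02 total=117244387
full avg10=0.00 avg60=0.01 avg300=0.00 total=23111230
```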

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
dqminh force-pushed the psi branch 3 times, most recently from 4c5c036 to 57ac945 on May 20, 2022
This adds support for exposing PSI metrics via Prometheus. We expose the
following for `psi_total`:

```
container_cpu_psi_total_seconds
container_memory_psi_total_seconds
container_io_psi_total_seconds
```

And for `psi_avg`:

```
container_cpu_psi_avg10_ratio
container_cpu_psi_avg60_ratio
container_cpu_psi_avg300_ratio

container_memory_psi_avg10_ratio
container_memory_psi_avg60_ratio
container_memory_psi_avg300_ratio

container_io_psi_avg10_ratio
container_io_psi_avg60_ratio
container_io_psi_avg300_ratio
```
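
For illustration, a scrape would then include series like these (only the `kind` label shown; values made up):

```
container_cpu_psi_total_seconds{kind="some"} 117.244387
container_cpu_psi_total_seconds{kind="full"} 23.111230
```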

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
@szuecs commented Nov 4, 2022

@bobbypage does this need to wait longer? Is anything missing?
I ask because there is an internal request to support this feature. :)

@dqminh likely it needs a rebase, maybe you can do it :)

@bobbypage (Collaborator) commented:

We are still waiting for opencontainers/runc#3358 to be merged in runc...

@szuecs commented Nov 14, 2022

@bobbypage not sure, should I take over the PR?
@dqminh thoughts?

john-liuqiming pushed a commit to cloud-native-observability/cadvisor that referenced this pull request on Apr 4, 2023:
Signed-off-by: liuqiming.lqm <liuqiming.lqm@alibaba-inc.com>
@SuperQ commented Apr 17, 2023

FYI, it's unnecessary, and not best practice, to expose pre-computed averages in Prometheus.

Prometheus can compute arbitrary averages from the total data.
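
A sketch of what that looks like in PromQL, using the counter name and `kind` label from this PR:

```
# Fraction of time the container spent stalled on CPU over the last 5 minutes;
# the same idea as the kernel's fixed avg10/avg60/avg300 windows, but over an
# arbitrary window.
rate(container_cpu_psi_total_seconds{kind="some"}[5m])
```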

@nmlc commented Aug 23, 2023

Any news here? Seems like the runc PSI PR got merged.

Review comments on the diff:

```
if includedMetrics.Has(container.PSITotalMetrics) {
	c.containerMetrics = append(c.containerMetrics, []containerMetric{
		{
			name: "container_cpu_psi_total_seconds",
```

Suggested change:

```
-			name: "container_cpu_psi_total_seconds",
+			name: "container_pressure_cpu_seconds_total",
```

```
			return getPSIValues(s, &s.Cpu.PSI, "total")
		},
	}, {
		name: "container_memory_psi_total_seconds",
```

Suggested change:

```
-			name: "container_memory_psi_total_seconds",
+			name: "container_pressure_memory_seconds_total",
```

```
			return getPSIValues(s, &s.Memory.PSI, "total")
		},
	}, {
		name: "container_io_psi_total_seconds",
```

Suggested change:

```
-			name: "container_io_psi_total_seconds",
+			name: "container_pressure_io_seconds_total",
```

Comment on lines +1801 to +1827:

```
if includedMetrics.Has(container.PSIAvgMetrics) {
	makePSIAvgMetric := func(controller, window string) containerMetric {
		return containerMetric{
			name:        fmt.Sprintf("container_%s_psi_avg%s_ratio", controller, window),
			help:        fmt.Sprintf("Ratio of time spent under %s pressure over time window of %s seconds", controller, window),
			valueType:   prometheus.GaugeValue,
			extraLabels: []string{"kind"},
			getValues: func(s *info.ContainerStats) metricValues {
				switch controller {
				case "cpu":
					return getPSIValues(s, &s.Cpu.PSI, "avg"+window)
				case "memory":
					return getPSIValues(s, &s.Memory.PSI, "avg"+window)
				case "io":
					return getPSIValues(s, &s.DiskIo.PSI, "avg"+window)
				default:
					return nil
				}
			},
		}
	}
	for _, controller := range []string{"cpu", "memory", "io"} {
		for _, window := range []string{"10", "60", "300"} {
			c.containerMetrics = append(c.containerMetrics, makePSIAvgMetric(controller, window))
		}
	}
}
```

These metrics are unnecessary in Prometheus, as we can compute averages from the counters. Please remove them to avoid excess metric cardinality. (The suggested change deletes this entire block.)
I believe this may be resolved since the 10/60/300 averages are inherent to the exposed PSI data?

case "total":
// total is measured as microseconds
v = append(v, metricValue{value: float64(time.Duration(psi.Some.Total)*time.Microsecond) / float64(time.Second), timestamp: s.Timestamp, labels: []string{"some"}})
v = append(v, metricValue{value: float64(time.Duration(psi.Full.Total)*time.Microsecond) / float64(time.Second), timestamp: s.Timestamp, labels: []string{"full"}})
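
Concretely, this converts the kernel's cumulative stall counter from microseconds to seconds, e.g. (made-up value):

```
total=117244387 µs  ->  117.244387 s
```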
Note for CPU, we don't need to expose "full". In practice, I've only found the "some" metrics to be useful. The "some" value is a superset of "full". IMO we should just include that to reduce the cardinality.

See the PSI docs:

> CPU full is undefined at the system level, but has been reported since 5.13, so it is set to zero for backward compatibility.

@ein-stein-chen commented Oct 31, 2023

I would like to raise my opinion in favor of exposing "CPU full" metrics at the container (cgroup) level.

While "CPU full" is undefined at the system level, it does report values for other cgroups:

```
$ cat /sys/fs/cgroup/user.slice/cpu.pressure
some avg10=0.00 avg60=0.05 avg300=0.20 total=68139509
full avg10=0.00 avg60=0.00 avg300=0.00 total=40148380
```

Exposing both might be useful to differentiate between cgroups that try to execute more work than the compute time available to them (indicated through "CPU some") and cgroups that are fully blocked/stalled (maybe because of other cgroups and/or process priority, …; indicated through "CPU full").

Interesting, thanks for this info. Are there kernel docs that document the per-cgroup behavior?

IMO this should be a separate metric name rather than a label. The reason is that since some is inclusive of full, doing something like `sum(rate(container_pressure_cpu_seconds_total[1m]))` would be confusing.

I would suggest these two metric names (see the query sketch below):

- `container_pressure_cpu_seconds_total`
- `container_pressure_cpu_full_seconds_total`
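
A minimal sketch of how the split would read in queries, assuming the two names proposed above ("some" means at least one task was stalled; "full" means all non-idle tasks were stalled simultaneously, per the kernel PSI docs):

```
# Fraction of time at least one task in the container was stalled on CPU
rate(container_pressure_cpu_seconds_total[1m])

# Fraction of time all non-idle tasks in the container were stalled at once
rate(container_pressure_cpu_full_seconds_total[1m])
```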

@szuecs commented Oct 31, 2023

@bobbypage can you set ok-to-test and do a review?
It would be great to see this in cadvisor.

@bobbypage (Collaborator) commented:

Waiting for a new runc release to include opencontainers/runc@1aa7ca8.

@MathieuCesbron commented:

Can we have an update on this? I would like to access the PSI interface. Thanks guys.

@dims (Collaborator) commented Feb 23, 2024

@MathieuCesbron did you see the previous comment (the one before yours)?

Comment on the `go.mod` diff:

```
	google.golang.org/grpc v1.33.2
	k8s.io/klog/v2 v2.4.0
	k8s.io/utils v0.0.0-20211116205334-6203023598ed
)

replace github.com/opencontainers/runc => github.com/dqminh/runc v0.0.0-20220513155811-6414629ada8a
```
Please remove this and use the upstream runc as-is.

@zouyee (Contributor) commented Apr 7, 2024

runc has released 1.2.0-rc.1

@zouyee (Contributor) commented Apr 17, 2024

> @bobbypage not sure, should I take over the PR? @dqminh thoughts?

@szuecs take over?

@SuperQ commented Apr 17, 2024

I would be happy to take over this change; I would like to make sure it aligns well with best practices. As it is, it does not.

@szuecs commented Apr 17, 2024

There is only a release candidate, so I would wait until there's a proper release.
I don't mind if someone else does it.

@akgoel18 commented Jun 6, 2024

@szuecs Any update on this? I'm asking because this is a heavily requested feature internally. :)

@szuecs commented Jun 6, 2024

@akgoel18 please check the upstream project's release cycle, thanks.

@pacoxu (Contributor) commented Jul 18, 2024

opencontainers/runc#3900 was merged and will be released with runc 1.2.0

@pacoxu (Contributor) commented Oct 22, 2024

runc v1.2.0 was released today: https://github.com/opencontainers/runc/releases/tag/v1.2.0

Do you have time to bump runc?

@dims (Collaborator) commented Oct 22, 2024

@pacoxu when containerd 2.0 gets out, the runc used with it will be 1.2.0; so as long as we only have containerd 1.6/1.7 with k8s, we should stick to older runc(s), both as a binary and in vendoring.

@haircommander (Contributor) commented:

@dims why do cadvisor/k8s and containerd need to have in-sync runc versions?

@dims (Collaborator) commented Oct 22, 2024

> @dims why do cadvisor/k8s and containerd need to have in-sync runc versions?

Bad things have happened before, @haircommander; for example, see opencontainers/runc#3849.

@alexandremahdhaoui commented:

@dims @haircommander to address and prevent the issue of "out of sync" runc versions, we could add a CI check.

NB: I wrote a tool (usage) in kubernetes-sigs/container-runtime to verify and ensure that specified Go modules stay in sync with some upstream modules. I'd be glad to open a PR here or in k/k to add a check.

@rexagod commented Nov 12, 2024

Since the backports are in now, I think we can go ahead with this (incorporating @SuperQ's suggestions above)?

@SuperQ commented Nov 12, 2024

I might go ahead and make a fork/alternative to this specific PR.

Development

Successfully merging this pull request may close these issues:

- Expose Pressure Stall Information as metrics