-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support fetching PSI stats for cgroupv2 containers #3358
Conversation
const examplePsiData = `some avg10=1.71 avg60=2.36 avg300=2.57 total=230548833 | ||
full avg10=1.00 avg60=1.01 avg300=1.00 total=157622356` | ||
|
||
func TestStatCPUPsi(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have integration tests too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added one for runc events
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: capitalize PSI
Can we please have |
31e4197
to
5699010
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few typos in helpers.bash, otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ping @opencontainers/runc-maintainers |
ping @opencontainers/runc-maintainers again |
@dqminh can you please rebase? We have changed something in the tests and they need to be re-run. |
a07e8f8
to
521a41d
Compare
@kolyshkin thanks, rebase is done. I had to add |
ping @kolyshkin, this would be great to get in, especially for use downstream in cAdvisor / k8s |
libcontainer/cgroups/fs2/fs2.go
Outdated
@@ -114,6 +114,16 @@ func (m *manager) GetStats() (*cgroups.Stats, error) { | |||
if err := statCpu(m.dirPath, st); err != nil && !os.IsNotExist(err) { | |||
errs = append(errs, err) | |||
} | |||
// psi ( since kernel 4.20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra space after (
libcontainer/cgroups/fs2/psi.go
Outdated
} | ||
|
||
func parsePSIData(psi []string) (data cgroups.PSIData, err error) { | ||
for i := 0; i < len(psi); i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll be more readable to use
for _, f := range psi {
(and use f
instead of psi[i]
below).
libcontainer/cgroups/fs2/psi_test.go
Outdated
"github.com/opencontainers/runc/libcontainer/cgroups" | ||
) | ||
|
||
const examplePsiData = `some avg10=1.71 avg60=2.36 avg300=2.57 total=230548833 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: capitalize PSI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, otherwise looks good.
1f36e91
to
c495b2b
Compare
We read output from the following files if they exists: - cpu.pressure - memory.pressure - io.pressure Each are in format: ``` some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 ``` Signed-off-by: Daniel Dao <dqminh89@gmail.com>
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
@kolyshkin updated the nits, also added a fix to check for ENOTSUP when kernel compiled with PSI_DEFAULT_DISABLED, so reading *.pressure file return ENOTSUP as seen in https://cirrus-ci.com/task/4929935776153600 |
ping @kolyshkin for another round of reviews :) |
// open *.pressure file returns | ||
// - ErrNotExist when kernel < 4.20 or CONFIG_PSI is disabled | ||
// - ENOTSUP when we requires psi=1 in kernel command line to enable PSI support | ||
if err := statPSI(m.dirPath, "cpu.pressure", &st.CpuStats.PSI); err != nil && !errors.Is(err, os.ErrNotExist) && !errors.Is(err, syscall.ENOTSUP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably better to use unix.ENOTSUP
from golang.org/x/sys/unix
here (as syscall
is considered an obsoleted package).
psi) | ||
# If PSI is not compiled in the kernel, the file will not exist. | ||
# If PSI is compiled, but not enabled, read will fail with ENOTSUPP. | ||
if [[ ! $(cat /sys/fs/cgroup/cpu.pressure) ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to use [[
here.
if ! cat /sys/fs/cgroup/cpu.pressure &>/dev/null; then
should be sufficient.
gentle ping to @dqminh regarding the updates mentioned. would be great to get this in so we can consume PSI metrics downstream in cAdvisor / k8s |
@kolyshkin Maybe we can address those issues after merging? |
Hey @dqminh this is a great PR. There has been no activity for some time so I am wondering if you are still interested? Otherwise I would be willing to take over and make the requested changes. |
@kolyshkin Given that this PR has stalled, how would you like to proceed? I can make the requested changes. Would be great if we could make PSI metrics available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some suggestions; I was coding along while writing, so I'll push a PR against @dqminh's branch for consideration (also happy to carry in a separate PR)
Some PSIData `json:"some,omitempty"` | ||
Full PSIData `json:"full,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if it would make sense to make these pointers, as it looks like that's how they're consumed in various places, plus it would allow parsePSIData
to return nil, err
(which may be slightly cleaner in in tent.
for _, f := range psi { | ||
kv := strings.SplitN(f, "=", 2) | ||
if len(kv) != 2 { | ||
return data, fmt.Errorf("invalid psi data: %q", f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: these should probably use PSI
(capitalised) as well
type PSIData struct { | ||
Avg10 float64 `json:"avg10"` | ||
Avg60 float64 `json:"avg60"` | ||
Avg300 float64 `json:"avg300"` | ||
Total uint64 `json:"total"` | ||
} | ||
|
||
type PSIStats struct { | ||
Some PSIData `json:"some,omitempty"` | ||
Full PSIData `json:"full,omitempty"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we could just make these an alias for cgroups.PSIData
and cgroups.PSIStat
. That way we wouldn't need the conversion function.
case "avg10": | ||
v, err := strconv.ParseFloat(kv[1], 64) | ||
if err != nil { | ||
return data, fmt.Errorf("invalid psi value: %q", f) | ||
} | ||
data.Avg10 = v | ||
case "avg60": | ||
v, err := strconv.ParseFloat(kv[1], 64) | ||
if err != nil { | ||
return data, fmt.Errorf("invalid psi value: %q", f) | ||
} | ||
data.Avg60 = v | ||
case "avg300": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are quire repetitive; perhaps we could combine them 🤔
type CpuStats struct { | ||
CpuUsage CpuUsage `json:"cpu_usage,omitempty"` | ||
ThrottlingData ThrottlingData `json:"throttling_data,omitempty"` | ||
PSI PSIStats `json:"psi,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need this to be a pointer for the omitempty
to actually do something; https://go.dev/play/p/R68xTF5hreC
@thaJeztah i'm really sorry about the delay. We have been running with the patch for so long (!) that i forgot it hasn't been merged. I left a few comments in the PR on my fork. Generally i think we can try to make the refactoring of existing practices separately from the current change. |
@dqminh I read the comments and tried to fix everything and opened dqminh#37 . I have here issue I can't compile it and test wait for fetching containers since 15 minutes.
Tests run and fail for some reason, that might be problem of my system, not the code, but let's see what the test suite says. I hope this helps to get it merged. :) |
Stray comment: As of linux 6.1 there's an interface called |
FYI: I created #3679 as follow up to get this merged |
This supports fetching PSI stats for cgroupv2 containers. We read the PSI metrics if they are available from:
See more about PSI at https://www.kernel.org/doc/html/latest/accounting/psi.html