Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support fetching PSI stats for cgroupv2 containers #3358

Closed
wants to merge 2 commits into from

Conversation

dqminh
Copy link
Contributor

@dqminh dqminh commented Jan 28, 2022

This supports fetching PSI stats for cgroupv2 containers. We read the PSI metrics if they are available from:

  • cpu.pressure
  • memory.pressure
  • io.pressure

See more about PSI at https://www.kernel.org/doc/html/latest/accounting/psi.html

const examplePsiData = `some avg10=1.71 avg60=2.36 avg300=2.57 total=230548833
full avg10=1.00 avg60=1.01 avg300=1.00 total=157622356`

func TestStatCPUPsi(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have integration tests too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added one for runc events

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: capitalize PSI

@kolyshkin
Copy link
Contributor

Can we please have s/Psi/PSI/ since this is an acronym? I know we use things like Json before, but in the new code we can do better.

@dqminh dqminh force-pushed the psi branch 2 times, most recently from 31e4197 to 5699010 Compare February 3, 2022 17:53
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin kolyshkin requested a review from AkihiroSuda February 8, 2022 03:49
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few typos in helpers.bash, otherwise LGTM

kolyshkin
kolyshkin previously approved these changes Feb 16, 2022
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dqminh
Copy link
Contributor Author

dqminh commented Feb 24, 2022

ping @opencontainers/runc-maintainers

@dqminh
Copy link
Contributor Author

dqminh commented Mar 22, 2022

ping @opencontainers/runc-maintainers again

@kolyshkin
Copy link
Contributor

@dqminh can you please rebase? We have changed something in the tests and they need to be re-run.

@dqminh
Copy link
Contributor Author

dqminh commented Mar 23, 2022

@kolyshkin thanks, rebase is done. I had to add # shellcheck disable=SC2030 for the integration test.

@bobbypage
Copy link

ping @kolyshkin, this would be great to get in, especially for use downstream in cAdvisor / k8s

@@ -114,6 +114,16 @@ func (m *manager) GetStats() (*cgroups.Stats, error) {
if err := statCpu(m.dirPath, st); err != nil && !os.IsNotExist(err) {
errs = append(errs, err)
}
// psi ( since kernel 4.20)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space after (

}

func parsePSIData(psi []string) (data cgroups.PSIData, err error) {
for i := 0; i < len(psi); i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll be more readable to use

          for _, f := range psi {

(and use f instead of psi[i] below).

"github.com/opencontainers/runc/libcontainer/cgroups"
)

const examplePsiData = `some avg10=1.71 avg60=2.36 avg300=2.57 total=230548833
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: capitalize PSI

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, otherwise looks good.

@dqminh dqminh force-pushed the psi branch 4 times, most recently from 1f36e91 to c495b2b Compare May 13, 2022 15:56
dqminh added 2 commits May 13, 2022 16:58
We read output from the following files if they exists:
- cpu.pressure
- memory.pressure
- io.pressure

Each are in format:

```
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
```

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
@dqminh
Copy link
Contributor Author

dqminh commented May 13, 2022

@kolyshkin updated the nits, also added a fix to check for ENOTSUP when kernel compiled with PSI_DEFAULT_DISABLED, so reading *.pressure file return ENOTSUP as seen in https://cirrus-ci.com/task/4929935776153600

@bobbypage
Copy link

bobbypage commented Jun 7, 2022

ping @kolyshkin for another round of reviews :)

// open *.pressure file returns
// - ErrNotExist when kernel < 4.20 or CONFIG_PSI is disabled
// - ENOTSUP when we requires psi=1 in kernel command line to enable PSI support
if err := statPSI(m.dirPath, "cpu.pressure", &st.CpuStats.PSI); err != nil && !errors.Is(err, os.ErrNotExist) && !errors.Is(err, syscall.ENOTSUP) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably better to use unix.ENOTSUP from golang.org/x/sys/unix here (as syscall is considered an obsoleted package).

psi)
# If PSI is not compiled in the kernel, the file will not exist.
# If PSI is compiled, but not enabled, read will fail with ENOTSUPP.
if [[ ! $(cat /sys/fs/cgroup/cpu.pressure) ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use [[ here.

if ! cat /sys/fs/cgroup/cpu.pressure &>/dev/null; then

should be sufficient.

@bobbypage
Copy link

gentle ping to @dqminh regarding the updates mentioned. would be great to get this in so we can consume PSI metrics downstream in cAdvisor / k8s

@AkihiroSuda
Copy link
Member

@kolyshkin Maybe we can address those issues after merging?

@Furisto
Copy link

Furisto commented Sep 8, 2022

Hey @dqminh this is a great PR. There has been no activity for some time so I am wondering if you are still interested? Otherwise I would be willing to take over and make the requested changes.

@Furisto
Copy link

Furisto commented Sep 19, 2022

@kolyshkin Given that this PR has stalled, how would you like to proceed? I can make the requested changes. Would be great if we could make PSI metrics available.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some suggestions; I was coding along while writing, so I'll push a PR against @dqminh's branch for consideration (also happy to carry in a separate PR)

Comment on lines +43 to +44
Some PSIData `json:"some,omitempty"`
Full PSIData `json:"full,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it would make sense to make these pointers, as it looks like that's how they're consumed in various places, plus it would allow parsePSIData to return nil, err (which may be slightly cleaner in in tent.

for _, f := range psi {
kv := strings.SplitN(f, "=", 2)
if len(kv) != 2 {
return data, fmt.Errorf("invalid psi data: %q", f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these should probably use PSI (capitalised) as well

Comment on lines +24 to +34
type PSIData struct {
Avg10 float64 `json:"avg10"`
Avg60 float64 `json:"avg60"`
Avg300 float64 `json:"avg300"`
Total uint64 `json:"total"`
}

type PSIStats struct {
Some PSIData `json:"some,omitempty"`
Full PSIData `json:"full,omitempty"`
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we could just make these an alias for cgroups.PSIData and cgroups.PSIStat. That way we wouldn't need the conversion function.

Comment on lines +51 to +63
case "avg10":
v, err := strconv.ParseFloat(kv[1], 64)
if err != nil {
return data, fmt.Errorf("invalid psi value: %q", f)
}
data.Avg10 = v
case "avg60":
v, err := strconv.ParseFloat(kv[1], 64)
if err != nil {
return data, fmt.Errorf("invalid psi value: %q", f)
}
data.Avg60 = v
case "avg300":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are quire repetitive; perhaps we could combine them 🤔

type CpuStats struct {
CpuUsage CpuUsage `json:"cpu_usage,omitempty"`
ThrottlingData ThrottlingData `json:"throttling_data,omitempty"`
PSI PSIStats `json:"psi,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need this to be a pointer for the omitempty to actually do something; https://go.dev/play/p/R68xTF5hreC

@dqminh
Copy link
Contributor Author

dqminh commented Oct 11, 2022

@thaJeztah i'm really sorry about the delay. We have been running with the patch for so long (!) that i forgot it hasn't been merged.

I left a few comments in the PR on my fork. Generally i think we can try to make the refactoring of existing practices separately from the current change.

@szuecs
Copy link
Contributor

szuecs commented Nov 4, 2022

@dqminh I read the comments and tried to fix everything and opened dqminh#37 . I have here issue I can't compile it and test wait for fetching containers since 15 minutes.
Dirty, because I did not commit and pushed changes when I tried to make.

% make
go build -trimpath "-buildmode=pie"  -tags "seccomp" -ldflags "-X main.gitCommit=v1.1.0-181-g6414629a-dirty -X main.version=1.1.0+dev " -o runc .
# pkg-config --cflags  -- libseccomp
Package libseccomp was not found in the pkg-config search path.
Perhaps you should add the directory containing `libseccomp.pc'
to the PKG_CONFIG_PATH environment variable
Package 'libseccomp', required by 'virtual:world', not found
pkg-config: exit status 1
make: *** [Makefile:33: runc] Error 2
zsh: exit 2     make

Tests run and fail for some reason, that might be problem of my system, not the code, but let's see what the test suite says.

I hope this helps to get it merged. :)

@h-vetinari
Copy link

Stray comment: As of linux 6.1 there's an interface called cgroup.pressure to switch PSI accounting on/off per cgroup. I suppose the code here will have to check if accounting is turned off.

@szuecs
Copy link
Contributor

szuecs commented Dec 6, 2022

FYI: I created #3679 as follow up to get this merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants