Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu_freq metrics: use scaling or cpuinfo? #122

Closed
fbegyn opened this issue Dec 6, 2018 · 15 comments
Closed

cpu_freq metrics: use scaling or cpuinfo? #122

fbegyn opened this issue Dec 6, 2018 · 15 comments

Comments

@fbegyn
Copy link

fbegyn commented Dec 6, 2018

Since PR #94 it has the default behaviour of reading scaling_cpu_freq in favor of cpuinfo_cur_freq. These 2 represent different metrics in my eyes. The scaling one reflects at which frequency the linux kernel thinks the cpu runs, while cpuinfo reflects the actual HW frequency of the CPU.
source: https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt

IMHO cpuinfo should still be the default value to be read here, or these 2 values should maybe represent 2 different metrics. I'd like to know the reasoning for picking the scaling over the cpuinfo statistic.

@mjtrangoni
Copy link
Contributor

@fbegyn I checked it out, and found that cpuinfo_cur_freq is only readable by root, which is not allowed here, as the node_exporter should be run as user.

See,

# ll /sys/devices/system/cpu/cpu0/cpufreq/*
-r--r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus
-r--------. 1 root root 4096 Dec  6 09:42 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
-r--r--r--. 1 root root 4096 Dec  6 09:42 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
-r--r--r--. 1 root root 4096 Dec  6 09:42 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
-r--r--r--. 1 root root 4096 Dec  6 09:42 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency
-r--r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/related_cpus
-r--r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
-r--r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
-r--r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
-rw-r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
-rw-r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-rw-r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
-rw-r--r--. 1 root root 4096 Dec  6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
# grep . /sys/devices/system/cpu/cpu0/cpufreq/{cpuinfo_cur_freq,scaling_cur_freq}
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1200103
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1200103

@SuperQ
Copy link
Member

SuperQ commented Dec 6, 2018

Interesting, honestly, I didn't look that closely at the documentation. I assumed they were the same, just a rename between different kernel versions. We originally changed the reading preference to work around bugs in the RHEL kernel permissions for the proc file.

@mjtrangoni That's an old redhat kernel bug, on the upstream kernel and more recent patch versions of the RHEL kernel, the permission problem is fixed.

I guess we need to re-read the kernel docs more carefully.

/cc @rtreffer

@fbegyn
Copy link
Author

fbegyn commented Dec 6, 2018

Discovered it while comparing node-exporter metrics from a few months ago with the current ones because we couldn't make sense of it. We changed cpu goverenor to perormance but didn't see the change on the metrics. While the expected behaviour did show up in the data a from a few months ago, it didn't now.
Couldn't wrap my head around it, so dove into the kernel a bit and found this out.

I'm not fully comprehending the use for scaling_cur_freq, but definitely see a usecase for cpuinfo_cur_freq.

@SuperQ
Copy link
Member

SuperQ commented Dec 6, 2018

I'll have to review the docs, but after first glance, you're probably right. We shouldn't be using the scaling_ files. sigh

Totally unrelated, performance on Intel doesn't really do what you expect on modern cores. It still uses sleep states to throttle down the CPU. Basically it's a noop on sandy bridge and newer from my testing.

@mjtrangoni
Copy link
Contributor

mjtrangoni commented Dec 6, 2018

@SuperQ This is read-only upstream now. See,

$ nl -b a drivers/cpufreq/cpufreq.c|grep -B5 -A5 "cpuinfo_cur_freq, 0400"
   868                          return sprintf(buf, "%u\n", limit);
   869          }
   870          return sprintf(buf, "%u\n", policy->cpuinfo.max_freq);
   871  }
   872
   873  cpufreq_freq_attr_ro_perm(cpuinfo_cur_freq, 0400);
   874  cpufreq_freq_attr_ro(cpuinfo_min_freq);
   875  cpufreq_freq_attr_ro(cpuinfo_max_freq);
   876  cpufreq_freq_attr_ro(cpuinfo_transition_latency);
   877  cpufreq_freq_attr_ro(scaling_available_governors);
   878  cpufreq_freq_attr_ro(scaling_driver);

And here Dave Jones says why,

Reading this file causes reads from hardware on some cpufreq drivers.
This can be a slow operation, so a user could degrade system performance
for everyone else by repeatedly cat'ing it.

@SuperQ
Copy link
Member

SuperQ commented Dec 6, 2018

@mjtrangoni I think a number of distributions patch that file.

Looking at a couple of my machines, it appears cpuinfo_cur_freq doesn't even exist on on them (>= 4.15).

It seems like my systems have the pcc_cpufreq0 module loaded.

@knweiss
Copy link

knweiss commented Dec 6, 2018

@SuperQ FWIW, quoting Documentation/admin-guide/pm/cpufreq.rst:

cpuinfo_cur_freq
        Current frequency of the CPUs belonging to this policy as obtained from
        the hardware (in KHz).

        This is expected to be the frequency the hardware actually runs at.
!       If that frequency cannot be determined, this attribute should not
!       be present.

@SuperQ
Copy link
Member

SuperQ commented Dec 7, 2018

@fbegyn Out of curiosity, what platform/driver are you using?

@SuperQ
Copy link
Member

SuperQ commented Dec 7, 2018

Since the scaling_cur_freq and cpuinfo_cur_freq contain different data, I think we should change the library to read both, and return float pointers if values are found.

@SuperQ
Copy link
Member

SuperQ commented Dec 7, 2018

I've created #123 as a possible solution.

@fbegyn
Copy link
Author

fbegyn commented Dec 7, 2018

@fbegyn Out of curiosity, what platform/driver are you using?

@SuperQ Westmere CPUs and Debian 9 (I believe on 4.18 kernel). Personal system is Dell XPS13 9360 on 4.19

@pgier
Copy link
Collaborator

pgier commented May 7, 2019

Is there anything more to do on this issue since #123 has been merged?

@fbegyn
Copy link
Author

fbegyn commented May 8, 2019

@pgier I don't think so. I lost track of this issue, but I'm closing it in relation to #123

@fbegyn fbegyn closed this as completed May 8, 2019
@jberryman
Copy link

@SuperQ

Totally unrelated, performance on Intel doesn't really do what you expect on modern cores. It still uses sleep states to throttle down the CPU. Basically it's a noop on sandy bridge and newer from my testing.

This is OT but do you have any more details here? On Ivy Bridge I've been trying to figure out how to get a stable cpu frequency for benchmarking (or even figure out how to know if I'm getting a stable clock speed).

I found setting the governor to performance stabilized performance somewhat in my benchmarks (and made things significantly faster), but the various metrics from /sys/... or /proc/cpuinfo are showing cpu frequency bouncing around wildly still.

There seems to be a lot of misinformation about all of this out there...

@SuperQ
Copy link
Member

SuperQ commented Nov 21, 2019

@jberryman Yes, it's very complicated, as there's a lot of trickery going on under the hood in the hardware to perform power and thermal envelope management between cores and other parts of the chip. I don't have any specific documentation links off the top of my head. But I would look around for documentation on the various kernel modules that implement the changes. Things like intel_cstate.

bobrik pushed a commit to bobrik/procfs that referenced this issue Jan 14, 2023
…_with_root

Get all processes with a custom root of `/proc` path
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants