powercap: RAPL zone 'package' appears duplicated with identical index and name #2147

Closed
matej-g opened this issue Jul 7, 2021 · 1 comment · Fixed by #2146


matej-g commented Jul 7, 2021

I was trying out node_exporter on my machine when I noticed that the rapl collector was filling the logs with this error on every scrape:

```
level=error ts=2021-07-07T12:08:10.427Z caller=stdlib.go:105 caller="error gathering metrics: [from Gatherer #2] collected metric \"node_rapl_package_joules_total\" { label:<name:\"index\" value:\"0\" > counter:<value:16574" msg=".328968 > } was collected before with the same name and label values"
```
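
The error means two samples share both the metric name `node_rapl_package_joules_total` and the label set `{index="0"}`. The sketch below models that uniqueness rule with the standard library only. It is a simplified illustration, not client_golang's implementation. `series`, `key` and `duplicates` are hypothetical names, and the second input assumes an extra `path` label like the one suggested later in this thread.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series is a hypothetical, simplified stand-in for one collected sample:
// a metric name plus its label pairs.
type series struct {
	name   string
	labels map[string]string
}

// key builds a canonical identity from the metric name and the label pairs,
// sorted by label name so that map iteration order does not matter.
func key(s series) string {
	names := make([]string, 0, len(s.labels))
	for n := range s.labels {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	b.WriteString(s.name)
	for _, n := range names {
		fmt.Fprintf(&b, ",%s=%q", n, s.labels[n])
	}
	return b.String()
}

// duplicates returns every identity that occurs more than once.
func duplicates(all []series) []string {
	seen := make(map[string]int)
	var dups []string
	for _, s := range all {
		k := key(s)
		seen[k]++
		if seen[k] == 2 { // report each duplicated identity only once
			dups = append(dups, k)
		}
	}
	return dups
}

func main() {
	const metric = "node_rapl_package_joules_total"
	// Both package-0 zones reduced to name "package" and index 0.
	withoutPath := []series{
		{metric, map[string]string{"index": "0"}},
		{metric, map[string]string{"index": "0"}},
	}
	// The same two zones with an additional (assumed) "path" label.
	withPath := []series{
		{metric, map[string]string{"index": "0", "path": "/sys/class/powercap/intel-rapl-mmio:0"}},
		{metric, map[string]string{"index": "0", "path": "/sys/class/powercap/intel-rapl:0"}},
	}
	fmt.Println(duplicates(withoutPath)) // [node_rapl_package_joules_total,index="0"]
	fmt.Println(duplicates(withPath))    // []
}
```

As long as the label set stays the same, nothing in the sample values can disambiguate the two zones. Only a differing label value or a different metric name can.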

After dumping some more information about the zones in node_exporter's rapl collector, I noticed that the package zone is reported twice with the same index:

```
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=package value=16574328968
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=dram value=4388422054
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=package value=16574328968
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=core value=12677608414
level=error ts=2021-07-07T12:08:10.318Z caller=rapl_linux.go:78 collector=rapl index=0 name=uncore value=815362353
level=error ts=2021-07-07T12:08:10.319Z caller=rapl_linux.go:78 collector=rapl index=1 name=dram value=4388424861
level=error ts=2021-07-07T12:08:10.319Z caller=rapl_linux.go:78 collector=rapl index=0 name=psys value=36848287443
```

Looking at the powercap class in procfs, it appears to take the index from the suffix embedded in the RAPL zone's name (the `0` in `package-0`). In my case, however, two zones have identical names and therefore identical indices. Listing the `name` files under /sys/class/powercap on my machine shows two `package-0` zones:

```
~ ❯ find -L /sys/class/powercap -maxdepth 2 -name 'name' -exec ls {} \; -exec cat {} \;  2>/dev/null
/sys/class/powercap/intel-rapl:1/name
psys
/sys/class/powercap/intel-rapl:0:2/name
dram
/sys/class/powercap/intel-rapl:0:0/name
core
/sys/class/powercap/intel-rapl-mmio:0:0/name
dram
/sys/class/powercap/intel-rapl:0/name
package-0
/sys/class/powercap/intel-rapl:0:1/name
uncore
/sys/class/powercap/intel-rapl-mmio:0/name
package-0
```

So it seems both are parsed as name `package` with index 0, which produces the duplicate series and the error above on the node_exporter side.
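
The behaviour can be reproduced with a small model. The sketch below is a hypothetical re-implementation, not procfs's code. It assumes two rules:

- A name with a `-N` suffix takes its index from that suffix.
- A bare name gets a per-name counter.

The second rule is an inference from the two `dram` zones receiving indices 0 and 1 in the debug log above. `zoneID`, `indexAndName` and `collisions` are illustrative names. Fed the raw names in the order the collector log reports them, the model yields `package`/0 twice.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// zoneID is the (name, index) pair that ends up in the metric name and the
// "index" label.
type zoneID struct {
	Name  string
	Index int
}

// indexAndName is a hypothetical model of the naming rule, not procfs code.
// "package-0" -> {package 0}, with the index taken from the suffix; a bare
// name such as "dram" gets the number of times that name was seen before.
func indexAndName(counts map[string]int, raw string) (zoneID, error) {
	parts := strings.SplitN(raw, "-", 2)
	if len(parts) == 1 {
		idx := counts[raw]
		counts[raw]++
		return zoneID{raw, idx}, nil
	}
	idx, err := strconv.Atoi(parts[1])
	if err != nil {
		return zoneID{}, fmt.Errorf("bad zone name %q: %w", raw, err)
	}
	return zoneID{parts[0], idx}, nil
}

// collisions reports every (name, index) pair that is produced more than once.
func collisions(raw []string) ([]zoneID, error) {
	counts := make(map[string]int)
	seen := make(map[zoneID]int)
	var out []zoneID
	for _, r := range raw {
		id, err := indexAndName(counts, r)
		if err != nil {
			return nil, err
		}
		seen[id]++
		if seen[id] == 2 {
			out = append(out, id)
		}
	}
	return out, nil
}

func main() {
	// Raw "name" file contents, in the order the collector log reports them.
	raw := []string{"package-0", "dram", "package-0", "core", "uncore", "dram", "psys"}
	dups, err := collisions(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(dups) // [{package 0}]
}
```

Under these assumptions, the counter-based rule keeps the bare `dram` names apart, while the suffix-based rule cannot distinguish two zones that both read `package-0`.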

For completeness, here is my `uname -a` output:

```
Linux <machine-name> 4.18.0-305.3.1.el8_4.x86_64 #1 SMP Mon May 17 10:08:25 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
```

@binjip978
Contributor

Hi @matej-g, @discordianfish, I don't think this is a bug in procfs. The names can be identical, but the paths probably can't. Here is an example from my machine:

```
zones: []sysfs.RaplZone{
	sysfs.RaplZone{
		Name:"package", Index:0, Path:"/sys/class/powercap/intel-rapl-mmio:0",
		MaxMicrojoules:0x3d08f5c252},
	sysfs.RaplZone{
		Name:"package", Index:0, Path:"/sys/class/powercap/intel-rapl:0",
		MaxMicrojoules:0x3d08f5c252},
	...
}
```

In both cases the name is `package-0`, but the paths differ. To fix it, we only need to change the node_exporter side to something like this:

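```go
// Inside the rapl collector's loop over the zones (rz) returned by procfs.
newMicrojoules, err := rz.GetEnergyMicrojoules()
if err != nil {
	return err
}
index := strconv.Itoa(rz.Index)

descriptor := prometheus.NewDesc(
	prometheus.BuildFQName(namespace, "rapl", rz.Name+"_joules_total"),
	"Current RAPL "+rz.Name+" value in joules",
	// "path" distinguishes zones that share both name and index.
	[]string{"index", "path"}, nil,
)

ch <- prometheus.MustNewConstMetric(
	descriptor,
	prometheus.CounterValue,
	float64(newMicrojoules)/1000000.0, // microjoules to joules
	index,
	rz.Path,
)
```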

The result then looks like this:

```
# HELP node_rapl_core_joules_total Current RAPL core value in joules
# TYPE node_rapl_core_joules_total counter
node_rapl_core_joules_total{index="0",path="/sys/class/powercap/intel-rapl:0:0"} 4868.304675
# HELP node_rapl_dram_joules_total Current RAPL dram value in joules
# TYPE node_rapl_dram_joules_total counter
node_rapl_dram_joules_total{index="0",path="/sys/class/powercap/intel-rapl-mmio:0:0"} 1430.675902
node_rapl_dram_joules_total{index="1",path="/sys/class/powercap/intel-rapl:0:2"} 1430.677184
# HELP node_rapl_package_joules_total Current RAPL package value in joules
# TYPE node_rapl_package_joules_total counter
node_rapl_package_joules_total{index="0",path="/sys/class/powercap/intel-rapl-mmio:0"} 6747.327636
node_rapl_package_joules_total{index="0",path="/sys/class/powercap/intel-rapl:0"} 6747.346191
# HELP node_rapl_psys_joules_total Current RAPL psys value in joules
# TYPE node_rapl_psys_joules_total counter
node_rapl_psys_joules_total{index="0",path="/sys/class/powercap/intel-rapl:1"} 14156.235488
# HELP node_rapl_uncore_joules_total Current RAPL uncore value in joules
# TYPE node_rapl_uncore_joules_total counter
node_rapl_uncore_joules_total{index="0",path="/sys/class/powercap/intel-rapl:0:1"} 136.241106
```

Alternatively, if we want to avoid adding a `path` label for some reason, we may need to change the metric names in some way.
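
One possible shape for that alternative is sketched below. It takes the part of the zone's directory name before the first `:` (`intel-rapl` vs `intel-rapl-mmio`, which as I understand the sysfs layout is the powercap control type) and folds it into the metric name. `zoneSource` and `metricName` are hypothetical helpers, and the resulting `node_intel_rapl_*` naming is purely illustrative.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// zoneSource is a hypothetical helper that returns everything before the
// first ':' in the zone directory name, e.g.
// "/sys/class/powercap/intel-rapl-mmio:0" -> "intel-rapl-mmio".
func zoneSource(zonePath string) string {
	base := filepath.Base(zonePath)
	if i := strings.IndexByte(base, ':'); i >= 0 {
		return base[:i]
	}
	return base
}

// metricName folds the source into the metric name so that the two
// package-0 zones no longer share a name. Hyphens are replaced with
// underscores because they are not valid in Prometheus metric names.
func metricName(zoneName, zonePath string) string {
	src := strings.ReplaceAll(zoneSource(zonePath), "-", "_")
	return "node_" + src + "_" + zoneName + "_joules_total"
}

func main() {
	fmt.Println(metricName("package", "/sys/class/powercap/intel-rapl:0"))      // node_intel_rapl_package_joules_total
	fmt.Println(metricName("package", "/sys/class/powercap/intel-rapl-mmio:0")) // node_intel_rapl_mmio_package_joules_total
}
```

The trade-off is that every existing `node_rapl_*` series would be renamed. A label, by contrast, leaves the current metric names intact.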

@discordianfish transferred this issue from prometheus/procfs on Sep 20, 2021