Intel RAPL perms changed in newer kernels, system logs spammed #9324

Closed
kmoad opened this issue Jun 2, 2021 · 1 comment · Fixed by #11035
Labels
area/prometheus, bug (unexpected problem or unintended behavior)

Comments

kmoad commented Jun 2, 2021

Relevant telegraf.conf:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false
[[outputs.influxdb]]
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.intel_powerstat]]
    cpu_metrics = ["cpu_frequency", "cpu_temperature"]
[[inputs.net]]
[[inputs.nvidia_smi]]

System info:

Telegraf 1.18.3 (git: HEAD 6a94f65)
Fedora 34
Kernel 5.12.8

Steps to reproduce:

  1. Enable [[inputs.intel_powerstat]] with `cpu_metrics = ["cpu_temperature"]`
  2. Restart telegraf
  3. Watch journalctl -u telegraf -f

Expected behavior:

Hate to say it, but silent failure would be better. See additional info section.

Actual behavior:

Every time metrics are collected, intel_powerstat reports an error and fails to read cpu_temperature:

[inputs.intel_powerstat] error fetching rapl data for socket 0, err: error opening socket energy_uj file on path /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj, err: open /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj: permission denied

Additional info:

The file is readable only by root:

$ ls -l /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj
-r--------. 1 root root 4096 Jun  1 21:03 /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj
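
The same check can be scripted, which is handy when comparing hosts on different kernels. This is only a rough sketch in Python using the standard library; the helper name `describe_access` is made up for illustration, and the path is the one from the error message above.

```python
import os
import stat

# Path taken from the error message above; the socket index may differ per host.
ENERGY_UJ = "/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj"


def describe_access(path):
    """Return (mode string, owner uid, readable by this process) for path.

    Hypothetical helper, not part of Telegraf.
    """
    st = os.stat(path)
    # os.access checks against the real uid/gid of the calling process.
    return stat.filemode(st.st_mode), st.st_uid, os.access(path, os.R_OK)


if __name__ == "__main__":
    try:
        mode, uid, readable = describe_access(ENERGY_UJ)
        print(f"{ENERGY_UJ}: mode={mode} owner_uid={uid} readable={readable}")
    except OSError as err:
        # For example, the powercap zone does not exist on this machine.
        print(f"{ENERGY_UJ}: cannot stat ({err})")
```

Run it as the user the telegraf service runs as; running it as root will always report the file as readable.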

Similar issue for prometheus here

Related to this kernel change and Intel CVE-2020-8695

For security reasons, energy_uj is now readable only by root, and is likely to remain so. Telegraf cannot read this file and it would be nice if it failed without putting an error message in the logs every ten seconds.
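
To make the request concrete, here is a minimal sketch of the "complain once, then stay quiet" behavior. It is written in Python purely for illustration and is not how Telegraf (which is written in Go) implements anything; `EnergyReader`, its `opener` parameter, and the logger name are all hypothetical.

```python
import logging

logger = logging.getLogger("rapl_reader")  # hypothetical logger name


class EnergyReader:
    """Hypothetical reader that logs a permission error once per path."""

    def __init__(self, opener=open):
        self._opener = opener   # injectable so the behavior can be tested
        self._denied = set()    # paths already known to be unreadable

    def read_energy_uj(self, path):
        """Return the counter value as an int, or None if it is unreadable."""
        if path in self._denied:
            return None  # already reported; skip without logging again
        try:
            with self._opener(path) as f:
                return int(f.read().strip())
        except PermissionError as err:
            self._denied.add(path)
            logger.warning(
                "cannot read %s (%s); suppressing further attempts", path, err
            )
            return None
```

One trade-off of this approach: once a path is marked as denied it is never retried, so a later permission change would only be picked up after a restart.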

In the long term, as more kernels pick up this change, a workaround for reading energy_uj will be needed.

kmoad added the bug (unexpected problem or unintended behavior) label Jun 2, 2021
kmoad changed the title from "rapl broken in newer kernels, system logs polluted" to "rapl broken in newer kernels, system logs spammed" Jun 2, 2021

kmoad commented Jun 2, 2021

@MaciejMis

kmoad changed the title from "rapl broken in newer kernels, system logs spammed" to "Intel RAPL perms changed in newer kernels, system logs spammed" Jun 2, 2021