scaphandre doesn't work on AMD Zen #55

Closed
barnumbirr opened this issue Jan 14, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@barnumbirr

barnumbirr commented Jan 14, 2021

Bug description

Trying to get scaphandre running on AMD Zen (Ryzen 5 Pro 2500U) results in a thread panic.

To Reproduce

  • Install msr-tools
  • run scaphandre stdout -t 15

Output:

root@host:~# ./scaphandre stdout -t 15
thread 'main' panicked at 'Couldn't find intel_rapl modules.', src/sensors/powercap_rapl.rs:72:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Expected behavior

scaphandre should work as AMD RAPL support was added to kernel 5.8 and perf seems to return sensible data:

root@host:~# perf stat -a --per-socket -I 1000 -e power/energy-pkg/
#           time socket cpus             counts   unit events
     1.001342541 S0        1               2.06 Joules power/energy-pkg/
     2.002912211 S0        1               2.40 Joules power/energy-pkg/
     3.004453609 S0        1               1.87 Joules power/energy-pkg/
     3.837946645 S0        1               2.04 Joules power/energy-pkg/

Environment

  • Debian Buster
  • 5.9.15 (buster-backports)

Thank you again for creating scaphandre and please do let me know if I can help test this further.
Cheers.

@barnumbirr barnumbirr added the bug Something isn't working label Jan 14, 2021
@bpetit
Contributor

bpetit commented Jan 15, 2021

Hi! Thanks for reporting.

It seems we have to check for different modules if the cpu is from AMD.
I note here a post about some modules that might be the ones we have to check for: https://www.phoronix.com/scan.php?page=news_item&px=AMD-Energy-Driver-Working-Well

Investigations to come.

@kamaradclimber

Same error with "AMD Ryzen 5 2600X Six-Core Processor" (perf also reports data).

Here is the full output with the correct option:

sudo docker run -v /sys/class/powercap:/sys/class/powercap -v /proc:/proc -e RUST_BACKTRACE=1 -ti hubblo/scaphandre stdout -t 1 
thread 'main' panicked at 'Couldn't find intel_rapl modules.', src/sensors/powercap_rapl.rs:72:13
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:505
   1: <scaphandre::sensors::powercap_rapl::PowercapRAPLSensor as scaphandre::sensors::Sensor>::generate_topology
             at ./src/sensors/powercap_rapl.rs:72
   2: <scaphandre::sensors::powercap_rapl::PowercapRAPLSensor as scaphandre::sensors::Sensor>::get_topology
             at ./src/sensors/powercap_rapl.rs:112
   3: scaphandre::exporters::stdout::StdoutExporter::new
             at ./src/exporters/stdout.rs:51
   4: scaphandre::run
             at ./src/lib.rs:60
   5: scaphandre::main
             at ./src/main.rs:90
   6: core::ops::function::FnOnce::call_once
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:227
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@bpetit
Contributor

bpetit commented Jan 18, 2021

Hi,

Thanks for reporting! I'll create a PR for that. If you could test it once it's there and before merging, that would be great!

@bpetit
Contributor

bpetit commented Jan 29, 2021

It seems possible that a 5.11 kernel could be required: https://www.phoronix.com/scan.php?page=news_item&px=AMD-Zen-PowerCap-RAPL-5.11

The exact requirements for AMD aren't clear to me right now. I'll create a PR that just avoids panicking if the intel modules are not found. This way you can tell me what happens next. (Sorry, I don't have an AMD box right now, so I'm counting on you to move this topic forward :D )
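
A minimal sketch of what "avoid panicking if the intel modules are not found" could look like, for illustration only. It is not the code from the PR: `find_modules` and `check_rapl_modules` are hypothetical helpers, and the approach (scanning `/proc/modules` for names starting with `intel_rapl`) is an assumption about how the check might be done.

```rust
use std::fs;

/// Returns the names of loaded kernel modules whose name starts with `prefix`,
/// given the text of /proc/modules (one module per line, name in the first column).
fn find_modules(proc_modules: &str, prefix: &str) -> Vec<String> {
    proc_modules
        .lines()
        .filter_map(|line| line.split_whitespace().next())
        .filter(|name| name.starts_with(prefix))
        .map(str::to_string)
        .collect()
}

/// Hypothetical helper: logs a warning and returns false instead of panicking
/// when no matching module is loaded, so the caller can decide what to do next.
fn check_rapl_modules() -> bool {
    // An unreadable /proc/modules is treated like an empty module list.
    let content = fs::read_to_string("/proc/modules").unwrap_or_default();
    if find_modules(&content, "intel_rapl").is_empty() {
        eprintln!("warning: couldn't find intel_rapl modules, continuing anyway");
        false
    } else {
        true
    }
}
```

Returning a boolean rather than panicking lets the rest of the startup run, which is what makes it possible to see where things fail next on AMD.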

I saw that perf seems to work in 5.8 in your case @barnumbirr . Let's see how it goes with scaphandre with that PR.

@bpetit
Contributor

bpetit commented Jan 29, 2021

Could you build and try with this version: #65 ?

🙏🏽

Let's see what happens and start from there :)

@kamaradclimber

kamaradclimber commented Jan 29, 2021

Thanks a lot for investigating this :)

RUST_BACKTRACE=1 ./target/debug/scaphandre stdout

scaphandre::sensors::powercap_rapl: Couldn't find intel_rapl modules.
thread 'main' panicked at 'Trick: if you are running on a vm, do not forget to use --vm parameter invoking scaphandre at the command line', src/sensors/mod.rs:238:18
stack backtrace:
   0: rust_begin_unwind
             at /build/rust/src/rustc-1.49.0-src/library/std/src/panicking.rs:495:5
   1: core::panicking::panic_fmt
             at /build/rust/src/rustc-1.49.0-src/library/core/src/panicking.rs:92:14
   2: core::option::expect_failed
             at /build/rust/src/rustc-1.49.0-src/library/core/src/option.rs:1260:5
   3: core::option::Option<T>::expect
             at /build/rust/src/rustc-1.49.0-src/library/core/src/option.rs:349:21
   4: scaphandre::sensors::Topology::add_cpu_cores
             at ./src/sensors/mod.rs:234:26
   5: <scaphandre::sensors::powercap_rapl::PowercapRAPLSensor as scaphandre::sensors::Sensor>::generate_topology
             at ./src/sensors/powercap_rapl.rs:106:9
   6: <scaphandre::sensors::powercap_rapl::PowercapRAPLSensor as scaphandre::sensors::Sensor>::get_topology
             at ./src/sensors/powercap_rapl.rs:112:24
   7: scaphandre::exporters::stdout::StdoutExporter::new
             at ./src/exporters/stdout.rs:51:30
   8: scaphandre::run
             at ./src/lib.rs:60:28
   9: scaphandre::main
             at ./src/main.rs:91:5
  10: core::ops::function::FnOnce::call_once
             at /build/rust/src/rustc-1.49.0-src/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@bpetit
Contributor

bpetit commented Feb 2, 2021

I'm looking for a hosting or cloud provider to get an AMD Zen testing machine, but I'm having a hard time finding an offer at a reasonable price. Do you know of any? Or do you have such a machine available for testing over the Internet, by any chance?

@bpetit
Contributor

bpetit commented Feb 3, 2021

I may have found one thanks to @airclovis ! 👏🏽

https://www.skysilk.com/amd-epyc-servers/

EDIT: after all, it seems they only provide VPSs, no bare metal.

@barnumbirr
Author

barnumbirr commented Feb 3, 2021

You'd probably run into the same issue with Hetzner CPX instances, don't think they're bare metal. 😒
My machines are unfortunately not available on the Internet.

@kamaradclimber

kamaradclimber commented Feb 3, 2021

If you are interested in testing on "AMD Ryzen 5 2600X Six-Core Processor", contact me at grego_scaphandre@familleseux.net and I'll give you access to a bare-metal machine (not a professional server, it's just a regular PC acting as a server).

@bpetit
Contributor

bpetit commented Feb 3, 2021

I may give it a try and I'll send you an email, thanks!

However, I'm also looking for a cloud/hosting offer in order to integrate tests on AMD into PR checks.

@bpetit
Contributor

bpetit commented Feb 8, 2021

I'll start some tests thanks to @kamaradclimber's generous offer (thank you! 🥳 )
After that, I may have a look at those offers from Hetzner, which are the most affordable ones I've found so far for dedicated machines with AMD Zen CPUs.

bpetit added a commit that referenced this issue Feb 18, 2021
…-on-amd-zen

fix: allowing scaph to run even if intel_rapl modules r not found
@bpetit
Contributor

bpetit commented Mar 7, 2021

Thanks to @kamaradclimber (🙏🏽 🥳 ) I've been able to validate that with a kernel >=5.11, the power consumption data is accessible through the powercap module on AMD, as it is on Intel CPUs (and that scaphandre works correctly in that context). So the simplest way to make scaphandre work on AMD is to run a kernel >=5.11.
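
For readers who want to check by hand that the powercap data is exposed, here is a rough sketch of reading one energy counter. It assumes the usual powercap sysfs layout, where a zone directory such as `/sys/class/powercap/intel-rapl:0` contains an `energy_uj` file holding a microjoule counter; the zone name on a given AMD machine is an assumption and should be checked locally. `parse_energy_uj` and `read_zone_energy` are hypothetical helper names, not scaphandre functions.

```rust
use std::fs;
use std::path::Path;

/// Parses the content of a powercap `energy_uj` file (microjoules) into joules.
fn parse_energy_uj(content: &str) -> Option<f64> {
    content
        .trim()
        .parse::<u64>()
        .ok()
        .map(|uj| uj as f64 / 1_000_000.0)
}

/// Reads the energy counter of one powercap zone, in joules.
/// Returns None if the file is missing, unreadable, or not a number.
fn read_zone_energy(zone: &Path) -> Option<f64> {
    let content = fs::read_to_string(zone.join("energy_uj")).ok()?;
    parse_energy_uj(&content)
}
```

Calling `read_zone_energy(Path::new("/sys/class/powercap/intel-rapl:0"))` twice and dividing the difference by the elapsed seconds gives an average power in watts, comparable to the per-second joule counts that perf prints. Reading the file may require root on recent kernels.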

That being said, to support more CPUs/kernel combos, I think being able to collect the data directly from the MSR like perf does would be great (I can think of some other usecases where it is necessary). So I think I'll open a new FR to work on an MSR based sensor.
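
To give an idea of what an MSR-based sensor would involve, here is a sketch under stated assumptions, not a design for scaphandre. It assumes the `msr` kernel module is loaded (which exposes `/dev/cpu/<n>/msr`), root privileges, and the AMD family 17h register addresses as defined in the Linux kernel headers; these should be verified against the kernel source and AMD documentation before use. All function names are hypothetical, and masking the counter to 32 bits follows the Intel RAPL convention, which is also an assumption here.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

// AMD family 17h RAPL registers (addresses assumed from the Linux kernel headers).
const MSR_AMD_RAPL_POWER_UNIT: u64 = 0xC001_0299;
const MSR_AMD_PKG_ENERGY_STATUS: u64 = 0xC001_029B;

/// Reads one 64-bit MSR: the file offset is the MSR address, the read is 8 bytes.
fn read_msr(cpu: u32, msr: u64) -> io::Result<u64> {
    let file = File::open(format!("/dev/cpu/{}/msr", cpu))?;
    let mut buf = [0u8; 8];
    file.read_exact_at(&mut buf, msr)?;
    Ok(u64::from_le_bytes(buf))
}

/// Energy unit in joules: 1 / 2^ESU, where ESU is bits 12:8 of the unit register.
fn energy_unit_joules(power_unit_raw: u64) -> f64 {
    let esu = (power_unit_raw >> 8) & 0x1F;
    1.0 / (1u64 << esu) as f64
}

/// Converts a raw energy status value (low 32 bits) to joules.
fn raw_to_joules(energy_raw: u64, unit: f64) -> f64 {
    (energy_raw & 0xFFFF_FFFF) as f64 * unit
}

/// Reads the package energy counter of the given CPU, in joules.
fn read_pkg_energy_joules(cpu: u32) -> io::Result<f64> {
    let unit = energy_unit_joules(read_msr(cpu, MSR_AMD_RAPL_POWER_UNIT)?);
    Ok(raw_to_joules(read_msr(cpu, MSR_AMD_PKG_ENERGY_STATUS)?, unit))
}
```

The counter wraps around, so a real sensor would have to sample often enough and handle overflow between two reads; that part is left out of this sketch.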

So for now, ⚠️ please upgrade to kernel 5.11 or later to use scaphandre on AMD ⚠️
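
A quick way to check the kernel requirement programmatically could look like the sketch below. It assumes the release string comes from `/proc/sys/kernel/osrelease` (the same value `uname -r` prints); the helper names are hypothetical and only the "5.11 or later" threshold comes from this thread.

```rust
use std::fs;

/// Parses the leading "major.minor" of a release string such as "5.9.15" or "5.11.0-rc1".
fn parse_major_minor(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    // Keep only the leading digits of the second field: "11-rc1" -> "11".
    let digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor: u32 = digits.parse().ok()?;
    Some((major, minor))
}

/// True if the release is 5.11 or later; false if older or unparseable.
fn kernel_supports_amd_powercap(release: &str) -> bool {
    matches!(parse_major_minor(release), Some(v) if v >= (5, 11))
}

/// Reads the running kernel's release string, if available.
fn running_kernel_release() -> Option<String> {
    fs::read_to_string("/proc/sys/kernel/osrelease").ok()
}
```

With the kernel from the original report, `kernel_supports_amd_powercap("5.9.15")` is false, which matches the behaviour described above.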

I'm writing that in the documentation right now.

We may be able to propose something else for older kernels later (through MSR)... stay tuned (and don't hesitate to jump on the topic if you want to contribute 😀 )


3 participants