Description
Hi, I'm benchmarking Qwen2-0.5B inference with GenAI on a Lunar Lake (LNL) platform and see up to 15% variation in token-generation latency between runs, most likely caused by NPU frequency scaling (DVFS). To get reproducible performance results, I need a way to lock the NPU to a fixed frequency.
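To compare runs before and after any frequency change, the run-to-run spread can be reduced to a single number. A minimal shell sketch, assuming each run's mean per-token latency has been written one value per line, and assuming "variation" means (max - min) / min; the function name `latency_spread_pct` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: relative spread of per-run latencies, in percent.
# Assumed definition: spread_pct = (max - min) / min * 100
# Input on stdin: one latency value (e.g. ms/token) per line.
latency_spread_pct() {
    awk '
        NF == 0 { next }                  # skip blank lines
        {
            v = $1 + 0                    # force numeric comparison
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            n++
        }
        END {
            # Need at least two positive samples for a meaningful spread
            if (n < 2 || min <= 0) { print "n/a"; exit 1 }
            printf "%.1f\n", (max - min) / min * 100
        }'
}
```

For example, `printf '20.0\n21.5\n23.0\n' | latency_spread_pct` prints `15.0`; those three values are made up for illustration, not measurements.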
Standard DVFS controls seem to act only as hints. I found a kernel patch for reading the DPU frequency (link), but nothing for setting the NPU frequency.
Is there a recommended way to lock the frequency? As a possible workaround, would setting the pll_min_ratio and pll_max_ratio kernel parameters to the same value be a viable approach?
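A minimal sketch of that workaround, assuming `pll_min_ratio` and `pll_max_ratio` are module parameters of the `intel_vpu` driver, that they are 8-bit values (0-255), and that the driver honors them on Lunar Lake; none of this is confirmed here. The ratio is a placeholder because the valid range is hardware-specific, and the function names `ivpu_pinned_pll_opts` and `ivpu_reload_pinned` are hypothetical.

```shell
#!/bin/sh
# Hypothetical: build module options that pin min and max PLL ratio to one value.
# Assumption: both parameters accept an 8-bit unsigned integer (0-255).
ivpu_pinned_pll_opts() {
    ratio=$1
    case $ratio in
        ''|*[!0-9]*) echo "ratio must be a non-negative integer" >&2; return 1 ;;
    esac
    if [ "$ratio" -gt 255 ]; then
        echo "ratio must fit in 0-255 (assumed byte parameter)" >&2; return 1
    fi
    printf 'pll_min_ratio=%s pll_max_ratio=%s\n' "$ratio" "$ratio"
}

# Hypothetical: reload the driver with the pinned range, then read the values back.
# Unloading fails if any process (e.g. a running benchmark) still holds the NPU.
ivpu_reload_pinned() {
    opts=$(ivpu_pinned_pll_opts "$1") || return 1
    sudo modprobe -r intel_vpu || return 1
    # $opts is left unquoted on purpose so it splits into two arguments.
    sudo modprobe intel_vpu $opts || return 1
    grep -H . /sys/module/intel_vpu/parameters/pll_m*_ratio
}
```

The read-back only shows the values the module was loaded with; it does not prove the NPU is actually running at the corresponding frequency. The same option string could also go on an `options intel_vpu ...` line in a file under /etc/modprobe.d/ to apply it at boot.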
Steps I have already taken to reduce the variation:
sudo powerprofilesctl set performance
sudo sh -c "echo 1 > /sys/kernel/debug/accel/0000:00:0b.0/dvfs_mode"