-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
subsys: pm: add check for device busy in policy #61726
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check is already done here
Line 72 in 63bb74c
if (!device_is_ready(dev) || pm_device_is_busy(dev) || |
@@ -136,6 +137,12 @@ const struct pm_state_info *pm_policy_next_state(uint8_t cpu, int32_t ticks) | |||
uint8_t num_cpu_states; | |||
const struct pm_state_info *cpu_states; | |||
|
|||
#if CONFIG_PM_DEVICE | |||
if (pm_device_is_any_busy()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, take a look at the state locking API. It's the device driver code that needs to prevent system sleep if it's busy doing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out Intel ISH platform requires all devices are not busy to enter into system low power. Add state locking in every device driver may not be a suitable solution. Is it possible to add a Kconfig option in pm that enable such property, for example config NEED_ALL_DEVICES_UNBUSY, if this config is y then runs this pm_device_is_any_busy() checking in policy.c?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is a bad case:
- a driver sets device busy, starts one transaction, then waits the completion semaphore with a long timeout or even FOREVER.
- Zephyr runs to idle thread and gets a long idle time, so PM policy returns a deep sleep state.
- PM suspends all other devices, then SoC goes into very low PM state, maybe suspend to disk.
- SoC detects there's device interrupt pending, so wakes up immediately.
- do all resume works.
- device interrupt gets served.
in such a case, low-power states should be blocked. Any driver should avoid it. It's like a common thing. It seems not good for all drivers to call pm state lock APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your device is waiting on something, and needs to prevent deep sleep, simply lock deep sleep states while waiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that most drivers have similar working model:
- set register to trigger a transaction.
- wait complete semaphore which will be given in ISR.
- complete the transaction.
all those drivers should call PM state lock APIs to lock all low power states after step 1. as a common driver, I've afraid they should call PM state lock API multi times to lock PM_STATE_SUSPEND_TO_IDLE, PM_STATE_STANDBY, PM_STATE_SUSPEND_TO_RAM and PM_STATE_SUSPEND_TO_DISK, then call the API multi times to unlock them after transaction done.
this seems not good at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current solution doesn't solve the problems I mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine for me having this option wrapped around a Kconfig option if that is a common needed for devices in a platform. If it is particular to one device, I do stay with the state lock solution, but if that is common thing for the platform, I think this solution is reasonable, it less error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all ISH devices need it, except three on AON domain, HPET/RTC/IPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ISH has more specific cases in term of pm_state selection, You can always define platform-specific policy CONFIG_PM_POLICY_CUSTOM
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ISH has more specific cases in term of pm_state selection, You can always define platform-specific policy
CONFIG_PM_POLICY_CUSTOM
.
agree, why not use a custom policy instead of change the default policy. PM_NEED_ALL_DEVICES_IDLE seems to be very specialized and soon enough we will be wondering what is the usecase behind it without too much context and details in the documentation. See https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/subsys/pm/power_mgmt/src/main.c#L222C29-L222C49 for an example of a custom policy.
This check is not enough for system level suspend. As can be seen from this code below, when one device is busy, the code runs continue statement, it means this device busy doesn't prevent system suspend which is wrong because system level suspend should happen when no device is busy, otherwise will surely create problem which has already been seen in Intel ISH platform.
|
Then just use the state locking APIs as suggested above. |
Agree, if that is the case, the right way to prevent the system to sleep is putting a constraint. |
subsys/pm/policy.c
Outdated
@@ -136,6 +137,12 @@ const struct pm_state_info *pm_policy_next_state(uint8_t cpu, int32_t ticks) | |||
uint8_t num_cpu_states; | |||
const struct pm_state_info *cpu_states; | |||
|
|||
#if defined(CONFIG_PM_DEVICE) && defined(CONFIG_PM_NEED_ALL_DEVICES_IDLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pollutes CONFIG_PM code with PM_DEVICE related stuff, so to me this is a -1. As explained, there are other more precise means to solve this issue, that is, with state locking (maybe refining its implementation to accommodate new requirements). There's also no need to enable CONFIG_PM_DEVICE to modify the behavior of system PM, this is why we have policy APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmarull I have no knowledge of other SoCs/platforms, but Intel ISH's SoC core low-power mode and device/peripheral power do have relationship. When CPU goes into deep sleep, devices' power will be gated.
So ISH does need this change, appreciate your understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmarull Hi Gerald, we do need this patch for production. could you approve it? or if you have any more concern, let me know. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expressed my concerns, and they have not been addressed. Let me know if this patch has further updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmarull Hi Gerald, just get your point. Previously I thought CONFIG_PM_DEVICE is a sub-config of CONFIG_PM because CONFIG_PM_DEVICE has prefix CONFIG_PM, but it seems they're independent from subsystem/pm/Kconfig.
but this seems weird to me. Shouldn't Device PM be part of PM (Power Management)? @ceolin @nashif
Do you know there is any SoC that needs only CONFIG_PM_DEVICE but no CONFIG_PM? I wonder if there's any such need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmarull I am not following what you are objecting to exactly, the code has changed since this:
This pollutes CONFIG_PM code with PM_DEVICE related stuff, so to me this is a -1
32d275e
to
54a51bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking since it's hidden behind #ifdef.
@@ -136,6 +137,12 @@ const struct pm_state_info *pm_policy_next_state(uint8_t cpu, int32_t ticks) | |||
uint8_t num_cpu_states; | |||
const struct pm_state_info *cpu_states; | |||
|
|||
#if CONFIG_PM_DEVICE | |||
if (pm_device_is_any_busy()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ISH has more specific cases in term of pm_state selection, You can always define platform-specific policy CONFIG_PM_POLICY_CUSTOM
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is an acceptable constraint / behavior for the policy. Devices and SoC are not completely disconnect things in some cases, so having it protected by a build option is ok for me.
@@ -136,6 +137,12 @@ const struct pm_state_info *pm_policy_next_state(uint8_t cpu, int32_t ticks) | |||
uint8_t num_cpu_states; | |||
const struct pm_state_info *cpu_states; | |||
|
|||
#if CONFIG_PM_DEVICE | |||
if (pm_device_is_any_busy()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ISH has more specific cases in term of pm_state selection, You can always define platform-specific policy
CONFIG_PM_POLICY_CUSTOM
.
agree, why not use a custom policy instead of change the default policy. PM_NEED_ALL_DEVICES_IDLE seems to be very specialized and soon enough we will be wondering what is the usecase behind it without too much context and details in the documentation. See https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/subsys/pm/power_mgmt/src/main.c#L222C29-L222C49 for an example of a custom policy.
Add check for device busy when CONFIG_PM_NEED_ALL_DEVICES_IDLE is set to y because one or more devices may still in busy and causes problem when system go into low power in Intel ISH platform. Signed-off-by: Leifu Zhao <leifu.zhao@intel.com>
I recommend using custom policy when you have very unique needs, like actions depending on specific cores, specialized constraints, ... This one didn't look a super unique case for me. The drawback for custom policy is that you have to account state and latency constraints in your policy for those APIs work correctly. |
Agree. POLICY_CUSTOM is not feasible solution at least right now. It is not simple as just copy and modify as imagined. Just copy pm_policy_next_state() and add check_all_busy() doesn't work since there are several variables defined as static outside the pm_policy_next_state() function in subsys/pm/policy.c, such as next_event_cyc, max_latency_cyc. If copy whole subsys/pm/policy.c file and add check_all_busy(), it also doesn't work. This is becasue subsys/pm/policy.c defines several API functions such as pm_policy_latency_request_update() etc, then there will be multiple definitions of same function. But if remove them in the new copied file, for example remove pm_policy_latency_request_update(), then pm_policy_latency_request_update() in subsys/pm/policy.c will call static function update_max_latency() which manipulates and updates local copy of max_latency_cyc. The max_latency_cyc variable used by pm_policy_next_state() in new copied file doesn't get updated which is wrong. So we have to change and add code to deal with and this brings risk to already working well ish PM. |
could #62550 be merged? I think it's good to solve the issue Leifu mentioned when implementing POLICY_CUSTOM. |
was trying to show possible solutions, I know it is not great, and I think we can do better than that introducing hooks rather than reimplementation of complete functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will have a followup proposal to improve use of custom policies without duplicating much code...
Hi, I'll re-enforce the +1 here. I think this is a reasonable constraint and it is protected under a build option, it is nothing crazy. The custom policy is supposed (or at least documented) to be used by the application, so doing it here would not allow applications to override its behavior. I was thinking about it ... much of the logic inside the default policy should be available to be used by custom policies ... the way it is now is, at least. ... painful |
So yet another abuse of powers mergings PRs, and then we are opening proposals like this #61358, I guess they're a joke. |
I disagree on the way the subsystem is maintained. For more context refer to zephyrproject-rtos#61726 or https://discord.com/channels/720317445772017664/997527108844798012/ 1153288380231208960 Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
I disagree on the way the subsystem is maintained. For more context refer to #61726 or https://discord.com/channels/720317445772017664/997527108844798012/ 1153288380231208960 Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
I disagree on the way the subsystem is maintained. For more context refer to zephyrproject-rtos/zephyr#61726 or https://discord.com/channels/720317445772017664/997527108844798012/ 1153288380231208960 (cherry picked from commit 4ba97c2) Original-Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com> GitOrigin-RevId: 4ba97c2 Change-Id: I2cf897da2b65b4068a111ec0cd3016ad2d351f5b Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/4875292 Reviewed-by: Al Semjonovs <asemjonovs@google.com> Tested-by: ChromeOS Prod (Robot) <chromeos-ci-prod@chromeos-bot.iam.gserviceaccount.com> Commit-Queue: Al Semjonovs <asemjonovs@google.com> Tested-by: Al Semjonovs <asemjonovs@google.com>
I disagree on the way the subsystem is maintained. For more context refer to zephyrproject-rtos#61726 or https://discord.com/channels/720317445772017664/997527108844798012/ 1153288380231208960 Signed-off-by: Gerard Marull-Paretas <gerard@teslabs.com>
Should add check for device busy here because one or more devices may still in busy, otherwise problem may happen due to system go into low power mode with some devices still in busy. Issue is observed in Intel ISH platform.