kernel: scheduler tries to make polling threads active #8049

mike-scott · 2018-05-31T04:34:55Z

Since commit 1856e22#diff-23bba5938c8ac686f219f01f58478b9a ("kernel/sched: Don't preempt cooperative threads"), the update_cache() function in kernel/sched.c has been placing polling threads back into the ready queue if their priority is high enough.

This occurs when CONFIG_POLL is enabled (it can be auto-selected when CONFIG_NETWORKING is enabled) and a high-priority workq, such as one of net_tc's TX / RX workqs, becomes the only high-priority thread available to be requeued.

In Zephyr, this manifests as a random deadlock in the kernel: work_q_main() returns from k_queue_get() with a NULL work item, immediately calls k_queue_get() again, gets NULL again, and repeats this over and over.
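The spin described above can be modelled outside the kernel. The following is a minimal sketch, not the Zephyr implementation: `fake_queue`, `queue_get()` and `work_loop()` are hypothetical stand-ins for the real queue, k_queue_get() and work_q_main(), and the `wakes_spuriously` flag is an assumption used to model the scheduler making a pended thread active again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the Zephyr types. */
struct work_item {
	void (*handler)(struct work_item *item);
};

struct fake_queue {
	struct work_item *pending; /* NULL when the queue is empty */
	bool wakes_spuriously;     /* models the scheduler bug */
	unsigned int get_calls;    /* how often queue_get() was entered */
};

/*
 * Models a blocking get. With a correct scheduler, an empty queue keeps
 * the caller pended; the model expresses that by returning false.
 * With the bug, the caller becomes active again and sees a NULL item.
 */
static bool queue_get(struct fake_queue *q, struct work_item **out)
{
	assert(q != NULL && out != NULL);
	q->get_calls++;
	*out = q->pending;
	q->pending = NULL;
	return (*out != NULL) || q->wakes_spuriously;
}

/* Work-queue loop, bounded by max_iters so that the model terminates. */
static unsigned int work_loop(struct fake_queue *q, unsigned int max_iters)
{
	unsigned int null_wakeups = 0;

	for (unsigned int i = 0; i < max_iters; i++) {
		struct work_item *item;

		if (!queue_get(q, &item)) {
			break; /* thread stays pended: no spin */
		}
		if (item == NULL) {
			null_wakeups++; /* woke with nothing to do; loop again */
			continue;
		}
		item->handler(item);
	}
	return null_wakeups;
}
```

With `wakes_spuriously` set, every iteration of `work_loop()` is a NULL wakeup, which is the busy loop reported here. With it cleared, the loop stops at the first empty get.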

This was never a problem before the above commit, because update_cache() would preempt any thread, even high-priority ones.

This behavior is discussed at length in #8005.

commit 1856e22 ("kernel/sched: Don't preempt cooperative threads") changed the behavior of update_cache() in kernel/sched.c so that high priority threads couldn't be preempted (they are, after all, high priority threads). This can cause a random deadlock situation in work_q_main() where the next high priority thread to be added to the readyq is a thread that isn't ready to run (i.e. blocked on a signal / timer). When the thread is unpended, it returns from k_queue_get() with a NULL work. The work_q_main() function immediately calls k_queue_get() again and the scheduler immediately unpends the same thread. Rinse and repeat.

In update_cache() we should check that a thread is ready before blocking attempts to preempt it. This should avoid the spin in work_q_main() since the thread will not be unpended until it unblocks correctly or is cancelled.

Fixes: zephyrproject-rtos#8049

Signed-off-by: Michael Scott <michael@opensourcefoundries.com>
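The readiness check proposed in this commit message can be sketched as follows. This is a simplified model under stated assumptions, not the code in kernel/sched.c: `model_thread`, `pick_next()` and the `check_ready` switch are hypothetical names introduced only to contrast the two behaviors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread model; field names are illustrative only. */
struct model_thread {
	int prio;   /* negative = cooperative, following Zephyr's convention */
	bool ready; /* false while pended on a queue, poll, or timeout */
};

/*
 * Decide which thread should run next. `best` is the head of the ready
 * queue; `current` is the thread that is executing now.
 */
static struct model_thread *pick_next(struct model_thread *current,
				      struct model_thread *best,
				      bool check_ready)
{
	assert(current != NULL && best != NULL);

	bool coop = current->prio < 0;

	/* Behavior described in the issue: a cooperative thread is never
	 * preempted, even if it has just pended itself.
	 */
	if (!check_ready) {
		return coop ? current : best;
	}

	/* Proposed fix: the cooperative exemption only applies to a
	 * thread that can actually run.
	 */
	if (coop && current->ready) {
		return current;
	}
	return best;
}
```

In this model, a cooperative thread that has just pended is handed back to itself when `check_ready` is false, and gives way to `best` when it is true.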

The should_preempt() code was catching some of the "unrunnable" cases but not all of them, opening the possibility of failing to preempt a just-pended thread and thus waking it up synchronously. There are reports of this causing spin loops over k_poll() in the network stack work queues (see zephyrproject-rtos#8049).

Note that the previous _is_dummy() call is folded into (the somewhat verbosely named) _is_thread_prevented_from_running(), and that the order of tests has been changed/optimized to hopefully catch common cases earlier.

Suggested-by: Michael Scott <michael@opensourcefoundries.com>

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>

mike-scott self-assigned this May 31, 2018

mike-scott added the area: Kernel label May 31, 2018

mike-scott added this to the v1.12.0 milestone May 31, 2018

mike-scott added the bug The issue is a bug, or the PR is fixing a bug label May 31, 2018

mike-scott mentioned this issue May 31, 2018

FRDM-K64F boot hang w/ mcuboot + lwm2m client #8005

Closed

mike-scott mentioned this issue May 31, 2018

net: sched: It's ok to preempt coop threads if they're polling #8051

Closed

mike-scott mentioned this issue May 31, 2018

Ethernet initialization is unreliable and gets stuck on frdm-k64f #8054

Closed

mike-scott added the priority: medium Medium impact/importance bug label May 31, 2018

mike-scott mentioned this issue May 31, 2018

samples/net/: Experiencing the delayed response from zephyr networking stack #8057

Closed

andyross mentioned this issue May 31, 2018

kernel/sched: Fix preemption logic #8077

Merged

mike-scott changed the title ~~kernel: scheduler tries to put polling threads into the readyq~~ kernel: scheduler tries to make polling threads active May 31, 2018

nashif closed this as completed in #8077 May 31, 2018
