kernel: scheduler tries to make polling threads active #8049
Labels
area: Kernel
bug
The issue is a bug, or the PR is fixing a bug
priority: medium
Medium impact/importance bug
Milestone
Comments
mike-scott added a commit to mike-scott/zephyr that referenced this issue on May 31, 2018
commit 1856e22 ("kernel/sched: Don't preempt cooperative threads") changed the behavior of update_cache() in kernel/sched.c so that high priority threads couldn't be preempted (they are, after all, high priority threads).

This can cause a random deadlock in work_q_main() when the next high priority thread to be added to the readyq is a thread that isn't ready to run (i.e. blocked on a signal / timer). When the thread is unpended, it returns from k_queue_get() with a NULL work. The work_q_main() function immediately calls k_queue_get() again and the scheduler immediately unpends the same thread. Rinse and repeat.

In update_cache() we should check that a thread is ready before blocking attempts to preempt it. This should avoid the spin in work_q_main(), since the thread will not be unpended until it unblocks correctly or is cancelled.

Fixes: zephyrproject-rtos#8049
Signed-off-by: Michael Scott <michael@opensourcefoundries.com>
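The check described in this commit message can be sketched with a small stand-alone model. This is not the code in kernel/sched.c: `struct model_thread`, `should_preempt_buggy()` and `should_preempt_fixed()` are hypothetical names made up for illustration, and the model assumes a thread is fully described by a "cooperative" flag and a "pended" flag.

```c
#include <stdbool.h>

/* Hypothetical, simplified thread model; NOT Zephyr's struct k_thread. */
struct model_thread {
	bool cooperative; /* thread runs at a cooperative (high) priority */
	bool pended;      /* thread is blocked, e.g. waiting on a queue or timer */
};

/*
 * Models the behavior reported in this issue: a cooperative current
 * thread is never preempted, even when it has just pended and therefore
 * cannot actually run.
 */
static bool should_preempt_buggy(const struct model_thread *current)
{
	return !current->cooperative;
}

/*
 * Models the proposed fix: a thread that is not ready to run must always
 * be switched away from; only a runnable cooperative thread is protected
 * from preemption.
 */
static bool should_preempt_fixed(const struct model_thread *current)
{
	if (current->pended) {
		return true;
	}
	return !current->cooperative;
}
```

For a work queue thread that is cooperative and has just pended, the first function refuses to preempt it (so it stays the selected thread and is effectively woken again), while the second one lets the scheduler pick another thread.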
andyross pushed a commit to andyross/zephyr that referenced this issue on May 31, 2018
The should_preempt() code was catching some of the "unrunnable" cases but not all of them, opening the possibility of failing to preempt a just-pended thread and thus waking it up synchronously. There are reports of this causing spin loops over k_poll() in the network stack work queues (see zephyrproject-rtos#8049).

Note that the previous _is_dummy() call is folded into (the somewhat verbosely named) _is_thread_prevented_from_running(), and that the order of tests has been changed/optimized to hopefully catch common cases earlier.

Suggested-by: Michael Scott <michael@opensourcefoundries.com>
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
mike-scott changed the title from "kernel: scheduler tries to put polling threads into the readyq" to "kernel: scheduler tries to make polling threads active" on May 31, 2018
nashif pushed a commit that referenced this issue on May 31, 2018
The should_preempt() code was catching some of the "unrunnable" cases but not all of them, opening the possibility of failing to preempt a just-pended thread and thus waking it up synchronously. There are reports of this causing spin loops over k_poll() in the network stack work queues (see #8049).

Note that the previous _is_dummy() call is folded into (the somewhat verbosely named) _is_thread_prevented_from_running(), and that the order of tests has been changed/optimized to hopefully catch common cases earlier.

Suggested-by: Michael Scott <michael@opensourcefoundries.com>
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Since commit 1856e22 ("kernel/sched: Don't preempt cooperative threads"), the update_cache() function in kernel/sched.c has been placing threads which are polling back into the ready queue if their priority is high enough.
This occurs when CONFIG_POLL is enabled (it may be auto-selected when CONFIG_NETWORKING is enabled) and a high priority work queue, such as net_tc's TX / RX workq, becomes the only available high priority thread to be requeued.
In Zephyr, this manifests as a random deadlock in the kernel: the work_q_main() function returns from k_queue_get() with a NULL work, immediately calls k_queue_get() again, returns with NULL again, and repeats this over and over.
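To make the spin concrete, here is a rough stand-alone model of that loop. It is only a sketch: `model_work_q_main()` and `model_queue_get_spurious()` are hypothetical stand-ins, not the real work_q_main() or k_queue_get(), and the loop is bounded so the model terminates. The assumption is that the faulty wakeup makes the "get" call return NULL right away instead of blocking.

```c
#include <stddef.h>

/* Hypothetical work item; NOT Zephyr's struct k_work. */
struct model_work {
	void (*handler)(struct model_work *work);
};

/*
 * Stand-in for k_queue_get() in the faulty case: the waiting thread is
 * woken synchronously, so the call returns NULL without ever blocking.
 */
static struct model_work *model_queue_get_spurious(void)
{
	return NULL;
}

/*
 * Bounded model of the work queue loop. Returns how many iterations
 * came back with no work item, i.e. how often the loop just spun.
 */
static unsigned int model_work_q_main(struct model_work *(*get)(void),
				      unsigned int max_iters)
{
	unsigned int empty_wakeups = 0;

	for (unsigned int i = 0; i < max_iters; i++) {
		struct model_work *work = get();

		if (work == NULL) {
			/* Nothing to do: go straight back to waiting. */
			empty_wakeups++;
			continue;
		}
		work->handler(work);
	}
	return empty_wakeups;
}
```

With the spurious "get", every iteration is an empty wakeup. In the real loop there is no iteration bound, so a cooperative thread in this state never yields the CPU.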
This was never a problem before the above commit, because update_cache() would preempt any thread, even high priority ones.
A long discussion of this behavior is outlined in #8005.