kernel: scheduler tries to make polling threads active #8049

Closed
mike-scott opened this issue May 31, 2018 · 0 comments
Labels: area: Kernel, bug, priority: medium

@mike-scott (Contributor)

Since commit 1856e22#diff-23bba5938c8ac686f219f01f58478b9a ("kernel/sched: Don't preempt cooperative threads"), the update_cache() function in kernel/sched.c has been putting threads that are polling back into the ready queue if their priority is high enough.

This occurs when CONFIG_POLL is enabled (it may be auto-selected when CONFIG_NETWORKING is enabled) and a high-priority workq, such as net_tc's TX/RX workq, becomes the only available high-priority thread to be requeued.

In Zephyr, this manifests as an intermittent hang in the kernel (a spin loop rather than a true deadlock): work_q_main() returns from k_queue_get() with a NULL work item, immediately calls k_queue_get() again, gets NULL again, and repeats this over and over.

This was not a problem before the commit above, because update_cache() would preempt any thread, even high-priority ones.

A long discussion of this behavior is in #8005.

mike-scott self-assigned this May 31, 2018
mike-scott added this to the v1.12.0 milestone May 31, 2018
mike-scott added the bug label May 31, 2018
mike-scott added the priority: medium label May 31, 2018
mike-scott added a commit to mike-scott/zephyr that referenced this issue May 31, 2018
commit 1856e22 ("kernel/sched: Don't preempt cooperative threads")
changed the behavior of update_cache() in kernel/sched.c so that
high-priority threads couldn't be preempted (they are, after all,
high-priority threads).

This can cause a random lockup (a spin) in work_q_main() when the
next high-priority thread to be added to the readyq is a thread that
isn't ready to run (i.e. blocked on a signal / timer).  When that
thread is unpended, it returns from k_queue_get() with a NULL work.
work_q_main() immediately calls k_queue_get() again, and the
scheduler immediately unpends the same thread.  Rinse and repeat.

In update_cache() we should check that a thread is ready before blocking
attempts to preempt it.  This should avoid the spin in work_q_main()
since the thread will not be unpended until it unblocks correctly
or is cancelled.

Fixes: zephyrproject-rtos#8049

Signed-off-by: Michael Scott <michael@opensourcefoundries.com>
andyross pushed a commit to andyross/zephyr that referenced this issue May 31, 2018
The should_preempt() code was catching some of the "unrunnable" cases
but not all of them, opening the possibility of failing to preempt a
just-pended thread and thus waking it up synchronously.  There are
reports of this causing spin loops over k_poll() in the network stack
work queues (see zephyrproject-rtos#8049).

Note that the previous _is_dummy() call is folded into (the somewhat
verbosely named) _is_thread_prevented_from_running(), and that the
order of tests has been changed/optimized to hopefully catch common
cases earlier.

Suggested-by: Michael Scott <michael@opensourcefoundries.com>
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
mike-scott changed the title from "kernel: scheduler tries to put polling threads into the readyq" to "kernel: scheduler tries to make polling threads active" May 31, 2018
nashif pushed a commit that referenced this issue May 31, 2018, with the same commit message as andyross's commit above.