Skip to content

Commit

Permalink
stop_machine: Fix race between stop_two_cpus() and stop_cpus()
Browse files Browse the repository at this point in the history
There is a race between stop_two_cpus, and the global stop_cpus.

It is possible for two CPUs to get their stopper functions queued
"backwards" from one another, resulting in the stopper threads
getting stuck, and the system hanging. This can happen because
queuing up stoppers is not synchronized.

This patch adds synchronization between stop_cpus (a rare operation),
and stop_two_cpus.

Reported-and-Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/20131101104146.03d1e043@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
  • Loading branch information
Rik van Riel authored and Ingo Molnar committed Nov 11, 2013
1 parent 37dc6b5 commit 7053ea1
Showing 1 changed file with 13 additions and 2 deletions.
15 changes: 13 additions & 2 deletions kernel/stop_machine.c
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include <linux/kallsyms.h>
#include <linux/smpboot.h>
#include <linux/atomic.h>
#include <linux/lglock.h>

/*
* Structure to determine completion condition and record errors. May
Expand All @@ -43,6 +44,14 @@ static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
static bool stop_machine_initialized = false;

/*
* Avoids a race between stop_two_cpus and global stop_cpus, where
* the stoppers could get queued up in reverse order, leading to
* system deadlock. Using an lglock means stop_two_cpus remains
* relatively cheap.
*/
DEFINE_STATIC_LGLOCK(stop_cpus_lock);

static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
{
memset(done, 0, sizeof(*done));
Expand Down Expand Up @@ -276,6 +285,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
return -ENOENT;
}

lg_local_lock(&stop_cpus_lock);
/*
* Queuing needs to be done by the lowest numbered CPU, to ensure
* that works are always queued in the same order on every CPU.
Expand All @@ -284,6 +294,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
smp_call_function_single(min(cpu1, cpu2),
&irq_cpu_stop_queue_work,
&call_args, 0);
lg_local_unlock(&stop_cpus_lock);
preempt_enable();

wait_for_completion(&done.completion);
Expand Down Expand Up @@ -335,10 +346,10 @@ static void queue_stop_cpus_work(const struct cpumask *cpumask,
* preempted by a stopper which might wait for other stoppers
* to enter @fn which can lead to deadlock.
*/
preempt_disable();
lg_global_lock(&stop_cpus_lock);
for_each_cpu(cpu, cpumask)
cpu_stop_queue_work(cpu, &per_cpu(stop_cpus_work, cpu));
preempt_enable();
lg_global_unlock(&stop_cpus_lock);
}

static int __stop_cpus(const struct cpumask *cpumask,
Expand Down

0 comments on commit 7053ea1

Please sign in to comment.