Skip to content

Commit

Permalink
Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm
Browse files Browse the repository at this point in the history
Pull ARM updates from Russell King:
 "The major items included in here are:

   - MCPM, multi-cluster power management, part of the infrastructure
     required for ARMs big.LITTLE support.

   - A rework of the ARM KVM code to allow re-use by ARM64.

   - Error handling cleanups of the IS_ERR_OR_NULL() madness and fixes
     of that stuff for arch/arm

   - Preparatory patches for Cortex-M3 support from Uwe Kleine-König.

  There is also a set of three patches in here from Hugh/Catalin to
  address freeing of inappropriate page tables on LPAE.  You already
  have these from akpm, but they were already part of my tree at the
  time he sent them, so unfortunately they'll end up with duplicate
  commits"

* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (77 commits)
  ARM: EXYNOS: remove unnecessary use of IS_ERR_VALUE()
  ARM: IMX: remove unnecessary use of IS_ERR_VALUE()
  ARM: OMAP: use consistent error checking
  ARM: cleanup: OMAP hwmod error checking
  ARM: 7709/1: mcpm: Add explicit AFLAGS to support v6/v7 multiplatform kernels
  ARM: 7700/2: Make cpu_init() notrace
  ARM: 7702/1: Set the page table freeing ceiling to TASK_SIZE
  ARM: 7701/1: mm: Allow arch code to control the user page table ceiling
  ARM: 7703/1: Disable preemption in broadcast_tlb*_a15_erratum()
  ARM: mcpm: provide an interface to set the SMP ops at run time
  ARM: mcpm: generic SMP secondary bringup and hotplug support
  ARM: mcpm_head.S: vlock-based first man election
  ARM: mcpm: Add baremetal voting mutexes
  ARM: mcpm: introduce helpers for platform coherency exit/setup
  ARM: mcpm: introduce the CPU/cluster power API
  ARM: multi-cluster PM: secondary kernel entry code
  ARM: cacheflush: add synchronization helpers for mixed cache state accesses
  ARM: cpu hotplug: remove majority of cache flushing from platforms
  ARM: smp: flush L1 cache in cpu_die()
  ARM: tegra: remove tegra specific cpu_disable()
  ...
  • Loading branch information
torvalds committed May 3, 2013
2 parents 9992ba7 + 33b9f58 commit 8546dc1
Show file tree
Hide file tree
Showing 96 changed files with 3,087 additions and 1,210 deletions.
498 changes: 498 additions & 0 deletions Documentation/arm/cluster-pm-race-avoidance.txt

Large diffs are not rendered by default.

211 changes: 211 additions & 0 deletions Documentation/arm/vlocks.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,211 @@
vlocks for Bare-Metal Mutual Exclusion
======================================

Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
mechanism, with reasonable but minimal requirements on the memory
system.

These are intended to be used to coordinate critical activity among CPUs
which are otherwise non-coherent, in situations where the hardware
provides no other mechanism to support this and ordinary spinlocks
cannot be used.


vlocks make use of the atomicity provided by the memory system for
writes to a single memory location. To arbitrate, every CPU "votes for
itself", by storing a unique number to a common memory location. The
final value seen in that memory location when all the votes have been
cast identifies the winner.

In order to make sure that the election produces an unambiguous result
in finite time, a CPU will only enter the election in the first place if
no winner has been chosen and the election does not appear to have
started yet.


Algorithm
---------

The easiest way to explain the vlocks algorithm is with some pseudo-code:


int currently_voting[NR_CPUS] = { 0, };
int last_vote = -1; /* no votes yet */

bool vlock_trylock(int this_cpu)
{
/* signal our desire to vote */
currently_voting[this_cpu] = 1;
if (last_vote != -1) {
/* someone already volunteered himself */
currently_voting[this_cpu] = 0;
return false; /* not ourself */
}

/* let's suggest ourself */
last_vote = this_cpu;
currently_voting[this_cpu] = 0;

/* then wait until everyone else is done voting */
for_each_cpu(i) {
while (currently_voting[i] != 0)
/* wait */;
}

/* result */
if (last_vote == this_cpu)
return true; /* we won */
return false;
}

bool vlock_unlock(void)
{
last_vote = -1;
}


The currently_voting[] array provides a way for the CPUs to determine
whether an election is in progress, and plays a role analogous to the
"entering" array in Lamport's bakery algorithm [1].

However, once the election has started, the underlying memory system
atomicity is used to pick the winner. This avoids the need for a static
priority rule to act as a tie-breaker, or any counters which could
overflow.

As long as the last_vote variable is globally visible to all CPUs, it
will contain only one value that won't change once every CPU has cleared
its currently_voting flag.


Features and limitations
------------------------

* vlocks are not intended to be fair. In the contended case, it is the
_last_ CPU which attempts to get the lock which will be most likely
to win.

vlocks are therefore best suited to situations where it is necessary
to pick a unique winner, but it does not matter which CPU actually
wins.

* Like other similar mechanisms, vlocks will not scale well to a large
number of CPUs.

vlocks can be cascaded in a voting hierarchy to permit better scaling
if necessary, as in the following hypothetical example for 4096 CPUs:

/* first level: local election */
my_town = towns[(this_cpu >> 4) & 0xf];
I_won = vlock_trylock(my_town, this_cpu & 0xf);
if (I_won) {
/* we won the town election, let's go for the state */
my_state = states[(this_cpu >> 8) & 0xf];
I_won = vlock_lock(my_state, this_cpu & 0xf));
if (I_won) {
/* and so on */
I_won = vlock_lock(the_whole_country, this_cpu & 0xf];
if (I_won) {
/* ... */
}
vlock_unlock(the_whole_country);
}
vlock_unlock(my_state);
}
vlock_unlock(my_town);


ARM implementation
------------------

The current ARM implementation [2] contains some optimisations beyond
the basic algorithm:

* By packing the members of the currently_voting array close together,
we can read the whole array in one transaction (providing the number
of CPUs potentially contending the lock is small enough). This
reduces the number of round-trips required to external memory.

In the ARM implementation, this means that we can use a single load
and comparison:

LDR Rt, [Rn]
CMP Rt, #0

...in place of code equivalent to:

LDRB Rt, [Rn]
CMP Rt, #0
LDRBEQ Rt, [Rn, #1]
CMPEQ Rt, #0
LDRBEQ Rt, [Rn, #2]
CMPEQ Rt, #0
LDRBEQ Rt, [Rn, #3]
CMPEQ Rt, #0

This cuts down on the fast-path latency, as well as potentially
reducing bus contention in contended cases.

The optimisation relies on the fact that the ARM memory system
guarantees coherency between overlapping memory accesses of
different sizes, similarly to many other architectures. Note that
we do not care which element of currently_voting appears in which
bits of Rt, so there is no need to worry about endianness in this
optimisation.

If there are too many CPUs to read the currently_voting array in
one transaction then multiple transations are still required. The
implementation uses a simple loop of word-sized loads for this
case. The number of transactions is still fewer than would be
required if bytes were loaded individually.


In principle, we could aggregate further by using LDRD or LDM, but
to keep the code simple this was not attempted in the initial
implementation.


* vlocks are currently only used to coordinate between CPUs which are
unable to enable their caches yet. This means that the
implementation removes many of the barriers which would be required
when executing the algorithm in cached memory.

packing of the currently_voting array does not work with cached
memory unless all CPUs contending the lock are cache-coherent, due
to cache writebacks from one CPU clobbering values written by other
CPUs. (Though if all the CPUs are cache-coherent, you should be
probably be using proper spinlocks instead anyway).


* The "no votes yet" value used for the last_vote variable is 0 (not
-1 as in the pseudocode). This allows statically-allocated vlocks
to be implicitly initialised to an unlocked state simply by putting
them in .bss.

An offset is added to each CPU's ID for the purpose of setting this
variable, so that no CPU uses the value 0 for its ID.


Colophon
--------

Originally created and documented by Dave Martin for Linaro Limited, for
use in ARM-based big.LITTLE platforms, with review and input gratefully
received from Nicolas Pitre and Achin Gupta. Thanks to Nicolas for
grabbing most of this text out of the relevant mail thread and writing
up the pseudocode.

Copyright (C) 2012-2013 Linaro Limited
Distributed under the terms of Version 2 of the GNU General Public
License, as defined in linux/COPYING.


References
----------

[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
Problem", Communications of the ACM 17, 8 (August 1974), 453-455.

http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm

[2] linux/arch/arm/common/vlock.S, www.kernel.org.
12 changes: 11 additions & 1 deletion arch/arm/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
select HAVE_CONTEXT_TRACKING
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
Expand Down Expand Up @@ -1479,6 +1480,14 @@ config HAVE_ARM_TWD
help
This options enables support for the ARM timer and watchdog unit

config MCPM
bool "Multi-Cluster Power Management"
depends on CPU_V7 && SMP
help
This option provides the common power management infrastructure
for (multi-)cluster based systems, such as big.LITTLE based
systems.

choice
prompt "Memory split"
default VMSPLIT_3G
Expand Down Expand Up @@ -1565,8 +1574,9 @@ config SCHED_HRTICK
def_bool HIGH_RES_TIMERS

config THUMB2_KERNEL
bool "Compile the kernel in Thumb-2 mode"
bool "Compile the kernel in Thumb-2 mode" if !CPU_THUMBONLY
depends on CPU_V7 && !CPU_V6 && !CPU_V6K
default y if CPU_THUMBONLY
select AEABI
select ARM_ASM_UNIFIED
select ARM_UNWIND
Expand Down
11 changes: 11 additions & 0 deletions arch/arm/Kconfig.debug
Original file line number Diff line number Diff line change
Expand Up @@ -641,6 +641,17 @@ config DEBUG_LL_INCLUDE
default "debug/zynq.S" if DEBUG_ZYNQ_UART0 || DEBUG_ZYNQ_UART1
default "mach/debug-macro.S"

config DEBUG_UNCOMPRESS
bool
default y if ARCH_MULTIPLATFORM && DEBUG_LL && \
!DEBUG_OMAP2PLUS_UART && \
!DEBUG_TEGRA_UART

config UNCOMPRESS_INCLUDE
string
default "debug/uncompress.h" if ARCH_MULTIPLATFORM
default "mach/uncompress.h"

config EARLY_PRINTK
bool "Early printk"
depends on DEBUG_LL
Expand Down
3 changes: 3 additions & 0 deletions arch/arm/boot/compressed/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,9 @@ endif
AFLAGS_head.o += -DTEXT_OFFSET=$(TEXT_OFFSET)
HEAD = head.o
OBJS += misc.o decompress.o
ifeq ($(CONFIG_DEBUG_UNCOMPRESS),y)
OBJS += debug.o
endif
FONTC = $(srctree)/drivers/video/console/font_acorn_8x8.c

# string library code (-Os is enforced to keep it much smaller)
Expand Down
12 changes: 12 additions & 0 deletions arch/arm/boot/compressed/debug.S
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
#include <linux/linkage.h>
#include <asm/assembler.h>

#include CONFIG_DEBUG_LL_INCLUDE

ENTRY(putc)
addruart r1, r2, r3
waituart r3, r1
senduart r0, r1
busyuart r3, r1
mov pc, lr
ENDPROC(putc)
8 changes: 1 addition & 7 deletions arch/arm/boot/compressed/misc.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,7 @@ unsigned int __machine_arch_type;
static void putstr(const char *ptr);
extern void error(char *x);

#ifdef CONFIG_ARCH_MULTIPLATFORM
static inline void putc(int c) {}
static inline void flush(void) {}
static inline void arch_decomp_setup(void) {}
#else
#include <mach/uncompress.h>
#endif
#include CONFIG_UNCOMPRESS_INCLUDE

#ifdef CONFIG_DEBUG_ICEDCC

Expand Down
3 changes: 3 additions & 0 deletions arch/arm/common/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,6 @@ obj-$(CONFIG_SHARP_PARAM) += sharpsl_param.o
obj-$(CONFIG_SHARP_SCOOP) += scoop.o
obj-$(CONFIG_PCI_HOST_ITE8152) += it8152.o
obj-$(CONFIG_ARM_TIMER_SP804) += timer-sp.o
obj-$(CONFIG_MCPM) += mcpm_head.o mcpm_entry.o mcpm_platsmp.o vlock.o
AFLAGS_mcpm_head.o := -march=armv7-a
AFLAGS_vlock.o := -march=armv7-a
Loading

0 comments on commit 8546dc1

Please sign in to comment.