Skip to content

Commit

Permalink
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/lin…
Browse files Browse the repository at this point in the history
…ux/kernel/git/tip/linux-2.6-tip

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (52 commits)
  sched: fix RCU lockdep splat from task_group()
  rcu: using ACCESS_ONCE() to observe the jiffies_stall/rnp->qsmask value
  sched: suppress RCU lockdep splat in task_fork_fair
  net: suppress RCU lockdep false positive in sock_update_classid
  rcu: move check from rcu_dereference_bh to rcu_read_lock_bh_held
  rcu: Add advice to PROVE_RCU_REPEATEDLY kernel config parameter
  rcu: Add tracing data to support queueing models
  rcu: fix sparse errors in rcutorture.c
  rcu: only one evaluation of arg in rcu_dereference_check() unless sparse
  kernel: Remove undead ifdef CONFIG_DEBUG_LOCK_ALLOC
  rcu: fix _oddness handling of verbose stall warnings
  rcu: performance fixes to TINY_PREEMPT_RCU callback checking
  rcu: upgrade stallwarn.txt documentation for CPU-bound RT processes
  vhost: add __rcu annotations
  rcu: add comment stating that list_empty() applies to RCU-protected lists
  rcu: apply TINY_PREEMPT_RCU read-side speedup to TREE_PREEMPT_RCU
  rcu: combine duplicate code, courtesy of CONFIG_PREEMPT_RCU
  rcu: Upgrade srcu_read_lock() docbook about SRCU grace periods
  rcu: document ways of stalling updates in low-memory situations
  rcu: repair code-duplication FIXMEs
  ...
  • Loading branch information
torvalds committed Oct 21, 2010
2 parents 31b7eab + 6506cf6 commit 888a6f7
Show file tree
Hide file tree
Showing 60 changed files with 1,454 additions and 474 deletions.
14 changes: 4 additions & 10 deletions Documentation/DocBook/kernel-locking.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -1645,7 +1645,9 @@ the amount of locking which needs to be done.
all the readers who were traversing the list when we deleted the
element are finished. We use <function>call_rcu()</function> to
register a callback which will actually destroy the object once
the readers are finished.
all pre-existing readers are finished. Alternatively,
<function>synchronize_rcu()</function> may be used to block until
all pre-existing are finished.
</para>
<para>
But how does Read Copy Update know when the readers are
Expand Down Expand Up @@ -1714,7 +1716,7 @@ the amount of locking which needs to be done.
- object_put(obj);
+ list_del_rcu(&amp;obj-&gt;list);
cache_num--;
+ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu, obj);
+ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu);
}

/* Must be holding cache_lock */
Expand All @@ -1725,14 +1727,6 @@ the amount of locking which needs to be done.
if (++cache_num > MAX_CACHE_SIZE) {
struct object *i, *outcast = NULL;
list_for_each_entry(i, &amp;cache, list) {
@@ -85,6 +94,7 @@
obj-&gt;popularity = 0;
atomic_set(&amp;obj-&gt;refcnt, 1); /* The cache holds a reference */
spin_lock_init(&amp;obj-&gt;lock);
+ INIT_RCU_HEAD(&amp;obj-&gt;rcu);

spin_lock_irqsave(&amp;cache_lock, flags);
__cache_add(obj);
@@ -104,12 +114,11 @@
struct object *cache_find(int id)
{
Expand Down
46 changes: 39 additions & 7 deletions Documentation/RCU/checklist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -218,13 +218,22 @@ over a rather long period of time, but improvements are always welcome!
include:

a. Keeping a count of the number of data-structure elements
used by the RCU-protected data structure, including those
waiting for a grace period to elapse. Enforce a limit
on this number, stalling updates as needed to allow
previously deferred frees to complete.

Alternatively, limit only the number awaiting deferred
free rather than the total number of elements.
used by the RCU-protected data structure, including
those waiting for a grace period to elapse. Enforce a
limit on this number, stalling updates as needed to allow
previously deferred frees to complete. Alternatively,
limit only the number awaiting deferred free rather than
the total number of elements.

One way to stall the updates is to acquire the update-side
mutex. (Don't try this with a spinlock -- other CPUs
spinning on the lock could prevent the grace period
from ever ending.) Another way to stall the updates
is for the updates to use a wrapper function around
the memory allocator, so that this wrapper function
simulates OOM when there is too much memory awaiting an
RCU grace period. There are of course many other
variations on this theme.

b. Limiting update rate. For example, if updates occur only
once per hour, then no explicit rate limiting is required,
Expand Down Expand Up @@ -365,3 +374,26 @@ over a rather long period of time, but improvements are always welcome!
and the compiler to freely reorder code into and out of RCU
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.

17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
the __rcu sparse checks to validate your RCU code. These
can help find problems as follows:

CONFIG_PROVE_RCU: check that accesses to RCU-protected data
structures are carried out under the proper RCU
read-side critical section, while holding the right
combination of locks, or whatever other conditions
are appropriate.

CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
same object to call_rcu() (or friends) before an RCU
grace period has elapsed since the last time that you
passed that same object to call_rcu() (or friends).

__rcu sparse checks: tag the pointer to the RCU-protected data
structure with __rcu, and sparse will warn you if you
access that pointer without the services of one of the
variants of rcu_dereference().

These debugging aids can help you find problems that are
otherwise extremely difficult to spot.
18 changes: 18 additions & 0 deletions Documentation/RCU/stallwarn.txt
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,24 @@ o A CPU looping with bottom halves disabled. This condition can
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule().

o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
happen to preempt a low-priority task in the middle of an RCU
read-side critical section. This is especially damaging if
that low-priority task is not permitted to run on any other CPU,
in which case the next RCU grace period can never complete, which
will eventually cause the system to run out of memory and hang.
While the system is in the process of running itself out of
memory, you might see stall-warning messages.

o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
is running at a higher priority than the RCU softirq threads.
This will prevent RCU callbacks from ever being invoked,
and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
RCU grace periods from ever completing. Either way, the
system will eventually run out of memory and hang. In the
CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
messages.

o A bug in the RCU implementation.

o A hardware failure. This is quite unlikely, but has occurred
Expand Down
13 changes: 12 additions & 1 deletion Documentation/RCU/trace.txt
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,17 @@ o "b" is the batch limit for this CPU. If more than this number
of RCU callbacks is ready to invoke, then the remainder will
be deferred.

o "ci" is the number of RCU callbacks that have been invoked for
this CPU. Note that ci+ql is the number of callbacks that have
been registered in absence of CPU-hotplug activity.

o "co" is the number of RCU callbacks that have been orphaned due to
this CPU going offline.

o "ca" is the number of RCU callbacks that have been adopted due to
other CPUs going offline. Note that ci+co-ca+ql is the number of
RCU callbacks registered on this CPU.

There is also an rcu/rcudata.csv file with the same information in
comma-separated-variable spreadsheet format.

Expand Down Expand Up @@ -180,7 +191,7 @@ o "s" is the "signaled" state that drives force_quiescent_state()'s

o "jfq" is the number of jiffies remaining for this grace period
before force_quiescent_state() is invoked to help push things
along. Note that CPUs in dyntick-idle mode thoughout the grace
along. Note that CPUs in dyntick-idle mode throughout the grace
period will not report on their own, but rather must be check by
some other CPU via force_quiescent_state().

Expand Down
2 changes: 1 addition & 1 deletion drivers/input/evdev.c
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ struct evdev {
int minor;
struct input_handle handle;
wait_queue_head_t wait;
struct evdev_client *grab;
struct evdev_client __rcu *grab;
struct list_head client_list;
spinlock_t client_lock; /* protects client_list */
struct mutex mutex;
Expand Down
16 changes: 12 additions & 4 deletions drivers/vhost/net.c
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,10 @@ static void handle_tx(struct vhost_net *net)
size_t len, total_len = 0;
int err, wmem;
size_t hdr_size;
struct socket *sock = rcu_dereference(vq->private_data);
struct socket *sock;

sock = rcu_dereference_check(vq->private_data,
lockdep_is_held(&vq->mutex));
if (!sock)
return;

Expand Down Expand Up @@ -582,7 +585,10 @@ static void vhost_net_disable_vq(struct vhost_net *n,
static void vhost_net_enable_vq(struct vhost_net *n,
struct vhost_virtqueue *vq)
{
struct socket *sock = vq->private_data;
struct socket *sock;

sock = rcu_dereference_protected(vq->private_data,
lockdep_is_held(&vq->mutex));
if (!sock)
return;
if (vq == n->vqs + VHOST_NET_VQ_TX) {
Expand All @@ -598,7 +604,8 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
struct socket *sock;

mutex_lock(&vq->mutex);
sock = vq->private_data;
sock = rcu_dereference_protected(vq->private_data,
lockdep_is_held(&vq->mutex));
vhost_net_disable_vq(n, vq);
rcu_assign_pointer(vq->private_data, NULL);
mutex_unlock(&vq->mutex);
Expand Down Expand Up @@ -736,7 +743,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
}

/* start polling new socket */
oldsock = vq->private_data;
oldsock = rcu_dereference_protected(vq->private_data,
lockdep_is_held(&vq->mutex));
if (sock != oldsock) {
vhost_net_disable_vq(n, vq);
rcu_assign_pointer(vq->private_data, sock);
Expand Down
22 changes: 16 additions & 6 deletions drivers/vhost/vhost.c
Original file line number Diff line number Diff line change
Expand Up @@ -320,7 +320,7 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
vhost_dev_cleanup(dev);

memory->nregions = 0;
dev->memory = memory;
RCU_INIT_POINTER(dev->memory, memory);
return 0;
}

Expand Down Expand Up @@ -352,8 +352,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
fput(dev->log_file);
dev->log_file = NULL;
/* No one will access memory at this point */
kfree(dev->memory);
dev->memory = NULL;
kfree(rcu_dereference_protected(dev->memory,
lockdep_is_held(&dev->mutex)));
RCU_INIT_POINTER(dev->memory, NULL);
if (dev->mm)
mmput(dev->mm);
dev->mm = NULL;
Expand Down Expand Up @@ -440,14 +441,22 @@ static int vq_access_ok(unsigned int num,
/* Caller should have device mutex but not vq mutex */
int vhost_log_access_ok(struct vhost_dev *dev)
{
return memory_access_ok(dev, dev->memory, 1);
struct vhost_memory *mp;

mp = rcu_dereference_protected(dev->memory,
lockdep_is_held(&dev->mutex));
return memory_access_ok(dev, mp, 1);
}

/* Verify access for write logging. */
/* Caller should have vq mutex and device mutex */
static int vq_log_access_ok(struct vhost_virtqueue *vq, void __user *log_base)
{
return vq_memory_access_ok(log_base, vq->dev->memory,
struct vhost_memory *mp;

mp = rcu_dereference_protected(vq->dev->memory,
lockdep_is_held(&vq->mutex));
return vq_memory_access_ok(log_base, mp,
vhost_has_feature(vq->dev, VHOST_F_LOG_ALL)) &&
(!vq->log_used || log_access_ok(log_base, vq->log_addr,
sizeof *vq->used +
Expand Down Expand Up @@ -487,7 +496,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
kfree(newmem);
return -EFAULT;
}
oldmem = d->memory;
oldmem = rcu_dereference_protected(d->memory,
lockdep_is_held(&d->mutex));
rcu_assign_pointer(d->memory, newmem);
synchronize_rcu();
kfree(oldmem);
Expand Down
10 changes: 7 additions & 3 deletions drivers/vhost/vhost.h
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,7 @@ struct vhost_virtqueue {
* vhost_work execution acts instead of rcu_read_lock() and the end of
* vhost_work execution acts instead of rcu_read_lock().
* Writers use virtqueue mutex. */
void *private_data;
void __rcu *private_data;
/* Log write descriptors */
void __user *log_base;
struct vhost_log log[VHOST_NET_MAX_SG];
Expand All @@ -116,7 +116,7 @@ struct vhost_dev {
/* Readers use RCU to access memory table pointer
* log base pointer and features.
* Writers use mutex below.*/
struct vhost_memory *memory;
struct vhost_memory __rcu *memory;
struct mm_struct *mm;
struct mutex mutex;
unsigned acked_features;
Expand Down Expand Up @@ -173,7 +173,11 @@ enum {

static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
{
unsigned acked_features = rcu_dereference(dev->acked_features);
unsigned acked_features;

acked_features =
rcu_dereference_index_check(dev->acked_features,
lockdep_is_held(&dev->mutex));
return acked_features & (1 << bit);
}

Expand Down
4 changes: 2 additions & 2 deletions include/linux/cgroup.h
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ struct cgroup_subsys_state {

unsigned long flags;
/* ID for this css, if possible */
struct css_id *id;
struct css_id __rcu *id;
};

/* bits in struct cgroup_subsys_state flags field */
Expand Down Expand Up @@ -205,7 +205,7 @@ struct cgroup {
struct list_head children; /* my children */

struct cgroup *parent; /* my parent */
struct dentry *dentry; /* cgroup fs entry, RCU protected */
struct dentry __rcu *dentry; /* cgroup fs entry, RCU protected */

/* Private pointers for each registered subsystem */
struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
Expand Down
4 changes: 4 additions & 0 deletions include/linux/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,11 @@
# define __release(x) __context__(x,-1)
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
# define __percpu __attribute__((noderef, address_space(3)))
#ifdef CONFIG_SPARSE_RCU_POINTER
# define __rcu __attribute__((noderef, address_space(4)))
#else
# define __rcu
#endif
extern void __chk_user_ptr(const volatile void __user *);
extern void __chk_io_ptr(const volatile void __iomem *);
#else
Expand Down
2 changes: 1 addition & 1 deletion include/linux/cred.h
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ struct thread_group_cred {
atomic_t usage;
pid_t tgid; /* thread group process ID */
spinlock_t lock;
struct key *session_keyring; /* keyring inherited over fork */
struct key __rcu *session_keyring; /* keyring inherited over fork */
struct key *process_keyring; /* keyring private to this process */
struct rcu_head rcu; /* RCU deletion hook */
};
Expand Down
6 changes: 3 additions & 3 deletions include/linux/fdtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ struct embedded_fd_set {

struct fdtable {
unsigned int max_fds;
struct file ** fd; /* current fd array */
struct file __rcu **fd; /* current fd array */
fd_set *close_on_exec;
fd_set *open_fds;
struct rcu_head rcu;
Expand All @@ -46,7 +46,7 @@ struct files_struct {
* read mostly part
*/
atomic_t count;
struct fdtable *fdt;
struct fdtable __rcu *fdt;
struct fdtable fdtab;
/*
* written part on a separate cache line in SMP
Expand All @@ -55,7 +55,7 @@ struct files_struct {
int next_fd;
struct embedded_fd_set close_on_exec_init;
struct embedded_fd_set open_fds_init;
struct file * fd_array[NR_OPEN_DEFAULT];
struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};

#define rcu_dereference_check_fdtable(files, fdtfd) \
Expand Down
2 changes: 1 addition & 1 deletion include/linux/fs.h
Original file line number Diff line number Diff line change
Expand Up @@ -1384,7 +1384,7 @@ struct super_block {
* Saved mount options for lazy filesystems using
* generic_show_options()
*/
char *s_options;
char __rcu *s_options;
};

extern struct timespec current_fs_time(struct super_block *sb);
Expand Down
6 changes: 3 additions & 3 deletions include/linux/genhd.h
Original file line number Diff line number Diff line change
Expand Up @@ -129,8 +129,8 @@ struct blk_scsi_cmd_filter {
struct disk_part_tbl {
struct rcu_head rcu_head;
int len;
struct hd_struct *last_lookup;
struct hd_struct *part[];
struct hd_struct __rcu *last_lookup;
struct hd_struct __rcu *part[];
};

struct gendisk {
Expand All @@ -149,7 +149,7 @@ struct gendisk {
* non-critical accesses use RCU. Always access through
* helpers.
*/
struct disk_part_tbl *part_tbl;
struct disk_part_tbl __rcu *part_tbl;
struct hd_struct part0;

const struct block_device_operations *fops;
Expand Down
2 changes: 1 addition & 1 deletion include/linux/hardirq.h
Original file line number Diff line number Diff line change
Expand Up @@ -139,7 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk)
#endif

#if defined(CONFIG_NO_HZ)
#if defined(CONFIG_TINY_RCU)
#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
extern void rcu_enter_nohz(void);
extern void rcu_exit_nohz(void);

Expand Down
Loading

0 comments on commit 888a6f7

Please sign in to comment.