From 172c238612ebf81cabccc86b788c9209af591f61 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Fri, 6 Nov 2015 10:53:01 -0500 Subject: [PATCH 1/6] dm thin: restore requested 'error_if_no_space' setting on OODS to WRITE transition A thin-pool that is in out-of-data-space (OODS) mode may transition back to write mode -- without the admin adding more space to the thin-pool -- if/when blocks are released (either by deleting thin devices or discarding provisioned blocks). But as part of the thin-pool's earlier transition to out-of-data-space mode the thin-pool may have set the 'error_if_no_space' flag to true if the no_space_timeout expires without more space having been made available. That implementation detail, of changing the pool's error_if_no_space setting, needs to be reset back to the default that the user specified when the thin-pool's table was loaded. Otherwise we'll drop the user requested behaviour on the floor when this out-of-data-space to write mode transition occurs. Reported-by: Vivek Goyal Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Fixes: 2c43fd26e4 ("dm thin: fix missing out-of-data-space to write mode transition if blocks are released") Cc: stable@vger.kernel.org --- drivers/md/dm-thin.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 3897b90bd462d8..9f0b94fad36137 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -2432,6 +2432,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode) case PM_WRITE: if (old_mode != new_mode) notify_of_pool_mode_change(pool, "write"); + pool->pf.error_if_no_space = pt->requested_pf.error_if_no_space; dm_pool_metadata_read_write(pool->pmd); pool->process_bio = process_bio; pool->process_discard = process_discard_bio; From 5bbbfdf685657771fda05b926b28ca0f79163a28 Mon Sep 17 00:00:00 2001 From: Junichi Nomura Date: Tue, 17 Nov 2015 09:39:26 +0000 Subject: [PATCH 2/6] dm: fix ioctl retry termination with signal dm-mpath retries ioctl, when no path is readily available and the device is configured to queue I/O in such a case. If you want to stop the retry before multipathd decides to turn off queueing mode, you could send signal for the process to exit from the loop. However the check of fatal signal has not carried along when commit 6c182cd88d17 ("dm mpath: fix ioctl deadlock when no paths") moved the loop from dm-mpath to dm core. As a result, we can't terminate such a process in the retry loop. Easy reproducer of the situation is: # dmsetup create mp --table '0 1024 multipath 0 0 0 0' # dmsetup message mp 0 'queue_if_no_path' # sg_inq /dev/mapper/mp then you should be able to terminate sg_inq by pressing Ctrl+C. Fixes: 6c182cd88d17 ("dm mpath: fix ioctl deadlock when no paths") Signed-off-by: Jun'ichi Nomura Cc: Hannes Reinecke Cc: Mikulas Patocka Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org --- drivers/md/dm-mpath.c | 2 +- drivers/md/dm.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index aaa6caa46a9f2d..3d1829237678d2 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1562,7 +1562,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti, spin_unlock_irqrestore(&m->lock, flags); - if (r == -ENOTCONN && !fatal_signal_pending(current)) { + if (r == -ENOTCONN) { spin_lock_irqsave(&m->lock, flags); if (!m->current_pg) { /* Path status changed, redo selection */ diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 6e15f3565892fc..fabd5d8fd55944 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -591,7 +591,7 @@ static int dm_get_live_table_for_ioctl(struct mapped_device *md, out: dm_put_live_table(md, *srcu_idx); - if (r == -ENOTCONN) { + if (r == -ENOTCONN && !fatal_signal_pending(current)) { msleep(10); goto retry; } From 647a20d5cad7477033bc021ec9dd75edf4bbf9a0 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Tue, 17 Nov 2015 14:11:34 -0500 Subject: [PATCH 3/6] dm: do not reuse dm_blk_ioctl block_device input as local variable (Ab)using the @bdev passed to dm_blk_ioctl() opens the potential for targets' .prepare_ioctl to fail if they go on to check the bdev for !NULL. Fixes: e56f81e0b01e ("dm: refactor ioctl handling") Reported-by: Junichi Nomura Signed-off-by: Mike Snitzer --- drivers/md/dm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index fabd5d8fd55944..5df40480228b7a 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -603,9 +603,10 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode, { struct mapped_device *md = bdev->bd_disk->private_data; struct dm_target *tgt; + struct block_device *tgt_bdev = NULL; int srcu_idx, r; - r = dm_get_live_table_for_ioctl(md, &tgt, &bdev, &mode, &srcu_idx); + r = dm_get_live_table_for_ioctl(md, &tgt, &tgt_bdev, &mode, &srcu_idx); if (r < 0) return r; @@ -620,7 +621,7 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode, goto out; } - r = __blkdev_driver_ioctl(bdev, mode, cmd, arg); + r = __blkdev_driver_ioctl(tgt_bdev, mode, cmd, arg); out: dm_put_live_table(md, srcu_idx); return r; From 43e43c9ea60a7a1831ec823773e924d2dadefd44 Mon Sep 17 00:00:00 2001 From: Junichi Nomura Date: Tue, 17 Nov 2015 09:36:56 +0000 Subject: [PATCH 4/6] dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path In multipath_prepare_ioctl(), - pgpath is a path selected from available paths - m->queue_io is true if we cannot send a request immediately to paths, either because: * there is no available path * the path group needs activation (pg_init) - pg_init is not started - pg_init is still running - m->queue_if_no_path is true if the device is configured to queue I/O if there are no available paths If !pgpath && !m->queue_if_no_path, the handler should return -EIO. However in the course of refactoring the condition check has broken and returns success in that case. Since bdev points to the dm device itself, dm_blk_ioctl() calls __blk_dev_driver_ioctl() for itself and recurses until crash. You could reproduce the problem like this: # dmsetup create mp --table '0 1024 multipath 0 0 0 0' # sg_inq /dev/mapper/mp [ 172.648615] BUG: unable to handle kernel paging request at fffffffc81b10268 [ 172.662843] PGD 19dd067 PUD 0 [ 172.666269] Thread overran stack, or stack corrupted [ 172.671808] Oops: 0000 [#1] SMP ... Fix the condition check with some clarifications. Fixes: e56f81e0b01e ("dm: refactor ioctl handling") Signed-off-by: Jun'ichi Nomura Cc: Christoph Hellwig Cc: Mike Snitzer Signed-off-by: Mike Snitzer --- drivers/md/dm-mpath.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 3d1829237678d2..cfa29f574c2a9e 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1537,29 +1537,31 @@ static int multipath_prepare_ioctl(struct dm_target *ti, struct block_device **bdev, fmode_t *mode) { struct multipath *m = ti->private; - struct pgpath *pgpath; unsigned long flags; int r; - r = 0; - spin_lock_irqsave(&m->lock, flags); if (!m->current_pgpath) __choose_pgpath(m, 0); - pgpath = m->current_pgpath; - - if (pgpath) { - *bdev = pgpath->path.dev->bdev; - *mode = pgpath->path.dev->mode; + if (m->current_pgpath) { + if (!m->queue_io) { + *bdev = m->current_pgpath->path.dev->bdev; + *mode = m->current_pgpath->path.dev->mode; + r = 0; + } else { + /* pg_init has not started or completed */ + r = -ENOTCONN; + } + } else { + /* No path is available */ + if (m->queue_if_no_path) + r = -ENOTCONN; + else + r = -EIO; } - if ((pgpath && m->queue_io) || (!pgpath && m->queue_if_no_path)) - r = -ENOTCONN; - else if (!*bdev) - r = -EIO; - spin_unlock_irqrestore(&m->lock, flags); if (r == -ENOTCONN) { From bcbd94ff481ec1d7b5c824d90df82d0faafabd35 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Thu, 19 Nov 2015 07:36:50 -0500 Subject: [PATCH 5/6] dm crypt: fix a possible hang due to race condition on exit A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE), __add_wait_queue, spin_unlock_irq and then tests kthread_should_stop(). It is possible that the processor reorders memory accesses so that kthread_should_stop() is executed before __set_current_state(). If such reordering happens, there is a possible race on thread termination: CPU 0: calls kthread_should_stop() it tests KTHREAD_SHOULD_STOP bit, returns false CPU 1: calls kthread_stop(cc->write_thread) sets the KTHREAD_SHOULD_STOP bit calls wake_up_process on the kernel thread, that sets the thread state to TASK_RUNNING CPU 0: sets __set_current_state(TASK_INTERRUPTIBLE) spin_unlock_irq(&cc->write_thread_wait.lock) schedule() - and the process is stuck and never terminates, because the state is TASK_INTERRUPTIBLE and wake_up_process on CPU 1 already terminated Fix this race condition by using a new flag DM_CRYPT_EXIT_THREAD to signal that the kernel thread should exit. The flag is set and tested while holding cc->write_thread_wait.lock, so there is no possibility of racy access to the flag. Also, remove the unnecessary set_task_state(current, TASK_RUNNING) following the schedule() call. When the process was woken up, its state was already set to TASK_RUNNING. Other kernel code also doesn't set the state to TASK_RUNNING following schedule() (for example, do_wait_for_common in completion.c doesn't do it). Fixes: dc2676210c42 ("dm crypt: offload writes to thread") Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Mike Snitzer --- drivers/md/dm-crypt.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 917d47e290ae08..3147c8d09ea84a 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -112,7 +112,8 @@ struct iv_tcw_private { * and encrypts / decrypts at the same time. */ enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID, - DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD }; + DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD, + DM_CRYPT_EXIT_THREAD}; /* * The fields in here must be read only after initialization. @@ -1203,20 +1204,18 @@ static int dmcrypt_write(void *data) if (!RB_EMPTY_ROOT(&cc->write_tree)) goto pop_from_list; + if (unlikely(test_bit(DM_CRYPT_EXIT_THREAD, &cc->flags))) { + spin_unlock_irq(&cc->write_thread_wait.lock); + break; + } + __set_current_state(TASK_INTERRUPTIBLE); __add_wait_queue(&cc->write_thread_wait, &wait); spin_unlock_irq(&cc->write_thread_wait.lock); - if (unlikely(kthread_should_stop())) { - set_task_state(current, TASK_RUNNING); - remove_wait_queue(&cc->write_thread_wait, &wait); - break; - } - schedule(); - set_task_state(current, TASK_RUNNING); spin_lock_irq(&cc->write_thread_wait.lock); __remove_wait_queue(&cc->write_thread_wait, &wait); goto continue_locked; @@ -1531,8 +1530,13 @@ static void crypt_dtr(struct dm_target *ti) if (!cc) return; - if (cc->write_thread) + if (cc->write_thread) { + spin_lock_irq(&cc->write_thread_wait.lock); + set_bit(DM_CRYPT_EXIT_THREAD, &cc->flags); + wake_up_locked(&cc->write_thread_wait); + spin_unlock_irq(&cc->write_thread_wait.lock); kthread_stop(cc->write_thread); + } if (cc->io_queue) destroy_workqueue(cc->io_queue); From 0fcb04d59351f790efb8da18edefd6ab4d9bbf3b Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Mon, 23 Nov 2015 13:44:38 -0500 Subject: [PATCH 6/6] dm thin: fix regression in advertised discard limits When establishing a thin device's discard limits we cannot rely on the underlying thin-pool device's discard capabilities (which are inherited from the thin-pool's underlying data device) given that DM thin devices must provide discard support even when the thin-pool's underlying data device doesn't support discards. Users were exposed to this thin device discard limits regression if their thin-pool's underlying data device does _not_ support discards. This regression caused all upper-layers that called the blkdev_issue_discard() interface to not be able to issue discards to thin devices (because discard_granularity was 0). This regression wasn't caught earlier because the device-mapper-test-suite's extensive 'thin-provisioning' discard tests are only ever performed against thin-pool's with data devices that support discards. Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature rather than inferring whether or not a thin device's discard support should be enabled by looking at the thin-pool's discard_granularity. Fixes: 216076705 ("dm thin: disable discard support for thin devices if pool's is disabled") Reported-by: Mike Gerber Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org # 4.1+ --- drivers/md/dm-thin.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 9f0b94fad36137..63903a5a5d9ee3 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -4250,10 +4250,9 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits) { struct thin_c *tc = ti->private; struct pool *pool = tc->pool; - struct queue_limits *pool_limits = dm_get_queue_limits(pool->pool_md); - if (!pool_limits->discard_granularity) - return; /* pool's discard support is disabled */ + if (!pool->pf.discard_enabled) + return; limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT; limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */