Merge tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/…
…kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Update DM crypt to allocate compound pages if possible

 - Fix DM crypt target's crypt_ctr_cipher_new return value on invalid
   AEAD cipher

 - Fix DM flakey testing target's write bio corruption feature to
   corrupt the data of a cloned bio instead of the original

 - Add random_read_corrupt and random_write_corrupt features to DM
   flakey target

 - Fix ABBA deadlock in DM thin metadata by resetting associated bufio
   client rather than destroying and recreating it

 - A couple other small DM thinp cleanups

 - Update DM core to support disabling block core IO stats accounting
   and optimize away code that isn't needed if stats are disabled

 - Other small DM core cleanups

 - Improve DM integrity target to not require so much memory on 32 bit
   systems. Also only allocate the recalculate buffer as needed (and
   increasingly reduce its size on allocation failure)

 - Update DM integrity to use %*ph for printing hexdump of a small
   buffer. Also update DM integrity documentation

 - Various DM core ioctl interface hardening. Now more careful about
   alignment of structures and processing of input passed to the kernel
   from userspace.

   Also disallow the creation of DM devices named "control", "." or ".."

 - Eliminate GFP_NOIO workarounds for __vmalloc and kvmalloc in DM
   core's ioctl and bufio code

* tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
  dm: get rid of GFP_NOIO workarounds for __vmalloc and kvmalloc
  dm integrity: scale down the recalculate buffer if memory allocation fails
  dm integrity: only allocate recalculate buffer when needed
  dm integrity: reduce vmalloc space footprint on 32-bit architectures
  dm ioctl: Refuse to create device named "." or ".."
  dm ioctl: Refuse to create device named "control"
  dm ioctl: Avoid double-fetch of version
  dm ioctl: structs and parameter strings must not overlap
  dm ioctl: Avoid pointer arithmetic overflow
  dm ioctl: Check dm_target_spec is sufficiently aligned
  Documentation: dm-integrity: Document an example of how the tunables relate.
  Documentation: dm-integrity: Document default values.
  Documentation: dm-integrity: Document the meaning of "buffer".
  Documentation: dm-integrity: Fix minor grammatical error.
  dm integrity: Use %*ph for printing hexdump of a small buffer
  dm thin: disable discards for thin-pool if no_discard_passdown
  dm: remove stale/redundant dm_internal_{suspend,resume} prototypes in dm.h
  dm: skip dm-stats work in alloc_io() unless needed
  dm: avoid needless dm_io access if all IO accounting is disabled
  dm: support turning off block-core's io stats accounting
  ...
torvalds committed Jun 30, 2023
2 parents ca7ce08 + e2c789c commit 6cdbb09
Showing 18 changed files with 478 additions and 236 deletions.
10 changes: 10 additions & 0 deletions Documentation/admin-guide/device-mapper/dm-flakey.rst
@@ -67,6 +67,16 @@ Optional feature parameters:
 Perform the replacement only if bio->bi_opf has all the
 selected flags set.

+random_read_corrupt <probability>
+During <down interval>, replace random byte in a read bio
+with a random value. probability is an integer between
+0 and 1000000000 meaning 0% to 100% probability of corruption.
+
+random_write_corrupt <probability>
+During <down interval>, replace random byte in a write bio
+with a random value. probability is an integer between
+0 and 1000000000 meaning 0% to 100% probability of corruption.
+
 Examples:

 Replaces the 32nd byte of READ bios with the value 1::
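As an illustration of the `<probability>` scale documented in the dm-flakey.rst hunk above (an integer from 0 to 1000000000 standing for 0% to 100%), the following minimal C sketch converts a percentage to that scale and shows one way such a value could be compared against a uniform random sample. The helper names `flakey_probability_from_percent` and `flakey_should_corrupt` and the constant `FLAKEY_PROBABILITY_BASE` are hypothetical and not taken from this commit; the kernel's actual decision logic is not shown on this page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed name for the documented upper bound (100%); not from the commit. */
#define FLAKEY_PROBABILITY_BASE 1000000000u

/*
 * Convert a percentage in [0, 100] to the documented integer scale.
 * Returns 0 on success and -1 if the percentage is out of range (or NaN).
 */
int flakey_probability_from_percent(double percent, uint32_t *out)
{
	if (!(percent >= 0.0 && percent <= 100.0))
		return -1;
	/* 1% corresponds to BASE / 100 = 10,000,000 units; round to nearest. */
	*out = (uint32_t)(percent * (FLAKEY_PROBABILITY_BASE / 100) + 0.5);
	return 0;
}

/*
 * Illustrative decision rule (an assumption, not the kernel's code):
 * given a sample drawn uniformly from [0, BASE), corrupt when the
 * sample falls below the configured probability.
 */
int flakey_should_corrupt(uint32_t sample, uint32_t probability)
{
	assert(sample < FLAKEY_PROBABILITY_BASE);
	return sample < probability;
}
```

On this scale, a chance of roughly 1% corresponds to the argument 10000000, and 1000000000 means every eligible bio is corrupted.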
43 changes: 27 additions & 16 deletions Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -25,7 +25,7 @@ mode it calculates and verifies the integrity tag internally. In this
 mode, the dm-integrity target can be used to detect silent data
 corruption on the disk or in the I/O path.

-There's an alternate mode of operation where dm-integrity uses bitmap
+There's an alternate mode of operation where dm-integrity uses a bitmap
 instead of a journal. If a bit in the bitmap is 1, the corresponding
 region's data and integrity tags are not synchronized - if the machine
 crashes, the unsynchronized regions will be recalculated. The bitmap mode
@@ -38,6 +38,15 @@ the device. But it will only format the device if the superblock contains
 zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
 target can't be loaded.

+Accesses to the on-disk metadata area containing checksums (aka tags) are
+buffered using dm-bufio. When an access to any given metadata area
+occurs, each unique metadata area gets its own buffer(s). The buffer size
+is capped at the size of the metadata area, but may be smaller, thereby
+requiring multiple buffers to represent the full metadata area. A smaller
+buffer size will produce a smaller resulting read/write operation to the
+metadata area for small reads/writes. The metadata is still read even in
+a full write to the data covered by a single buffer.
+
 To use the target for the first time:

 1. overwrite the superblock with zeroes
@@ -93,7 +102,7 @@ journal_sectors:number
 device. If the device is already formatted, the value from the
 superblock is used.

-interleave_sectors:number
+interleave_sectors:number (default 32768)
 The number of interleaved sectors. This values is rounded down to
 a power of two. If the device is already formatted, the value from
 the superblock is used.
@@ -102,20 +111,16 @@ meta_device:device
 Don't interleave the data and metadata on the device. Use a
 separate device for metadata.

-buffer_sectors:number
-The number of sectors in one buffer. The value is rounded down to
-a power of two.
-
-The tag area is accessed using buffers, the buffer size is
-configurable. The large buffer size means that the I/O size will
-be larger, but there could be less I/Os issued.
+buffer_sectors:number (default 128)
+The number of sectors in one metadata buffer. The value is rounded
+down to a power of two.

-journal_watermark:number
+journal_watermark:number (default 50)
 The journal watermark in percents. When the size of the journal
 exceeds this watermark, the thread that flushes the journal will
 be started.

-commit_time:number
+commit_time:number (default 10000)
 Commit time in milliseconds. When this time passes, the journal is
 written. The journal is also written immediately if the FLUSH
 request is received.
@@ -163,11 +168,10 @@ journal_mac:algorithm(:key) (the key is optional)
 the journal. Thus, modified sector number would be detected at
 this stage.

-block_size:number
-The size of a data block in bytes. The larger the block size the
+block_size:number (default 512)
+The size of a data block in bytes. The larger the block size the
 less overhead there is for per-block integrity metadata.
-Supported values are 512, 1024, 2048 and 4096 bytes. If not
-specified the default block size is 512 bytes.
+Supported values are 512, 1024, 2048 and 4096 bytes.

 sectors_per_bit:number
 In the bitmap mode, this parameter specifies the number of
@@ -209,6 +213,12 @@ table and swap the tables with suspend and resume). The other arguments
 should not be changed when reloading the target because the layout of disk
 data depend on them and the reloaded target would be non-functional.

+For example, on a device using the default interleave_sectors of 32768, a
+block_size of 512, and an internal_hash of crc32c with a tag size of 4
+bytes, it will take 128 KiB of tags to track a full data area, requiring
+256 sectors of metadata per data area. With the default buffer_sectors of
+128, that means there will be 2 buffers per metadata area, or 2 buffers
+per 16 MiB of data.

 Status line:
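The arithmetic in the example paragraph added by the hunk above (default interleave_sectors of 32768, block_size of 512, 4-byte crc32c tags, default buffer_sectors of 128) can be reproduced with a short C sketch. The struct and function names are hypothetical, the 512-byte sector unit is an assumption consistent with the documented numbers, and the calculation deliberately ignores any extra rounding or padding the kernel may apply to the on-disk layout.

```c
#include <stdint.h>

/* Assumed sector unit in bytes; matches the figures in the documentation. */
#define SKETCH_SECTOR_SIZE 512u

/* Hypothetical container for the derived quantities. */
struct integrity_estimate {
	uint64_t data_bytes;       /* data covered by one data area */
	uint64_t tag_bytes;        /* tags needed for that data area */
	uint64_t metadata_sectors; /* sectors needed to hold the tags */
	uint64_t buffers_per_area; /* buffers needed per metadata area */
};

/* Rough estimate only: one tag per data block, ceiling division elsewhere. */
struct integrity_estimate integrity_estimate(uint32_t interleave_sectors,
					     uint32_t block_size,
					     uint32_t tag_size,
					     uint32_t buffer_sectors)
{
	struct integrity_estimate e;
	uint64_t blocks;

	e.data_bytes = (uint64_t)interleave_sectors * SKETCH_SECTOR_SIZE;
	blocks = e.data_bytes / block_size;
	e.tag_bytes = blocks * tag_size;
	e.metadata_sectors =
		(e.tag_bytes + SKETCH_SECTOR_SIZE - 1) / SKETCH_SECTOR_SIZE;
	e.buffers_per_area =
		(e.metadata_sectors + buffer_sectors - 1) / buffer_sectors;
	return e;
}
```

Calling `integrity_estimate(32768, 512, 4, 128)` yields 131072 tag bytes (128 KiB), 256 metadata sectors, and 2 buffers per 16777216 bytes (16 MiB) of data, matching the documented example.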

@@ -286,7 +296,8 @@ The layout of the formatted block device:
 Each run contains:

 * tag area - it contains integrity tags. There is one tag for each
-sector in the data area
+sector in the data area. The size of this area is always 4KiB or
+greater.
 * data area - it contains data sectors. The number of data sectors
 in one run must be a power of two. log2 of this value is stored
 in the superblock.
24 changes: 7 additions & 17 deletions drivers/md/dm-bufio.c
@@ -1157,23 +1157,6 @@ static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,

 	*data_mode = DATA_MODE_VMALLOC;

-	/*
-	 * __vmalloc allocates the data pages and auxiliary structures with
-	 * gfp_flags that were specified, but pagetables are always allocated
-	 * with GFP_KERNEL, no matter what was specified as gfp_mask.
-	 *
-	 * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that
-	 * all allocations done by this process (including pagetables) are done
-	 * as if GFP_NOIO was specified.
-	 */
-	if (gfp_mask & __GFP_NORETRY) {
-		unsigned int noio_flag = memalloc_noio_save();
-		void *ptr = __vmalloc(c->block_size, gfp_mask);
-
-		memalloc_noio_restore(noio_flag);
-		return ptr;
-	}
-
 	return __vmalloc(c->block_size, gfp_mask);
 }

@@ -2592,6 +2575,13 @@ void dm_bufio_client_destroy(struct dm_bufio_client *c)
 }
 EXPORT_SYMBOL_GPL(dm_bufio_client_destroy);

+void dm_bufio_client_reset(struct dm_bufio_client *c)
+{
+	drop_buffers(c);
+	flush_work(&c->shrink_work);
+}
+EXPORT_SYMBOL_GPL(dm_bufio_client_reset);
+
 void dm_bufio_set_sector_offset(struct dm_bufio_client *c, sector_t start)
 {
 	c->start = start;
3 changes: 2 additions & 1 deletion drivers/md/dm-core.h
@@ -306,7 +306,8 @@ struct dm_io {
  */
 enum {
 	DM_IO_ACCOUNTED,
-	DM_IO_WAS_SPLIT
+	DM_IO_WAS_SPLIT,
+	DM_IO_BLK_STAT
 };

 static inline bool dm_io_flagged(struct dm_io *io, unsigned int bit)
51 changes: 36 additions & 15 deletions drivers/md/dm-crypt.c
@@ -1661,15 +1661,18 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
  * In order to not degrade performance with excessive locking, we try
  * non-blocking allocations without a mutex first but on failure we fallback
  * to blocking allocations with a mutex.
+ *
+ * In order to reduce allocation overhead, we try to allocate compound pages in
+ * the first pass. If they are not available, we fall back to the mempool.
  */
 static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size)
 {
 	struct crypt_config *cc = io->cc;
 	struct bio *clone;
 	unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	gfp_t gfp_mask = GFP_NOWAIT | __GFP_HIGHMEM;
-	unsigned int i, len, remaining_size;
-	struct page *page;
+	unsigned int remaining_size;
+	unsigned int order = MAX_ORDER - 1;

 retry:
 	if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM))
@@ -1682,19 +1685,34 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size)

 	remaining_size = size;

-	for (i = 0; i < nr_iovecs; i++) {
-		page = mempool_alloc(&cc->page_pool, gfp_mask);
-		if (!page) {
+	while (remaining_size) {
+		struct page *pages;
+		unsigned size_to_add;
+		unsigned remaining_order = __fls((remaining_size + PAGE_SIZE - 1) >> PAGE_SHIFT);
+		order = min(order, remaining_order);
+
+		while (order > 0) {
+			pages = alloc_pages(gfp_mask
+				| __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_COMP,
+				order);
+			if (likely(pages != NULL))
+				goto have_pages;
+			order--;
+		}
+
+		pages = mempool_alloc(&cc->page_pool, gfp_mask);
+		if (!pages) {
 			crypt_free_buffer_pages(cc, clone);
 			bio_put(clone);
 			gfp_mask |= __GFP_DIRECT_RECLAIM;
+			order = 0;
 			goto retry;
 		}

-		len = (remaining_size > PAGE_SIZE) ? PAGE_SIZE : remaining_size;
-
-		__bio_add_page(clone, page, len, 0);
-		remaining_size -= len;
+have_pages:
+		size_to_add = min((unsigned)PAGE_SIZE << order, remaining_size);
+		__bio_add_page(clone, pages, size_to_add, 0);
+		remaining_size -= size_to_add;
 	}

 	/* Allocate space for integrity tags */
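To make the order selection in the crypt_alloc_buffer hunk above easier to follow, here is a small userspace C sketch of the same arithmetic: the first order tried is the smaller of the current order and floor(log2) of the number of pages still needed. The page size of 4 KiB, the `sketch_` names, and the simplification that every first allocation attempt succeeds are all assumptions for illustration; this is not kernel code and it does not model the retry or mempool paths.

```c
#include <assert.h>

/* Assumed values for illustration; the real ones are architecture-specific. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Index of the highest set bit, a userspace stand-in for __fls(). */
unsigned int sketch_fls(unsigned int x)
{
	unsigned int bit = 0;

	assert(x != 0);
	while (x >>= 1)
		bit++;
	return bit;
}

/* Order tried first for the bytes still to be covered (remaining_size > 0). */
unsigned int sketch_first_try_order(unsigned int remaining_size,
				    unsigned int order)
{
	unsigned int pages =
		(remaining_size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	unsigned int remaining_order = sketch_fls(pages);

	return order < remaining_order ? order : remaining_order;
}

/* Number of chunks added if every first allocation attempt succeeds. */
unsigned int sketch_count_chunks(unsigned int size, unsigned int max_order)
{
	unsigned int order = max_order;
	unsigned int remaining = size;
	unsigned int chunks = 0;

	while (remaining) {
		unsigned int chunk;

		/* The order only ever shrinks across iterations. */
		order = sketch_first_try_order(remaining, order);
		chunk = SKETCH_PAGE_SIZE << order;
		if (chunk > remaining)
			chunk = remaining;
		remaining -= chunk;
		chunks++;
	}
	return chunks;
}
```

Under these assumptions, a request of 1 MiB plus one page is covered by two chunks: an order-8 allocation (256 pages) followed by a single page at order 0, which in the hunk above is the case that skips alloc_pages and is served from the mempool.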
@@ -1712,12 +1730,15 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size)

 static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
 {
-	struct bio_vec *bv;
-	struct bvec_iter_all iter_all;
+	struct folio_iter fi;

-	bio_for_each_segment_all(bv, clone, iter_all) {
-		BUG_ON(!bv->bv_page);
-		mempool_free(bv->bv_page, &cc->page_pool);
+	if (clone->bi_vcnt > 0) { /* bio_for_each_folio_all crashes with an empty bio */
+		bio_for_each_folio_all(fi, clone) {
+			if (folio_test_large(fi.folio))
+				folio_put(fi.folio);
+			else
+				mempool_free(&fi.folio->page, &cc->page_pool);
+		}
 	}
 }

@@ -2887,7 +2908,7 @@ static int crypt_ctr_cipher_new(struct dm_target *ti, char *cipher_in, char *key
 		ret = crypt_ctr_auth_cipher(cc, cipher_api);
 		if (ret < 0) {
 			ti->error = "Invalid AEAD cipher spec";
-			return -ENOMEM;
+			return ret;
 		}
 	}
