forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux…
…/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - A major update for DM cache that reduces the latency for deciding whether blocks should migrate to/from the cache. The bio-prison-v2 interface supports this improvement by enabling direct dispatch of work to workqueues rather than having to delay the actual work dispatch to the DM cache core. So the dm-cache policies are much more nimble by being able to drive IO as they see fit. One immediate benefit from the improved latency is a cache that should be much more adaptive to changing workloads. - Add a new DM integrity target that emulates a block device that has additional per-sector tags that can be used for storing integrity information. - Add a new authenticated encryption feature to the DM crypt target that builds on the capabilities provided by the DM integrity target. - Add MD interface for switching the raid4/5/6 journal mode and update the DM raid target to use it to enable aid4/5/6 journal write-back support. - Switch the DM verity target over to using the asynchronous hash crypto API (this helps work better with architectures that have access to off-CPU algorithm providers, which should reduce CPU utilization). - Various request-based DM and DM multipath fixes and improvements from Bart and Christoph. - A DM thinp target fix for a bio structure leak that occurs for each discard IFF discard passdown is enabled. - A fix for a possible deadlock in DM bufio and a fix to re-check the new buffer allocation watermark in the face of competing admin changes to the 'max_cache_size_bytes' tunable. - A couple DM core cleanups. * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits) dm bufio: check new buffer allocation watermark every 30 seconds dm bufio: avoid a possible ABBA deadlock dm mpath: make it easier to detect unintended I/O request flushes dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit() dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH dm: introduce enum dm_queue_mode to cleanup related code dm mpath: verify __pg_init_all_paths locking assumptions at runtime dm: verify suspend_locking assumptions at runtime dm block manager: remove an unused argument from dm_block_manager_create() dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue() dm mpath: delay requeuing while path initialization is in progress dm mpath: avoid that path removal can trigger an infinite loop dm mpath: split and rename activate_path() to prepare for its expanded use dm ioctl: prevent stack leak in dm ioctl call dm integrity: use previously calculated log2 of sectors_per_block dm integrity: use hex2bin instead of open-coded variant dm crypt: replace custom implementation of hex2bin() dm crypt: remove obsolete references to per-CPU state dm verity: switch to using asynchronous hash crypto API dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues ...
- Loading branch information
Showing
45 changed files
with
7,640 additions
and
3,025 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
The dm-integrity target emulates a block device that has additional | ||
per-sector tags that can be used for storing integrity information. | ||
|
||
A general problem with storing integrity tags with every sector is that | ||
writing the sector and the integrity tag must be atomic - i.e. in case of | ||
crash, either both sector and integrity tag or none of them is written. | ||
|
||
To guarantee write atomicity, the dm-integrity target uses journal, it | ||
writes sector data and integrity tags into a journal, commits the journal | ||
and then copies the data and integrity tags to their respective location. | ||
|
||
The dm-integrity target can be used with the dm-crypt target - in this | ||
situation the dm-crypt target creates the integrity data and passes them | ||
to the dm-integrity target via bio_integrity_payload attached to the bio. | ||
In this mode, the dm-crypt and dm-integrity targets provide authenticated | ||
disk encryption - if the attacker modifies the encrypted device, an I/O | ||
error is returned instead of random data. | ||
|
||
The dm-integrity target can also be used as a standalone target, in this | ||
mode it calculates and verifies the integrity tag internally. In this | ||
mode, the dm-integrity target can be used to detect silent data | ||
corruption on the disk or in the I/O path. | ||
|
||
|
||
When loading the target for the first time, the kernel driver will format | ||
the device. But it will only format the device if the superblock contains | ||
zeroes. If the superblock is neither valid nor zeroed, the dm-integrity | ||
target can't be loaded. | ||
|
||
To use the target for the first time: | ||
1. overwrite the superblock with zeroes | ||
2. load the dm-integrity target with one-sector size, the kernel driver | ||
will format the device | ||
3. unload the dm-integrity target | ||
4. read the "provided_data_sectors" value from the superblock | ||
5. load the dm-integrity target with the the target size | ||
"provided_data_sectors" | ||
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target | ||
with the size "provided_data_sectors" | ||
|
||
|
||
Target arguments: | ||
|
||
1. the underlying block device | ||
|
||
2. the number of reserved sector at the beginning of the device - the | ||
dm-integrity won't read of write these sectors | ||
|
||
3. the size of the integrity tag (if "-" is used, the size is taken from | ||
the internal-hash algorithm) | ||
|
||
4. mode: | ||
D - direct writes (without journal) - in this mode, journaling is | ||
not used and data sectors and integrity tags are written | ||
separately. In case of crash, it is possible that the data | ||
and integrity tag doesn't match. | ||
J - journaled writes - data and integrity tags are written to the | ||
journal and atomicity is guaranteed. In case of crash, | ||
either both data and tag or none of them are written. The | ||
journaled mode degrades write throughput twice because the | ||
data have to be written twice. | ||
R - recovery mode - in this mode, journal is not replayed, | ||
checksums are not checked and writes to the device are not | ||
allowed. This mode is useful for data recovery if the | ||
device cannot be activated in any of the other standard | ||
modes. | ||
|
||
5. the number of additional arguments | ||
|
||
Additional arguments: | ||
|
||
journal_sectors:number | ||
The size of journal, this argument is used only if formatting the | ||
device. If the device is already formatted, the value from the | ||
superblock is used. | ||
|
||
interleave_sectors:number | ||
The number of interleaved sectors. This values is rounded down to | ||
a power of two. If the device is already formatted, the value from | ||
the superblock is used. | ||
|
||
buffer_sectors:number | ||
The number of sectors in one buffer. The value is rounded down to | ||
a power of two. | ||
|
||
The tag area is accessed using buffers, the buffer size is | ||
configurable. The large buffer size means that the I/O size will | ||
be larger, but there could be less I/Os issued. | ||
|
||
journal_watermark:number | ||
The journal watermark in percents. When the size of the journal | ||
exceeds this watermark, the thread that flushes the journal will | ||
be started. | ||
|
||
commit_time:number | ||
Commit time in milliseconds. When this time passes, the journal is | ||
written. The journal is also written immediatelly if the FLUSH | ||
request is received. | ||
|
||
internal_hash:algorithm(:key) (the key is optional) | ||
Use internal hash or crc. | ||
When this argument is used, the dm-integrity target won't accept | ||
integrity tags from the upper target, but it will automatically | ||
generate and verify the integrity tags. | ||
|
||
You can use a crc algorithm (such as crc32), then integrity target | ||
will protect the data against accidental corruption. | ||
You can also use a hmac algorithm (for example | ||
"hmac(sha256):0123456789abcdef"), in this mode it will provide | ||
cryptographic authentication of the data without encryption. | ||
|
||
When this argument is not used, the integrity tags are accepted | ||
from an upper layer target, such as dm-crypt. The upper layer | ||
target should check the validity of the integrity tags. | ||
|
||
journal_crypt:algorithm(:key) (the key is optional) | ||
Encrypt the journal using given algorithm to make sure that the | ||
attacker can't read the journal. You can use a block cipher here | ||
(such as "cbc(aes)") or a stream cipher (for example "chacha20", | ||
"salsa20", "ctr(aes)" or "ecb(arc4)"). | ||
|
||
The journal contains history of last writes to the block device, | ||
an attacker reading the journal could see the last sector nubmers | ||
that were written. From the sector numbers, the attacker can infer | ||
the size of files that were written. To protect against this | ||
situation, you can encrypt the journal. | ||
|
||
journal_mac:algorithm(:key) (the key is optional) | ||
Protect sector numbers in the journal from accidental or malicious | ||
modification. To protect against accidental modification, use a | ||
crc algorithm, to protect against malicious modification, use a | ||
hmac algorithm with a key. | ||
|
||
This option is not needed when using internal-hash because in this | ||
mode, the integrity of journal entries is checked when replaying | ||
the journal. Thus, modified sector number would be detected at | ||
this stage. | ||
|
||
block_size:number | ||
The size of a data block in bytes. The larger the block size the | ||
less overhead there is for per-block integrity metadata. | ||
Supported values are 512, 1024, 2048 and 4096 bytes. If not | ||
specified the default block size is 512 bytes. | ||
|
||
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can | ||
be changed when reloading the target (load an inactive table and swap the | ||
tables with suspend and resume). The other arguments should not be changed | ||
when reloading the target because the layout of disk data depend on them | ||
and the reloaded target would be non-functional. | ||
|
||
|
||
The layout of the formatted block device: | ||
* reserved sectors (they are not used by this target, they can be used for | ||
storing LUKS metadata or for other purpose), the size of the reserved | ||
area is specified in the target arguments | ||
* superblock (4kiB) | ||
* magic string - identifies that the device was formatted | ||
* version | ||
* log2(interleave sectors) | ||
* integrity tag size | ||
* the number of journal sections | ||
* provided data sectors - the number of sectors that this target | ||
provides (i.e. the size of the device minus the size of all | ||
metadata and padding). The user of this target should not send | ||
bios that access data beyond the "provided data sectors" limit. | ||
* flags - a flag is set if journal_mac is used | ||
* journal | ||
The journal is divided into sections, each section contains: | ||
* metadata area (4kiB), it contains journal entries | ||
every journal entry contains: | ||
* logical sector (specifies where the data and tag should | ||
be written) | ||
* last 8 bytes of data | ||
* integrity tag (the size is specified in the superblock) | ||
every metadata sector ends with | ||
* mac (8-bytes), all the macs in 8 metadata sectors form a | ||
64-byte value. It is used to store hmac of sector | ||
numbers in the journal section, to protect against a | ||
possibility that the attacker tampers with sector | ||
numbers in the journal. | ||
* commit id | ||
* data area (the size is variable; it depends on how many journal | ||
entries fit into the metadata area) | ||
every sector in the data area contains: | ||
* data (504 bytes of data, the last 8 bytes are stored in | ||
the journal entry) | ||
* commit id | ||
To test if the whole journal section was written correctly, every | ||
512-byte sector of the journal ends with 8-byte commit id. If the | ||
commit id matches on all sectors in a journal section, then it is | ||
assumed that the section was written correctly. If the commit id | ||
doesn't match, the section was written partially and it should not | ||
be replayed. | ||
* one or more runs of interleaved tags and data. Each run contains: | ||
* tag area - it contains integrity tags. There is one tag for each | ||
sector in the data area | ||
* data area - it contains data sectors. The number of data sectors | ||
in one run must be a power of two. log2 of this value is stored | ||
in the superblock. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.