rwf encoded #1
An encoded extent can be up to 128K in length, which exceeds the largest value expressible by the current send stream format's 16-bit tlv_len field. Since encoded writes cannot be split into multiple writes by btrfs send, the send stream format must change to accommodate encoded writes. Supporting this changed format requires retooling how we store the commands we have processed. Since we can no longer use btrfs_tlv_header to describe every attribute, we define a new struct btrfs_send_attribute which has a 32-bit length field, and use that to store the attribute information needed for receive processing. This is transparent to users of the various TLV_GET macros.
Signed-off-by: Boris Burkov <boris@bur.io>
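A minimal sketch of the size problem described above, not the patch itself. The struct and function names are hypothetical stand-ins; only the field widths (16-bit on the wire, 32-bit in memory) come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the v1 on-wire TLV header: the length is only 16 bits wide. */
struct toy_tlv_header_v1 {
	uint16_t tlv_type;
	uint16_t tlv_len;
};

/* Hypothetical in-memory attribute record with a 32-bit length field. */
struct toy_send_attribute {
	uint16_t tlv_type;
	uint32_t tlv_len;
	const char *data;
};

static_assert(sizeof(((struct toy_tlv_header_v1 *)0)->tlv_len) == 2,
	      "v1 length field is 16 bits");

/* Returns 1 if len can be stored in the 16-bit v1 length field, else 0. */
static int fits_in_v1_len(size_t len)
{
	return len <= UINT16_MAX;
}
```

A 128K extent (131072 bytes) does not fit in the 16-bit field but fits comfortably in the 32-bit one.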
I made your suggested changes and force pushed the branch, it looks alright to me in the UI, still re-testing stuff besides "it builds", so apologies if something dumb snuck through.
btrfs-progs tests are broken on the truncated thing, debugging.
dumb mistake, checked against sizeof le16*, not sizeof le16. btrfs-progs tests passing now
Mostly style nits this time around.
Sorry I didn't notice this stuff last time, but there's a few inconsistencies with error return values, and some trivial style nits. Otherwise it looks great. I'm going to give this and the kernel patches a heavier round of testing now, thanks!
In send stream v2, write commands can now be an arbitrary size. For that reason, we can no longer allocate a fixed array in sctx for read_cmd. Instead, read_cmd dynamically allocates sctx->read_buf. To avoid needless reallocations, we reuse read_buf between read_cmd calls by also keeping track of the size of the allocated buffer in sctx->read_buf_sz. We do the first allocation of the old default size at the start of processing the stream, and we only reallocate if we encounter a command that needs a larger buffer. Signed-off-by: Boris Burkov <boris@bur.io>
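The grow-only buffer reuse described above could look roughly like the sketch below. The context struct and helper name are hypothetical; only the read_buf and read_buf_sz field names come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the stream context. */
struct toy_stream_ctx {
	char *read_buf;      /* reused between read_cmd calls */
	size_t read_buf_sz;  /* currently allocated size */
};

/*
 * Make sure read_buf can hold at least `needed` bytes.
 * Reallocates only when the current buffer is too small.
 * Returns 0 on success, -ENOMEM on allocation failure.
 */
static int toy_ensure_read_buf(struct toy_stream_ctx *sctx, size_t needed)
{
	char *new_buf;

	assert(sctx != NULL);
	if (needed <= sctx->read_buf_sz)
		return 0;

	new_buf = realloc(sctx->read_buf, needed);
	if (!new_buf)
		return -ENOMEM; /* the old buffer is still valid and owned by sctx */

	sctx->read_buf = new_buf;
	sctx->read_buf_sz = needed;
	return 0;
}
```

Calling the helper once with the old default size at stream start, then once per command with that command's length, gives the behaviour the message describes: small commands reuse the existing buffer and only a larger command triggers a reallocation.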
The new format privileges the BTRFS_SEND_A_DATA attribute in two ways: it is guaranteed to be the last attribute in any command that needs it, and its length is encoded implicitly as the total command length in the command header minus the sizes of the other attributes (and, of course, of the tlv_type identifying the DATA attribute). To parse the new stream, we read the tlv_type; if it is not DATA, we proceed normally, but if it is DATA, we do not parse a tlv_len and simply compute the length. In addition, we add bounds checking when parsing each chunk of data, as well as for the tlv_len itself.
Signed-off-by: Boris Burkov <boris@bur.io>
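The parsing rule above can be sketched as follows. This is an illustration under assumptions, not the receive code: the function name, the TOY_ATTR_DATA constant, and the flat byte-buffer interface are all hypothetical, and the buffer is assumed to hold exactly one command's attribute payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative type id for the DATA attribute (assumed value). */
#define TOY_ATTR_DATA 19

static uint16_t toy_get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Parse the attribute starting at *pos inside a command payload of cmd_len
 * bytes. A DATA attribute carries no tlv_len: its length is whatever remains
 * in the command. Returns 0 on success, -1 on a truncated or malformed payload.
 */
static int toy_next_attr(const unsigned char *cmd, size_t cmd_len, size_t *pos,
			 uint16_t *type, uint32_t *len,
			 const unsigned char **data)
{
	size_t off;

	assert(cmd && pos && type && len && data);
	off = *pos;

	if (off > cmd_len || cmd_len - off < 2)
		return -1;              /* no room for tlv_type */
	*type = toy_get_le16(cmd + off);
	off += 2;

	if (*type == TOY_ATTR_DATA) {
		*len = (uint32_t)(cmd_len - off); /* implicit length */
	} else {
		if (cmd_len - off < 2)
			return -1;      /* no room for tlv_len */
		*len = toy_get_le16(cmd + off);
		off += 2;
		if (cmd_len - off < *len)
			return -1;      /* attribute overruns the command */
	}

	*data = cmd + off;
	*pos = off + *len;
	return 0;
}
```

Because DATA consumes the rest of the command, any attribute placed after it could never be found, which is why the format requires DATA to come last.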
Send stream v2 adds three commands and several attributes associated with those commands. Before we implement processing them, add all the commands and attributes. This avoids leaving the enums in an intermediate state that doesn't correspond to any version of send stream.
Signed-off-by: Boris Burkov <boris@bur.io>
Encoded writes in receive will use pwritev2. It is possible that the system libc does not export this function, so we stub it out and detect whether to build the stub code with autoconf. This syscall has special semantics in x32 (no hi lo, just takes loff_t) so we have to detect that case and use the appropriate arguments. Signed-off-by: Boris Burkov <boris@bur.io>
Add a new btrfs_send_op and support for both dumping and proper receive processing which does actual encoded writes. Encoded writes are only allowed on a file descriptor opened with an extra flag that allows encoded writes, so we also add support for this flag when opening or reusing a file for writing. Signed-off-by: Boris Burkov <boris@bur.io>
…rite An encoded_write can fail if the file system it is being applied to does not support encoded writes or if it can't find enough contiguous space to accommodate the encoded extent. In those cases, we can likely still process an encoded_write by explicitly decoding the data and doing a normal write. Add the necessary fallback path for decoding data compressed with zlib, lzo, or zstd. zlib and zstd have reusable decoding context data structures which we cache in the receive context so that we don't have to recreate them on every encoded_write. Finally, add a command line flag for force-decompress which causes receive to always use the fallback path rather than first attempting the encoded write. Signed-off-by: Boris Burkov <boris@bur.io>
Send stream v2 can emit fallocate commands, so receive must support them as well. The implementation simply passes along the arguments to the syscall. Note that mode is encoded as a u32 in the send stream but fallocate takes an int, so there is an unsigned-to-signed conversion there.
Signed-off-by: Boris Burkov <boris@bur.io>
In send stream v2, send can emit a command for setting inode flags via the setflags ioctl. Pass the flags attribute through to the ioctl call in receive. Signed-off-by: Boris Burkov <boris@bur.io>
Making the btrfs send ioctl use the stream v2 format requires passing BTRFS_SEND_FLAG_STREAM_V2 in flags. Further, to cause the ioctl to emit encoded_write commands for encoded extents, we must set that flag as well as BTRFS_SEND_FLAG_COMPRESSED. Finally, we bump up the version in send.h as well, since we are now fully compatible with v2.
Add two command line arguments to btrfs send: --stream-version and --compressed. --stream-version requires an argument which it parses as an integer and sets STREAM_V2 if the argument is 2. --compressed does not require an argument and automatically implies STREAM_V2 as well (COMPRESSED alone causes the ioctl to error out).
Some examples to illustrate edge cases:
    // v1, old format and no encoded_writes
    btrfs send subvol
    btrfs send --stream-version 1 subvol
    // v2 and compressed, we will see encoded_writes
    btrfs send --compressed subvol
    btrfs send --compressed --stream-version 2 subvol
    // v2 only, new format but no encoded_writes
    btrfs send --stream-version 2 subvol
    // error: compressed needs version >= 2
    btrfs send --compressed --stream-version 1 subvol
    // error: invalid version (not 1 or 2)
    btrfs send --stream-version 3 subvol
    btrfs send --compressed --stream-version 0 subvol
    btrfs send --compressed --stream-version 10 subvol
Signed-off-by: Boris Burkov <boris@bur.io>
Adapt the existing send/receive tests by passing '-o --force-compress' to the mount commands in a new test. After writing a few files in the various compression formats, send/receive them with and without --force-decompress to test both the encoded_write path and the fallback to decode+write. Signed-off-by: Boris Burkov <boris@bur.io>
…properly handled
[BUG]
When a special image (derived from fsck/012) has garbage in its unused slots (slot number >= nritems), lowmem mode btrfs check can crash:
    (gdb) run check --mode=lowmem ~/downloads/good.img.restored
    Starting program: /home/adam/btrfs/btrfs-progs/btrfs check --mode=lowmem ~/downloads/good.img.restored
    ...
    ERROR: root 5 INODE[5044031582654955520] nlink(257228800) not equal to inode_refs(0)
    ERROR: root 5 INODE[5044031582654955520] nbytes 474624 not equal to extent_size 0
    Program received signal SIGSEGV, Segmentation fault.
    0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703
    1703 BTRFS_SETGET_FUNCS(inode_size, struct btrfs_inode_item, size, 64);
    (gdb) bt
    #0 0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703
    #1 0x0000555555641544 in check_inode_item (root=0x5555556c2290, path=0x7fffffffd960) at check/mode-lowmem.c:2628
[CAUSE]
At check_inode_item() we have path->slots[0] at 29, while the tree block only has 26 items. This happens for two reasons:
- btrfs_next_item() never reverts its slots
  Even if we failed to read the next leaf.
- check_inode_item() doesn't inform the caller that a fatal error happened
  In check_inode_item(), if btrfs_next_item() failed, it goes to the out label, which doesn't really set @err properly.
This means that when check_inode_item() fails at btrfs_next_item(), it increases path->slots[0] even though it is already beyond the current tree block's nritems. When the slot increases further, and the unused item slots contain garbage, we get an invalid btrfs_item_ptr() result, causing the above segfault.
[FIX]
Fix the problems in two ways:
- Make btrfs_next_item() revert its path->slots[0] on failure
- Properly detect fatal errors from check_inode_item()
With this, we no longer crash on the crafted image.
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Issue: kdave#412
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
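The first part of the fix, reverting the slot when the next leaf cannot be read, can be illustrated with a toy model. Everything here is hypothetical (toy_leaf, toy_path, the callback-based next-leaf step); it only mirrors the shape of the described change, not the btrfs-progs code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a leaf and a path pointing into it. */
struct toy_leaf {
	int nritems;
};

struct toy_path {
	struct toy_leaf *leaf;
	int slot;
};

/* Simulates a failed attempt to step to the next leaf (non-zero return). */
static int toy_next_leaf_fails(struct toy_path *p)
{
	(void)p;
	return 1;
}

/*
 * Advance to the next item. If that requires moving to the next leaf and the
 * move fails, undo the slot increment so the path never points past nritems.
 */
static int toy_next_item(struct toy_path *p,
			 int (*next_leaf)(struct toy_path *))
{
	assert(p != NULL && p->leaf != NULL && next_leaf != NULL);

	p->slot++;
	if (p->slot >= p->leaf->nritems) {
		int ret = next_leaf(p);

		if (ret)
			p->slot--;  /* revert: keep the path on a valid item */
		return ret;
	}
	return 0;
}
```

With the revert in place, repeated failed calls leave the slot on the last valid item instead of walking further into the unused, possibly garbage-filled slots.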
…level
[BUG]
When running lowmem mode with a METADATA_ITEM which has an invalid level, it will crash with the following backtrace:
    (gdb) bt
    #0 0x0000555555616b0b in btrfs_header_bytenr (eb=0x4) at ./kernel-shared/ctree.h:2134
    #1 0x0000555555620c78 in check_tree_block_backref (root_id=5, bytenr=30457856, level=256) at check/mode-lowmem.c:3818
    #2 0x0000555555621f6c in check_extent_item (path=0x7fffffffd9c0) at check/mode-lowmem.c:4334
    #3 0x00005555556235a5 in check_leaf_items (root=0x555555688e10, path=0x7fffffffd9c0, nrefs=0x7fffffffda30, account_bytes=1) at check/mode-lowmem.c:4835
    #4 0x0000555555623c6d in walk_down_tree (root=0x555555688e10, path=0x7fffffffd9c0, level=0x7fffffffd984, nrefs=0x7fffffffda30, check_all=1) at check/mode-lowmem.c:4967
    #5 0x000055555562494f in check_btrfs_root (root=0x555555688e10, check_all=1) at check/mode-lowmem.c:5266
    #6 0x00005555556254ee in check_chunks_and_extents_lowmem () at check/mode-lowmem.c:5556
    #7 0x00005555555f0b82 in do_check_chunks_and_extents () at check/main.c:9114
    #8 0x00005555555f50ea in cmd_check (cmd=0x55555567c640 <cmd_struct_check>, argc=3, argv=0x7fffffffdec0) at check/main.c:10892
    #9 0x000055555556b2b1 in cmd_execute (argv=0x7fffffffdec0, argc=3, cmd=0x55555567c640 <cmd_struct_check>) at cmds/commands.h:125
[CAUSE]
check_extent_item() goes through inline extent items and then checks their backrefs. But for METADATA_ITEM, it doesn't really validate key.offset, which is a u64 and can contain a value way larger than BTRFS_MAX_LEVEL (mostly caused by a bit flip).
In that case, if we have a larger value like 256 in key.offset, then later check_tree_block_backref() will use 256 as the level, overflow path->nodes[level] and crash.
[FIX]
Just verify the level, no matter if it's from btrfs_tree_block_level() (which is just a u8), or from key.offset (which is a u64). To do the check properly and detect corruption in the higher bits, also change the type of @level from u8 to u64.
Now lowmem mode can detect the problem properly:
    ...
    [2/7] checking extents
    ERROR: tree block 30457856 has bad backref level, has 256 expect [0, 7]
    ERROR: extent[30457856 16384] level mismatch, wanted: 0, have: 256
    ERROR: errors found in extent allocation tree or chunk allocation
    [3/7] checking free space tree
    ...
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
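A small sketch of why the check needs the wider type. The helper name and TOY_MAX_LEVEL are hypothetical; the valid range [0, 7] is taken from the error output above.

```c
#include <assert.h>
#include <stdint.h>

/* Valid levels are [0, 7] per the error message, so 8 is the exclusive bound. */
#define TOY_MAX_LEVEL 8

static_assert(TOY_MAX_LEVEL <= UINT8_MAX, "valid levels fit in a u8");

/*
 * Validate a level that may come from a u64 key offset. Taking uint64_t keeps
 * the high bits, so a corrupted value such as 256 is rejected rather than
 * silently truncated.
 */
static int toy_level_is_valid(uint64_t level)
{
	return level < TOY_MAX_LEVEL;
}
```

If the value were narrowed to a u8 first, 256 would wrap to 0 and pass the check, which is the corruption the type change is meant to catch.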