Test for-next (regular) #1268

kdave · 2024-05-16T18:40:08Z

No description provided.

@page

Instead of the old @page and @page_offset pair inside scrub, here we can directly use the virtual address for a sector. This has the following benefit: - Simplified parameters A single @kaddr will repair @page and @page_offset. - No more unnecessary kmap/kunmap calls Since all pages utilized by scrub is allocated by scrub, and no highmem is allowed, we do not need to do any kmap/kunmap. And add an ASSERT() inside the new scrub_stripe_get_kaddr() to catch any unexpected highmem page. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

This removes the last direct poke into bvec internals in btrfs. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

[BUG] There is a bug report that a syzbot reproducer can lead to the following busy inode at unmount time: BTRFS info (device loop1): last unmount of filesystem 1680000e-3c1e-4c46-84b6-56bd3909af50 VFS: Busy inodes after unmount of loop1 (btrfs) ------------[ cut here ]------------ kernel BUG at fs/super.c:650! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 48168 Comm: syz-executor Not tainted 6.15.0-rc2-00471-g119009db2674 #2 PREEMPT(full) Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:generic_shutdown_super+0x2e9/0x390 fs/super.c:650 Call Trace: <TASK> kill_anon_super+0x3a/0x60 fs/super.c:1237 btrfs_kill_super+0x3b/0x50 fs/btrfs/super.c:2099 deactivate_locked_super+0xbe/0x1a0 fs/super.c:473 deactivate_super fs/super.c:506 [inline] deactivate_super+0xe2/0x100 fs/super.c:502 cleanup_mnt+0x21f/0x440 fs/namespace.c:1435 task_work_run+0x14d/0x240 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x269/0x290 kernel/entry/common.c:218 do_syscall_64+0xd4/0x250 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> [CAUSE] When btrfs_alloc_path() failed, btrfs_iget() directly returned without releasing the inode already allocated by btrfs_iget_locked(). This results the above busy inode and trigger the kernel BUG. [FIX] Fix it by calling iget_failed() if btrfs_alloc_path() failed. If we hit error inside btrfs_read_locked_inode(), it will properly call iget_failed(), so nothing to worry about. Although the iget_failed() cleanup inside btrfs_read_locked_inode() is a break of the normal error handling scheme, let's fix the obvious bug and backport first, then rework the error handling later. Reported-by: Penglei Jiang <superman.xpt@gmail.com> Link: https://lore.kernel.org/linux-btrfs/20250421102425.44431-1-superman.xpt@gmail.com/ Fixes: 7c855e1 ("btrfs: remove conditional path allocation in btrfs_read_locked_inode()") CC: stable@vger.kernel.org # 6.13+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Penglei Jiang <superman.xpt@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>

Commit b28b1f0 ("btrfs: delayed-ref: Introduce better documented delayed ref structures") introduced BTRFS_REF_LAST, which can be used for sanity checking, e.g. in switch/case or for loops. In btrfs_ref_type() there is an assertion ASSERT(ref->type == BTRFS_REF_DATA || ref->type == BTRFS_REF_METADATA); to validate the values so we don't need the ending enum. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Currently ASSERT() prints the stringified condition and without macro expansions so simple constants like BTRFS_MAX_METADATA_BLOCKSIZE remain readable in the output. There are expressions where we'd like to see the exact values but all we get is something like: assertion failed: em->start <= start && start < extent_map_end(em), in fs/btrfs/extent_map.c:613 It would be nice to be able to print any additional information to help understand the problem. With some preprocessor magic and compile-time optimizations we can enhance ASSERT to work like that as well: ASSERT(value > limit, "value=%llu limit=%llu", value, limit); with free-form printk arguments that will be part of the assertion message. Pros: - helps debugging and understanding reported problems - the optional format is verified at compile-time Cons: - increases the .ko size - writing the assertion code is repetitive (condition, format, values) - format and variable type must match (extra lookup) - needs gcc 8.x and newer, otherwise it's the short format Recommended use is for non-trivial expressions, so basic ASSERT(value) can be used for pointers or sometimes integers. The format has been slightly updated to also print the result of the evaluation of the condition, appended to the stringified condition as "condition :: <value>". Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>

The file volumes.c has about 40 assertions and half of them are suitable for ASSERT() with additional data. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>

Add conditional WARN() wrapper that's enabled only in debug build. It should be used for unexpected conditions that should be noisy. Use it instead of ASSERT(0). As it will not lead to BUG() make sure that continuing is still possible, e.g. the error is handled anyway. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>

Use the conditional warning instead of typing the whole condition. Optional message is printed where it seems clear what could be the problem. Conversion is left out in btree_csum_one_bio() because of the additional condition. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>

The use of ASSERT(0) is maybe useful for some cases but more like a notice for developers. Assertions can be compiled in independently so convert it to a debugging helper. The difference is that it's just a warning and will not end up in BUG(). The converted cases are in connection with proper error handling. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>

When running machines with 64k page size and a 16k nodesize we started seeing tree log corruption in production. This turned out to be because we were not writing out dirty blocks sometimes, so this in fact affects all metadata writes. When writing out a subpage EB we scan the subpage bitmap for a dirty range. If the range isn't dirty we do bit_start++; to move onto the next bit. The problem is the bitmap is based on the number of sectors that an EB has. So in this case, we have a 64k pagesize, 16k nodesize, but a 4k sectorsize. This means our bitmap is 4 bits for every node. With a 64k page size we end up with 4 nodes per page. To make this easier this is how everything looks [0 16k 32k 48k ] logical address [0 4 8 12 ] radix tree offset [ 64k page ] folio [ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers [ | | | | | | | | | | | | | | | | ] bitmap Now we use all of our addressing based on fs_info->sectorsize_bits, so as you can see the above our 16k eb->start turns into radix entry 4. When we find a dirty range for our eb, we correctly do bit_start += sectors_per_node, because if we start at bit 0, the next bit for the next eb is 4, to correspond to eb->start 16k. However if our range is clean, we will do bit_start++, which will now put us offset from our radix tree entries. In our case, assume that the first time we check the bitmap the block is not dirty, we increment bit_start so now it == 1, and then we loop around and check again. This time it is dirty, and we go to find that start using the following equation start = folio_start + bit_start * fs_info->sectorsize; so in the case above, eb->start 0 is now dirty, and we calculate start as 0 + 1 * fs_info->sectorsize = 4096 4096 >> 12 = 1 Now we're looking up the radix tree for 1, and we won't find an eb. What's worse is now we're using bit_start == 1, so we do bit_start += sectors_per_node, which is now 5. If that eb is dirty we will run into the same thing, we will look at an offset that is not populated in the radix tree, and now we're skipping the writeout of dirty extent buffers. The best fix for this is to not use sectorsize_bits to address nodes, but that's a larger change. Since this is a fs corruption problem fix it simply by always using sectors_per_node to increment the start bit. Fixes: c4aec29 ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>

Using the helper makes it a bit more clear that we're accessing the first list entry. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

First added (but not effectively used) in 02c372e ("btrfs: add support for inserting raid stripe extents"). The structure is initialized to zeros so the only use in btrfs_insert_one_raid_extent() u64 length = bioc->stripes[i].length; struct btrfs_raid_stride *raid_stride = &stripe_extent->strides[i]; if (length == 0) length = bioc->size; the 'if' always happens. Last use in 4016358 ("btrfs: remove unused variable length in btrfs_insert_one_raid_extent()") was an obvious cleanup. It seems to be safe to remove, raid-stripe-tree works without using it since 6.6. This was found by tool https://github.com/jirislaby/clang-struct . Signed-off-by: David Sterba <dsterba@suse.com>

The unsigned type is a recommended practice (CWE-190, CWE-194) for bit shifts to avoid problems with potential unwanted sign extensions. Although there are no such cases in btrfs codebase, follow the recommendation. Signed-off-by: David Sterba <dsterba@suse.com>

There's only one caller of __setup_root() so merge it there. Signed-off-by: David Sterba <dsterba@suse.com>

The bio status is read only once, no variable needed for that. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed in btrfs_submit_chunk(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed in btrfs_bio_csum(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

We're using 'status' for the blk_status_t variables, rename 'ret' so we can use it for proper return type. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

We can now rename 'error' to 'ret' and use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

We don't need to have a separate variable to read the bio status, 'ret' works for that just fine so remove 'error'. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

We're using 'status' for the blk_status_t variables, rename 'ret' so we can use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

We can now rename 'ret2' to 'ret' and use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed in btrfs_submit_chunk(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Trivial renames to unify the naming of blk_status_t variables/parameters. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

…ilemap() This is just a trivial change. The code looks a bit more readable this way, IMO. Move initialization of existing_folio to the beginning of the retry loop so it's set to NULL at one place. Signed-off-by: Daniel Vacek <neelx@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

If btrfs_clone_extent_buffer() hits an error halfway through attaching the folios, it will not call folio_put() on its folios. Unify its error handling behavior with alloc_dummy_extent_buffer() under a new function 'cleanup_extent_buffer_folios()' Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io>

When btrfs subpage support (fs block < page size) was introduced, a subpage filesystem will only reject tree blocks which cross page boundaries. This used to be a compromise to simplify the tree block handling and still allowing subpage cases to read some old converted filesystems which did not have proper chunk alignment. But in practice, suppose we have the following unaligned tree block on a 64K page sized system: 0 32K 44K 60K 64K | |///////////////| | Although btrfs has no problem reading the tree block at [44K, 60K), if extent allocator is allocating another tree block, it may choose the range [60K, 74K), as extent allocator has no awareness if it's a subpage metadata request or not. Then we'd get -EINVAL from the following sequence: btrfs_alloc_tree_block() |- btrfs_reserve_extent() | Which returned range [60K, 74K) |- btrfs_init_new_buffer() |- btrfs_find_create_tree_block() |- alloc_extent_buffer() |- check_eb_alignment() Which returned -EINVAL, because the range crosses page boundary. This situation will not fix itself and should mostly mark the fs read-only. Thankfully we didn't really get such reports in the real world because: - The original unaligned tree block is only caused by older btrfs-convert It's before the btrfs-convert rework was done in v4.6, where converted btrfs filesystem can have metadata block groups which are not aligned to nodesize nor stripe size (64K). But after btrfs-progs v4.6, all chunks allocated will be stripe (64K) aligned, thus no more such problem. Considering how old the fix is (v4.6 was released almost 10 years ago), subpage support for btrfs was introduced in v5.15, it should be safe to reject those unaligned tree blocks. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

kdave force-pushed the test-for-next branch from dd76c02 to 9e82d3e Compare May 16, 2024 20:14

kdave force-pushed the ci branch from 59029fa to 09751fc Compare May 21, 2024 16:53

kdave force-pushed the test-for-next branch 3 times, most recently from 1e76edd to 6d347e7 Compare May 24, 2024 21:18

kdave changed the title ~~Test for-next rc0~~ Test for-next rc1 May 28, 2024

kdave force-pushed the test-for-next branch from e596ef9 to b285eb5 Compare June 3, 2024 20:26

kdave changed the title ~~Test for-next rc1~~ Test for-next (6.10-rc2) Jun 3, 2024

kdave force-pushed the test-for-next branch from fbdf30a to 1d8400b Compare June 12, 2024 18:52

kdave changed the title ~~Test for-next (6.10-rc2)~~ Test for-next (6.10-rc3) Jun 13, 2024

kdave force-pushed the test-for-next branch 2 times, most recently from 68f1b4b to 4a4d10c Compare June 24, 2024 15:33

kdave changed the title ~~Test for-next (6.10-rc3)~~ Test for-next (6.10-rc5) Jun 24, 2024

kdave force-pushed the test-for-next branch 4 times, most recently from a817d1a to dde24c7 Compare July 2, 2024 14:58

kdave force-pushed the test-for-next branch from f6dc71f to 5099ede Compare July 11, 2024 13:42

kdave force-pushed the test-for-next branch from 5099ede to 8296091 Compare July 18, 2024 15:40

kdave changed the title ~~Test for-next (6.10-rc5)~~ Test for-next (regular) Jul 18, 2024

kdave force-pushed the test-for-next branch 3 times, most recently from d6b8894 to fc58d4b Compare August 5, 2024 16:49

kdave force-pushed the test-for-next branch 3 times, most recently from 3c3dc5e to 7b9aa31 Compare August 15, 2024 18:34

kdave force-pushed the test-for-next branch 2 times, most recently from 7f78dd5 to 00782aa Compare August 29, 2024 16:46

kdave force-pushed the test-for-next branch 2 times, most recently from 5278b7a to b7b74b3 Compare September 4, 2024 21:31

Christoph Hellwig and others added 29 commits April 24, 2025 11:35

btrfs: use verbose ASSERT() in volumes.c

2a7dfa7

The file volumes.c has about 40 assertions and half of them are suitable for ASSERT() with additional data. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use list_first_entry() everywhere

3f8a8e0

Using the helper makes it a bit more clear that we're accessing the first list entry. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: merge __setup_root() to btrfs_alloc_root()

a6bb148

There's only one caller of __setup_root() so merge it there. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop redundant local variable in raid_wait_write_end_io()

d045358

The bio status is read only once, no variable needed for that. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: rename ret to status in btrfs_submit_chunk()

3ae5373

We're using 'status' for the blk_status_t variables, rename 'ret' so we can use it for proper return type. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: rename error to ret in btrfs_submit_chunk()

a7d626f

We can now rename 'error' to 'ret' and use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: simplify reading bio status in end_compressed_writeback()

909a19c

We don't need to have a separate variable to read the bio status, 'ret' works for that just fine so remove 'error'. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: rename ret to status in btrfs_submit_compressed_read()

04a13d6

We're using 'status' for the blk_status_t variables, rename 'ret' so we can use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: rename ret2 to ret in btrfs_submit_compressed_read()

d8c87bd

We can now rename 'ret2' to 'ret' and use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: raid56: rename parameter err to status in endio helpers

c1e7e2e

Trivial renames to unify the naming of blk_status_t variables/parameters. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

kdave force-pushed the test-for-next branch from 7c53af8 to 7bf3a99 Compare April 25, 2025 06:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Test for-next (regular) #1268

Test for-next (regular) #1268

Uh oh!

kdave commented May 16, 2024

Uh oh!

Uh oh!

Test for-next (regular) #1268

Are you sure you want to change the base?

Test for-next (regular) #1268

Uh oh!

Conversation

kdave commented May 16, 2024

Uh oh!

Uh oh!