forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 10
Test for-next (regular) #1268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kdave
wants to merge
10,000
commits into
btrfs:ci
Choose a base branch
from
kdave:test-for-next
base: ci
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Test for-next (regular) #1268
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
1e76edd
to
6d347e7
Compare
68f1b4b
to
4a4d10c
Compare
a817d1a
to
dde24c7
Compare
d6b8894
to
fc58d4b
Compare
3c3dc5e
to
7b9aa31
Compare
7f78dd5
to
00782aa
Compare
5278b7a
to
b7b74b3
Compare
Instead of the old @page and @page_offset pair inside scrub, here we can directly use the virtual address for a sector. This has the following benefit: - Simplified parameters A single @kaddr will repair @page and @page_offset. - No more unnecessary kmap/kunmap calls Since all pages utilized by scrub is allocated by scrub, and no highmem is allowed, we do not need to do any kmap/kunmap. And add an ASSERT() inside the new scrub_stripe_get_kaddr() to catch any unexpected highmem page. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This removes the last direct poke into bvec internals in btrfs. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
[BUG] There is a bug report that a syzbot reproducer can lead to the following busy inode at unmount time: BTRFS info (device loop1): last unmount of filesystem 1680000e-3c1e-4c46-84b6-56bd3909af50 VFS: Busy inodes after unmount of loop1 (btrfs) ------------[ cut here ]------------ kernel BUG at fs/super.c:650! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 48168 Comm: syz-executor Not tainted 6.15.0-rc2-00471-g119009db2674 #2 PREEMPT(full) Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:generic_shutdown_super+0x2e9/0x390 fs/super.c:650 Call Trace: <TASK> kill_anon_super+0x3a/0x60 fs/super.c:1237 btrfs_kill_super+0x3b/0x50 fs/btrfs/super.c:2099 deactivate_locked_super+0xbe/0x1a0 fs/super.c:473 deactivate_super fs/super.c:506 [inline] deactivate_super+0xe2/0x100 fs/super.c:502 cleanup_mnt+0x21f/0x440 fs/namespace.c:1435 task_work_run+0x14d/0x240 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x269/0x290 kernel/entry/common.c:218 do_syscall_64+0xd4/0x250 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> [CAUSE] When btrfs_alloc_path() failed, btrfs_iget() directly returned without releasing the inode already allocated by btrfs_iget_locked(). This results the above busy inode and trigger the kernel BUG. [FIX] Fix it by calling iget_failed() if btrfs_alloc_path() failed. If we hit error inside btrfs_read_locked_inode(), it will properly call iget_failed(), so nothing to worry about. Although the iget_failed() cleanup inside btrfs_read_locked_inode() is a break of the normal error handling scheme, let's fix the obvious bug and backport first, then rework the error handling later. Reported-by: Penglei Jiang <superman.xpt@gmail.com> Link: https://lore.kernel.org/linux-btrfs/20250421102425.44431-1-superman.xpt@gmail.com/ Fixes: 7c855e1 ("btrfs: remove conditional path allocation in btrfs_read_locked_inode()") CC: stable@vger.kernel.org # 6.13+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Penglei Jiang <superman.xpt@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
Commit b28b1f0 ("btrfs: delayed-ref: Introduce better documented delayed ref structures") introduced BTRFS_REF_LAST, which can be used for sanity checking, e.g. in switch/case or for loops. In btrfs_ref_type() there is an assertion ASSERT(ref->type == BTRFS_REF_DATA || ref->type == BTRFS_REF_METADATA); to validate the values so we don't need the ending enum. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Currently ASSERT() prints the stringified condition and without macro expansions so simple constants like BTRFS_MAX_METADATA_BLOCKSIZE remain readable in the output. There are expressions where we'd like to see the exact values but all we get is something like: assertion failed: em->start <= start && start < extent_map_end(em), in fs/btrfs/extent_map.c:613 It would be nice to be able to print any additional information to help understand the problem. With some preprocessor magic and compile-time optimizations we can enhance ASSERT to work like that as well: ASSERT(value > limit, "value=%llu limit=%llu", value, limit); with free-form printk arguments that will be part of the assertion message. Pros: - helps debugging and understanding reported problems - the optional format is verified at compile-time Cons: - increases the .ko size - writing the assertion code is repetitive (condition, format, values) - format and variable type must match (extra lookup) - needs gcc 8.x and newer, otherwise it's the short format Recommended use is for non-trivial expressions, so basic ASSERT(value) can be used for pointers or sometimes integers. The format has been slightly updated to also print the result of the evaluation of the condition, appended to the stringified condition as "condition :: <value>". Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
The file volumes.c has about 40 assertions and half of them are suitable for ASSERT() with additional data. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
Add conditional WARN() wrapper that's enabled only in debug build. It should be used for unexpected conditions that should be noisy. Use it instead of ASSERT(0). As it will not lead to BUG() make sure that continuing is still possible, e.g. the error is handled anyway. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
Use the conditional warning instead of typing the whole condition. Optional message is printed where it seems clear what could be the problem. Conversion is left out in btree_csum_one_bio() because of the additional condition. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
The use of ASSERT(0) is maybe useful for some cases but more like a notice for developers. Assertions can be compiled in independently so convert it to a debugging helper. The difference is that it's just a warning and will not end up in BUG(). The converted cases are in connection with proper error handling. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
When running machines with 64k page size and a 16k nodesize we started seeing tree log corruption in production. This turned out to be because we were not writing out dirty blocks sometimes, so this in fact affects all metadata writes. When writing out a subpage EB we scan the subpage bitmap for a dirty range. If the range isn't dirty we do bit_start++; to move onto the next bit. The problem is the bitmap is based on the number of sectors that an EB has. So in this case, we have a 64k pagesize, 16k nodesize, but a 4k sectorsize. This means our bitmap is 4 bits for every node. With a 64k page size we end up with 4 nodes per page. To make this easier this is how everything looks [0 16k 32k 48k ] logical address [0 4 8 12 ] radix tree offset [ 64k page ] folio [ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers [ | | | | | | | | | | | | | | | | ] bitmap Now we use all of our addressing based on fs_info->sectorsize_bits, so as you can see the above our 16k eb->start turns into radix entry 4. When we find a dirty range for our eb, we correctly do bit_start += sectors_per_node, because if we start at bit 0, the next bit for the next eb is 4, to correspond to eb->start 16k. However if our range is clean, we will do bit_start++, which will now put us offset from our radix tree entries. In our case, assume that the first time we check the bitmap the block is not dirty, we increment bit_start so now it == 1, and then we loop around and check again. This time it is dirty, and we go to find that start using the following equation start = folio_start + bit_start * fs_info->sectorsize; so in the case above, eb->start 0 is now dirty, and we calculate start as 0 + 1 * fs_info->sectorsize = 4096 4096 >> 12 = 1 Now we're looking up the radix tree for 1, and we won't find an eb. What's worse is now we're using bit_start == 1, so we do bit_start += sectors_per_node, which is now 5. If that eb is dirty we will run into the same thing, we will look at an offset that is not populated in the radix tree, and now we're skipping the writeout of dirty extent buffers. The best fix for this is to not use sectorsize_bits to address nodes, but that's a larger change. Since this is a fs corruption problem fix it simply by always using sectors_per_node to increment the start bit. Fixes: c4aec29 ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
Using the helper makes it a bit more clear that we're accessing the first list entry. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
First added (but not effectively used) in 02c372e ("btrfs: add support for inserting raid stripe extents"). The structure is initialized to zeros so the only use in btrfs_insert_one_raid_extent() u64 length = bioc->stripes[i].length; struct btrfs_raid_stride *raid_stride = &stripe_extent->strides[i]; if (length == 0) length = bioc->size; the 'if' always happens. Last use in 4016358 ("btrfs: remove unused variable length in btrfs_insert_one_raid_extent()") was an obvious cleanup. It seems to be safe to remove, raid-stripe-tree works without using it since 6.6. This was found by tool https://github.com/jirislaby/clang-struct . Signed-off-by: David Sterba <dsterba@suse.com>
The unsigned type is a recommended practice (CWE-190, CWE-194) for bit shifts to avoid problems with potential unwanted sign extensions. Although there are no such cases in btrfs codebase, follow the recommendation. Signed-off-by: David Sterba <dsterba@suse.com>
There's only one caller of __setup_root() so merge it there. Signed-off-by: David Sterba <dsterba@suse.com>
The bio status is read only once, no variable needed for that. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed in btrfs_submit_chunk(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed in btrfs_bio_csum(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed in btrfs_bio_csum(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
We're using 'status' for the blk_status_t variables, rename 'ret' so we can use it for proper return type. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
We can now rename 'error' to 'ret' and use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
We don't need to have a separate variable to read the bio status, 'ret' works for that just fine so remove 'error'. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
We're using 'status' for the blk_status_t variables, rename 'ret' so we can use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
We can now rename 'ret2' to 'ret' and use it for generic errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The type blk_status_t is from block layer and not related to checksums in our context. Use int internally and do the conversions to blk_status_t as needed in btrfs_submit_chunk(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Trivial renames to unify the naming of blk_status_t variables/parameters. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
…ilemap() This is just a trivial change. The code looks a bit more readable this way, IMO. Move initialization of existing_folio to the beginning of the retry loop so it's set to NULL at one place. Signed-off-by: Daniel Vacek <neelx@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
If btrfs_clone_extent_buffer() hits an error halfway through attaching the folios, it will not call folio_put() on its folios. Unify its error handling behavior with alloc_dummy_extent_buffer() under a new function 'cleanup_extent_buffer_folios()' Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io>
When btrfs subpage support (fs block < page size) was introduced, a subpage filesystem will only reject tree blocks which cross page boundaries. This used to be a compromise to simplify the tree block handling and still allowing subpage cases to read some old converted filesystems which did not have proper chunk alignment. But in practice, suppose we have the following unaligned tree block on a 64K page sized system: 0 32K 44K 60K 64K | |///////////////| | Although btrfs has no problem reading the tree block at [44K, 60K), if extent allocator is allocating another tree block, it may choose the range [60K, 74K), as extent allocator has no awareness if it's a subpage metadata request or not. Then we'd get -EINVAL from the following sequence: btrfs_alloc_tree_block() |- btrfs_reserve_extent() | Which returned range [60K, 74K) |- btrfs_init_new_buffer() |- btrfs_find_create_tree_block() |- alloc_extent_buffer() |- check_eb_alignment() Which returned -EINVAL, because the range crosses page boundary. This situation will not fix itself and should mostly mark the fs read-only. Thankfully we didn't really get such reports in the real world because: - The original unaligned tree block is only caused by older btrfs-convert It's before the btrfs-convert rework was done in v4.6, where converted btrfs filesystem can have metadata block groups which are not aligned to nodesize nor stripe size (64K). But after btrfs-progs v4.6, all chunks allocated will be stripe (64K) aligned, thus no more such problem. Considering how old the fix is (v4.6 was released almost 10 years ago), subpage support for btrfs was introduced in v5.15, it should be safe to reject those unaligned tree blocks. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.