-
Notifications
You must be signed in to change notification settings - Fork 86
bcachefs: fix initial page state after falloc #249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: testing
Are you sure you want to change the base?
Conversation
An option was added to control whether reflink support was on or off because for a long time, reflink + inline data extent support was missing - but that's since been fixed, so we can drop the option now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
this is used in only one place now, so just inline it into the caller. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Change fsck code to always put btree iterators - also, make some flow control improvements to deal with lock restarts better, and refactor check_extents() to not walk extents twice for counting/checking i_sectors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is a bit clearer than using bch2_btree_iter_free(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This code used to be used for running some assertions on alloc info at runtime, but it long predates fsck and hasn't been good for much in ages - we can delete it now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The superblock version fields need to be accurate to know whether a filesystem is supported, thus we should be verifying them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is mkfs's job. Also, clean up the handling of feature bits some. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
comparison was wrong Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Prep work for snapshots Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
…dvance The way btree iterators work internally has been changing, particularly with the iter->real_pos changes, and bch2_btree_iter_next() is no longer hyper optimized - it's just advance followed by peek, so it's more efficient to just call advance where we're not using the return value of bch2_btree_iter_next(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree node iterators need to obey the regular btree node invarionts w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have less that it needs to check. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This means bch2_btree_iter_traverse_one() can be made more efficient. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since we're no longer doing next() immediately followed by peek(), this optimization isn't doing anything anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This just gives some internal helpers some better names. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Ideally we'll be getting rid of peek_with_updates(), but the callers will need to be checked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
peek() has to update iter->real_pos - there's no need for bch2_btree_iter_set_pos() to update it as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
More prep work for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It was using the method for btree_ptr_v1, but that wasn't checking all the fields. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It had some silly redundancies. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
External (to the btree iterator code) users of bch2_btree_iter_traverse expect that on success the iterator will be pointed at iter->pos and have that position locked - but since we split iter->pos and iter->real_pos, that means it has to update iter->real_pos if necessary. Internal users don't expect it to modify iter->real_pos, so we need two separate functions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Small improvements to some percpu utility code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Ensure that iter->should_be_locked value is set to true before we call bch2_trans_update in ec_stripe_update_ptrs. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
1778607
to
6ac3b1f
Compare
6ac3b1f
to
b755ab3
Compare
Updated and seems to pass xfstests. |
It's unhelpful if we see "Halting mark and sweep to start topology repair" but we don't see the error that triggered it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Especially in userspace, we sometime run into resource exhaustion issues with starting up threads after mark and sweep/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We weren't checking bch2_btree_node_read_done() for errors, oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We need to ensure that packed formats can't represent fields larger than the unpacked format, which is a bit tricky since the calculations can also overflow a u64. This patch fixes a shift and simplifies the overall calculations. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The value of f_bfree and f_bavail should be the same. The value of f_bfree is not currently scaled by the availability factor. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Avoid calling kfree on the returned error pointer if bch2_acl_from_disk fails. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
When initializing the page state, set the page sectors state to that of what exists in the btree. This allows us to check if the backing extent was already allocated before adding to the inode i_blocks value. Signed-off-by: Dan Robertson <dan@dlrobertson.com>
b755ab3
to
3fe6e97
Compare
Updated to only run the btree check if the page is not uptodate. I kept the check in the page state creation to ensure this check isn't run often. |
I think i have an idea for a better implementation of this. We don't seem to call write_begin for a buffered write. I think if we add that, i might be able to work something out there so that we essentially run the extra code added to |
…frontend" As reported by Thomas Voegtle <tv@lio96.de>, sometimes a DVB card does not initialize properly booting Linux 6.4-rc4. This is not always, maybe in 3 out of 4 attempts. After double-checking, the root cause seems to be related to the UAF fix, which is causing a race issue: [ 26.332149] tda10071 7-0005: found a 'NXP TDA10071' in cold state, will try to load a firmware [ 26.340779] tda10071 7-0005: downloading firmware from file 'dvb-fe-tda10071.fw' [ 989.277402] INFO: task vdr:743 blocked for more than 491 seconds. [ 989.283504] Not tainted 6.4.0-rc5-i5 #249 [ 989.288036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 989.295860] task:vdr state:D stack:0 pid:743 ppid:711 flags:0x00004002 [ 989.295865] Call Trace: [ 989.295867] <TASK> [ 989.295869] __schedule+0x2ea/0x12d0 [ 989.295877] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 989.295881] schedule+0x57/0xc0 [ 989.295884] schedule_preempt_disabled+0xc/0x20 [ 989.295887] __mutex_lock.isra.16+0x237/0x480 [ 989.295891] ? dvb_get_property.isra.10+0x1bc/0xa50 [ 989.295898] ? dvb_frontend_stop+0x36/0x180 [ 989.338777] dvb_frontend_stop+0x36/0x180 [ 989.338781] dvb_frontend_open+0x2f1/0x470 [ 989.338784] dvb_device_open+0x81/0xf0 [ 989.338804] ? exact_lock+0x20/0x20 [ 989.338808] chrdev_open+0x7f/0x1c0 [ 989.338811] ? generic_permission+0x1a2/0x230 [ 989.338813] ? link_path_walk.part.63+0x340/0x380 [ 989.338815] ? exact_lock+0x20/0x20 [ 989.338817] do_dentry_open+0x18e/0x450 [ 989.374030] path_openat+0xca5/0xe00 [ 989.374031] ? terminate_walk+0xec/0x100 [ 989.374034] ? path_lookupat+0x93/0x140 [ 989.374036] do_filp_open+0xc0/0x140 [ 989.374038] ? __call_rcu_common.constprop.91+0x92/0x240 [ 989.374041] ? __check_object_size+0x147/0x260 [ 989.374043] ? __check_object_size+0x147/0x260 [ 989.374045] ? alloc_fd+0xbb/0x180 [ 989.374048] ? do_sys_openat2+0x243/0x310 [ 989.374050] do_sys_openat2+0x243/0x310 [ 989.374052] do_sys_open+0x52/0x80 [ 989.374055] do_syscall_64+0x5b/0x80 [ 989.421335] ? __task_pid_nr_ns+0x92/0xa0 [ 989.421337] ? syscall_exit_to_user_mode+0x20/0x40 [ 989.421339] ? do_syscall_64+0x67/0x80 [ 989.421341] ? syscall_exit_to_user_mode+0x20/0x40 [ 989.421343] ? do_syscall_64+0x67/0x80 [ 989.421345] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 989.421348] RIP: 0033:0x7fe895d067e3 [ 989.421349] RSP: 002b:00007fff933c2ba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 [ 989.421351] RAX: ffffffffffffffda RBX: 00007fff933c2c10 RCX: 00007fe895d067e3 [ 989.421352] RDX: 0000000000000802 RSI: 00005594acdce160 RDI: 00000000ffffff9c [ 989.421353] RBP: 0000000000000802 R08: 0000000000000000 R09: 0000000000000000 [ 989.421353] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001 [ 989.421354] R13: 00007fff933c2ca0 R14: 00000000ffffffff R15: 00007fff933c2c90 [ 989.421355] </TASK> This reverts commit 6769a0b. Fixes: 6769a0b ("media: dvb-core: Fix use-after-free on race condition at dvb_frontend") Link: https://lore.kernel.org/all/da5382ad-09d6-20ac-0d53-611594b30861@lio96.de/ Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
When e.g. 8 bytes are to be read, sgm->consumed equals 8 immediately after sg_miter_next() call. The driver then increments it as bytes are read, so sgm->consumed becomes 16 and this warning triggers in sg_miter_stop(): WARN_ON(miter->consumed > miter->length); WARNING: CPU: 0 PID: 28 at lib/scatterlist.c:925 sg_miter_stop+0x2c/0x10c CPU: 0 PID: 28 Comm: kworker/0:2 Tainted: G W 6.9.0-rc5-dirty #249 Hardware name: Generic DT based system Workqueue: events_freezable mmc_rescan Call trace:. unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x44/0x5c dump_stack_lvl from __warn+0x78/0x16c __warn from warn_slowpath_fmt+0xb0/0x160 warn_slowpath_fmt from sg_miter_stop+0x2c/0x10c sg_miter_stop from moxart_request+0xb0/0x468 moxart_request from mmc_start_request+0x94/0xa8 mmc_start_request from mmc_wait_for_req+0x60/0xa8 mmc_wait_for_req from mmc_app_send_scr+0xf8/0x150 mmc_app_send_scr from mmc_sd_setup_card+0x1c/0x420 mmc_sd_setup_card from mmc_sd_init_card+0x12c/0x4dc mmc_sd_init_card from mmc_attach_sd+0xf0/0x16c mmc_attach_sd from mmc_rescan+0x1e0/0x298 mmc_rescan from process_scheduled_works+0x2e4/0x4ec process_scheduled_works from worker_thread+0x1ec/0x24c worker_thread from kthread+0xd4/0xe0 kthread from ret_from_fork+0x14/0x38 This patch adds initial zeroing of sgm->consumed. It is then incremented as bytes are read or written. Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Fixes: 3ee0e7c ("mmc: moxart-mmc: Use sg_miter for PIO") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240422153607.963672-1-saproj@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[BUG] There is syzbot based reproducer that can crash the kernel, with the following call trace: (With some debug output added) DEBUG: rescue=ibadroots parsed BTRFS: device fsid 14d642db-7b15-43e4-81e6-4b8fac6a25f8 devid 1 transid 8 /dev/loop0 (7:0) scanned by repro (1010) BTRFS info (device loop0): first mount of filesystem 14d642db-7b15-43e4-81e6-4b8fac6a25f8 BTRFS info (device loop0): using blake2b (blake2b-256-generic) checksum algorithm BTRFS info (device loop0): using free-space-tree BTRFS warning (device loop0): checksum verify failed on logical 5312512 mirror 1 wanted 0xb043382657aede36608fd3386d6b001692ff406164733d94e2d9a180412c6003 found 0x810ceb2bacb7f0f9eb2bf3b2b15c02af867cb35ad450898169f3b1f0bd818651 level 0 DEBUG: read tree root path failed for tree csum, ret=-5 BTRFS warning (device loop0): checksum verify failed on logical 5328896 mirror 1 wanted 0x51be4e8b303da58e6340226815b70e3a93592dac3f30dd510c7517454de8567a found 0x51be4e8b303da58e634022a315b70e3a93592dac3f30dd510c7517454de8567a level 0 BTRFS warning (device loop0): checksum verify failed on logical 5292032 mirror 1 wanted 0x1924ccd683be9efc2fa98582ef58760e3848e9043db8649ee382681e220cdee4 found 0x0cb6184f6e8799d9f8cb335dccd1d1832da1071d12290dab3b85b587ecacca6e level 0 process 'repro' launched './file2' with NULL argv: empty string added DEBUG: no csum root, idatacsums=0 ibadroots=134217728 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [koverstreet#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f] CPU: 5 UID: 0 PID: 1010 Comm: repro Tainted: G OE 6.15.0-custom+ koverstreet#249 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 RIP: 0010:btrfs_lookup_csum+0x93/0x3d0 [btrfs] Call Trace: <TASK> btrfs_lookup_bio_sums+0x47a/0xdf0 [btrfs] btrfs_submit_bbio+0x43e/0x1a80 [btrfs] submit_one_bio+0xde/0x160 [btrfs] btrfs_readahead+0x498/0x6a0 [btrfs] read_pages+0x1c3/0xb20 page_cache_ra_order+0x4b5/0xc20 filemap_get_pages+0x2d3/0x19e0 filemap_read+0x314/0xde0 __kernel_read+0x35b/0x900 bprm_execve+0x62e/0x1140 do_execveat_common.isra.0+0x3fc/0x520 __x64_sys_execveat+0xdc/0x130 do_syscall_64+0x54/0x1d0 entry_SYSCALL_64_after_hwframe+0x76/0x7e ---[ end trace 0000000000000000 ]--- [CAUSE] Firstly the fs has a corrupted csum tree root, thus to mount the fs we have to go "ro,rescue=ibadroots" mount option. Normally with that mount option, a bad csum tree root should set BTRFS_FS_STATE_NO_DATA_CSUMS flag, so that any future data read will ignore csum search. But in this particular case, we have the following call trace that caused NULL csum root, but not setting BTRFS_FS_STATE_NO_DATA_CSUMS: load_global_roots_objectid(): ret = btrfs_search_slot(); /* Succeeded */ btrfs_item_key_to_cpu() found = true; /* We found the root item for csum tree. */ root = read_tree_root_path(); if (IS_ERR(root)) { if (!btrfs_test_opt(fs_info, IGNOREBADROOTS)) /* * Since we have rescue=ibadroots mount option, * @ret is still 0. */ break; if (!found || ret) { /* @found is true, @ret is 0, error handling for csum * tree is skipped. */ } This means we completely skipped to set BTRFS_FS_STATE_NO_DATA_CSUMS if the csum tree is corrupted, which results unexpected later csum lookup. [FIX] If read_tree_root_path() failed, always populate @ret to the error number. As at the end of the function, we need @ret to determine if we need to do the extra error handling for csum tree. Fixes: abed4aa ("btrfs: track the csum, extent, and free space trees in a rb tree") Reported-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Reported-by: Longxing Li <coregee2000@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
On a new page reservation check if the backing extent was already
allocated before adding to the number of sectors we will allocate.
It is possible to write to a page that is beyond the inode i_size
that is also backed by an allocated extent. This can happen when
fallocate is used with FALLOC_FL_KEEP_SIZE.
Signed-off-by: Dan Robertson dan@dlrobertson.com