forked from torvalds/linux
Landlock flat array domains #9
Open: micromaomao wants to merge 15 commits into landlock-arraydomain-base from landlock-arraydomain
Force-pushed from 1d831a4 to 8c79a5c
Force-pushed from 8c79a5c to e51695f
Force-pushed from 27ee217 to a12fcb1
Force-pushed from 7ef6233 to 3856771
Force-pushed from 3a84302 to 86fdfba
Force-pushed from b4da11c to 65ac599
didn't know you could bind to zero
Force-pushed from 65ac599 to c7a17a0
Force-pushed from 56b691d to aa13c57
Force-pushed from 2b90af7 to fe48c66
micromaomao pushed a commit that referenced this pull request on Jul 8, 2025:
Remove redundant netif_napi_del() call from disconnect path.

A WARN may be triggered in __netif_napi_del_locked() during USB device disconnect:

 WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350

This happens because netif_napi_del() is called in the disconnect path while NAPI is still enabled. However, it is not necessary to call netif_napi_del() explicitly, since unregister_netdev() will handle NAPI teardown automatically and safely. Removing the redundant call avoids triggering the warning.

Full trace:
 lan78xx 1-1:1.0 enu1: Failed to read register index 0x000000c4. ret = -ENODEV
 lan78xx 1-1:1.0 enu1: Failed to set MAC down with error -ENODEV
 lan78xx 1-1:1.0 enu1: Link is Down
 lan78xx 1-1:1.0 enu1: Failed to read register index 0x00000120. ret = -ENODEV
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350
 Modules linked in: flexcan can_dev fuse
 CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.16.0-rc2-00624-ge926949dab03 #9 PREEMPT
 Hardware name: SKOV IMX8MP CPU revC - bd500 (DT)
 Workqueue: usb_hub_wq hub_event
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __netif_napi_del_locked+0x2b4/0x350
 lr : __netif_napi_del_locked+0x7c/0x350
 sp : ffffffc085b673c0
 x29: ffffffc085b673c0 x28: ffffff800b7f2000 x27: ffffff800b7f20d8
 x26: ffffff80110bcf58 x25: ffffff80110bd978 x24: 1ffffff0022179eb
 x23: ffffff80110bc000 x22: ffffff800b7f5000 x21: ffffff80110bc000
 x20: ffffff80110bcf38 x19: ffffff80110bcf28 x18: dfffffc000000000
 x17: ffffffc081578940 x16: ffffffc08284cee0 x15: 0000000000000028
 x14: 0000000000000006 x13: 0000000000040000 x12: ffffffb0022179e8
 x11: 1ffffff0022179e7 x10: ffffffb0022179e7 x9 : dfffffc000000000
 x8 : 0000004ffdde8619 x7 : ffffff80110bcf3f x6 : 0000000000000001
 x5 : ffffff80110bcf38 x4 : ffffff80110bcf38 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : 1ffffff0022179e7 x0 : 0000000000000000
 Call trace:
  __netif_napi_del_locked+0x2b4/0x350 (P)
  lan78xx_disconnect+0xf4/0x360
  usb_unbind_interface+0x158/0x718
  device_remove+0x100/0x150
  device_release_driver_internal+0x308/0x478
  device_release_driver+0x1c/0x30
  bus_remove_device+0x1a8/0x368
  device_del+0x2e0/0x7b0
  usb_disable_device+0x244/0x540
  usb_disconnect+0x220/0x758
  hub_event+0x105c/0x35e0
  process_one_work+0x760/0x17b0
  worker_thread+0x768/0xce8
  kthread+0x3bc/0x690
  ret_from_fork+0x10/0x20
 irq event stamp: 211604
 hardirqs last enabled at (211603): [<ffffffc0828cc9ec>] _raw_spin_unlock_irqrestore+0x84/0x98
 hardirqs last disabled at (211604): [<ffffffc0828a9a84>] el1_dbg+0x24/0x80
 softirqs last enabled at (211296): [<ffffffc080095f10>] handle_softirqs+0x820/0xbc8
 softirqs last disabled at (210993): [<ffffffc080010288>] __do_softirq+0x18/0x20
 ---[ end trace 0000000000000000 ]---
 lan78xx 1-1:1.0 enu1: failed to kill vid 0081/0

Fixes: ec4c7e1 ("lan78xx: Introduce NAPI polling support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250627051346.276029-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to reduce the limit from U32_MAX, as we use u32 for various landlock_domain_index or landlock_layer indices.

On Mon, 2 Jun 2025 at 21:50:05 +0200, Mickaël Salaün wrote [1]:
> Correct. We can either use u64 or reduce the maximum number of rules.
> I think LANDLOCK_MAX_NUM_RULES set to U16_MAX would be much more than
> the worse practical case. Even if one buggy policy tries to add one
> rule per network port, that will be OK. We could even reasonably test
> this limit. We'll need to backport this change but I'm OK with that.

Note that a limit of 2^24 would still leave more than enough room even for u32 indices. However, for future-proofing and because Mickaël agreed, this patch sets the limit to U16_MAX.

Link: https://lore.kernel.org/all/20250602.uBai6ge5maiw@digikod.net/ [1]
Signed-off-by: Tingmao Wang <m@maowtm.org>
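The headroom argument can be checked with simple arithmetic. If a domain holds at most max_rules unique rules, and each rule carries at most one struct landlock_layer per layer, then the flat layers array has at most max_rules × max_layers entries. The sketch below checks this at compile time. It assumes LANDLOCK_MAX_NUM_LAYERS is 16, which is its value in security/landlock/limits.h, and it ignores the few extra terminating index entries. The helper worst_case_layer_entries() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values: the limit proposed by this commit, and the layer
 * limit as defined in security/landlock/limits.h (assumed to be 16). */
#define LANDLOCK_MAX_NUM_RULES  UINT16_MAX
#define LANDLOCK_MAX_NUM_LAYERS 16

/* Hypothetical helper: worst-case number of layer entries in one flat
 * domain, computed in 64 bits so the product itself cannot overflow. */
static uint64_t worst_case_layer_entries(uint64_t max_rules,
					 uint64_t max_layers)
{
	return max_rules * max_layers;
}

/* u32 layer indices are only safe if every entry is addressable. */
static_assert((uint64_t)LANDLOCK_MAX_NUM_RULES * LANDLOCK_MAX_NUM_LAYERS <
		      UINT32_MAX,
	      "u32 layer indices could overflow");
```

With U16_MAX, the worst case is 65535 × 16 = 1048560 entries. A 2^24 limit would give 2^28 entries, which still fits comfortably in a u32.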
This implements the structure proposed in [1]. It uses a flat array to store the rules and will eventually use hashing to find them. The array is stored in the domain struct itself, which avoids an extra pointer indirection and keeps all the rule data as cache-local as possible. The non-array part of the domain struct is also kept reasonably small. This works well for small rulesets (10 or 20 rules), which are the common case for Landlock users, and still performs reasonably for large ones.

Eventually, landlock_rule/landlock_ruleset will only be needed for unmerged rulesets, so they won't have to store multiple layers, etc. create_rule and insert_rule should hopefully also become much simpler.

Unlike the original proposal, the number of layers for each rule is now deduced from the layer index of the next index entry. To simplify the logic, a special "terminating index" is placed after each of the two index arrays, with layer_index = num_layers.

On reflection, using the name "layer" to refer to individual struct landlock_layers is very confusing, especially with names like num_layers. The next version should probably find a better name for it.

Link: https://lore.kernel.org/all/20250526.quec3Dohsheu@digikod.net/ [1]
Signed-off-by: Tingmao Wang <m@maowtm.org>
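Here is a minimal sketch of that layout. It uses simplified stand-in structs: struct layer, struct index_entry and both helpers are hypothetical, not the patch's definitions. Rule i's layers occupy layers[index[i].layer_index] up to, but not including, layers[index[i + 1].layer_index]. The terminating entry means the last rule needs no special case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the patch's structures. */
struct layer {
	uint16_t level;  /* which ruleset layer this entry came from */
	uint32_t access; /* allowed access mask for that layer */
};

struct index_entry {
	uintptr_t key;        /* e.g. an inode pointer or a port number */
	uint32_t layer_index; /* first entry of this rule in the layers array */
};

/*
 * entries[0..num_rules-1] are real rules; entries[num_rules] is the
 * terminating entry whose layer_index equals the total number of layers.
 */
static uint32_t rule_num_layers(const struct index_entry *entries, size_t i)
{
	/* One rule's layers end where the next rule's layers start. */
	return entries[i + 1].layer_index - entries[i].layer_index;
}

static const struct layer *rule_layers(const struct index_entry *entries,
				       const struct layer *layers, size_t i)
{
	return &layers[entries[i].layer_index];
}
```

For example, with index entries {A, 0}, {B, 2} and the terminator {0, 3}, rule A owns layers[0..1] and rule B owns layers[2].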
Force-pushed from 86fdfba to aeceeda
Force-pushed from e395c59 to dffbd3f
This commit introduces utility functions for handling a generic, compact coalesced hash table, which will be used in the next commit. I decided to make it generic for now, but we can make it Landlock-specific if we want. This should include a randomized unit test; I will add one in the next version. Signed-off-by: Tingmao Wang <m@maowtm.org>
This implements a function to search for matching rules using the newly defined coalesced hash table. It also defines convenience macros for fs and net, as well as a macro to iterate over the layers of a rule. Signed-off-by: Tingmao Wang <m@maowtm.org>
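The sketch below shows the shape of such a lookup result and a layer-iteration macro over the flat layout. The names are hypothetical: struct found_rule, for_each_found_layer() and find_rule() are stand-ins, not the patch's macros. To keep it short, find_rule() uses a linear scan instead of the coalesced hash table. A "not found" result is an empty slice, since a real rule always has at least one layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct layer { uint16_t level; uint32_t access; };
struct index_entry { uintptr_t key; uint32_t layer_index; };

/* Hypothetical lookup result: the [start, end) slice of a rule's layers. */
struct found_rule { const struct layer *start, *end; };

/* Illustrative macro: visit every layer of a found rule. */
#define for_each_found_layer(rule, layer) \
	for ((layer) = (rule).start; (layer) < (rule).end; (layer)++)

/*
 * Linear scan for brevity; the patch looks the key up in the coalesced
 * hash table instead. entries[num_rules] is the terminating entry.
 */
static struct found_rule find_rule(const struct index_entry *entries,
				   size_t num_rules,
				   const struct layer *layers, uintptr_t key)
{
	struct found_rule r = { layers, layers }; /* empty slice: not found */

	for (size_t i = 0; i < num_rules; i++) {
		if (entries[i].key == key) {
			r.start = &layers[entries[i].layer_index];
			r.end = &layers[entries[i + 1].layer_index];
			break;
		}
	}
	return r;
}
```

Callers can then fold the access masks of all layers with a plain loop, for example `for_each_found_layer(r, l) acc |= l->access;`.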
This algorithm is a slight alteration of the one on Wikipedia at the time of writing [1]. It differs when we try to insert into a slot that is already in use. The occupant may not actually belong there and just be part of the collision chain of a different hash. Or it may be the head of a chain with the correct hash. In both cases, we move the existing element away and insert the new element in its place.

As a result, if the hash table contains any element with a certain hash, the slot corresponding to that hash is always the slot that starts the collision chain for that hash. In other words, chains won't "mix". If the element at the slot we're targeting has the wrong hash, we know the hash table does not contain the hash we want.

[1]: https://en.wikipedia.org/w/index.php?title=Coalesced_hashing&oldid=1214362866

This patch seems to have hit a checkpatch false positive:

ERROR: need consistent spacing around '*' (ctx:WxV)
#281: FILE: security/landlock/coalesced_hash.h:349:
+ elem_type *table, h_index_t table_size) \
^
ERROR: need consistent spacing around '*' (ctx:WxV)
#288: FILE: security/landlock/coalesced_hash.h:356:
+ struct h_insert_scratch *scratch, const elem_type *elem) \
^

Since this is kind of a niche use case, I will only report it once this series is out of RFC, and only if the errors still show up.

Signed-off-by: Tingmao Wang <m@maowtm.org>
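The following is a small, self-contained sketch of the modified insertion rule and the lookup shortcut it enables. It is not the patch's coalesced_hash.h code. It assumes a fixed table of 8 slots, a placeholder `key % TABLE_SIZE` hash, unique keys, and a naive scan for free slots. The constant EMPTY and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 8
#define EMPTY UINT32_MAX /* hypothetical "no next slot" sentinel */

struct slot {
	bool used;
	uint32_t key;
	uint32_t next; /* next slot in this chain, or EMPTY */
};

static struct slot table[TABLE_SIZE];

/* Placeholder hash, not the one used in the patch series. */
static uint32_t slot_of(uint32_t key) { return key % TABLE_SIZE; }

static uint32_t find_free(void)
{
	for (uint32_t i = TABLE_SIZE; i-- > 0;)
		if (!table[i].used)
			return i;
	return EMPTY;
}

/* Insert a unique key; returns false if the table is full. */
static bool insert(uint32_t key)
{
	uint32_t target = slot_of(key);

	if (!table[target].used) {
		table[target] = (struct slot){ true, key, EMPTY };
		return true;
	}
	uint32_t free_idx = find_free();
	if (free_idx == EMPTY)
		return false;

	uint32_t home = slot_of(table[target].key);
	table[free_idx] = table[target]; /* move occupant, keep its next */
	if (home == target) {
		/* Occupant heads our own chain: new key becomes the head. */
		table[target] = (struct slot){ true, key, free_idx };
	} else {
		/* Occupant belongs to another chain: repoint its predecessor. */
		uint32_t prev = home;
		while (table[prev].next != target)
			prev = table[prev].next;
		table[prev].next = free_idx;
		table[target] = (struct slot){ true, key, EMPTY };
	}
	return true;
}

static bool contains(uint32_t key)
{
	uint32_t i = slot_of(key);

	/* Wrong hash at the home slot means no chain for this hash exists. */
	if (!table[i].used || slot_of(table[i].key) != i)
		return false;
	for (; i != EMPTY; i = table[i].next)
		if (table[i].key == key)
			return true;
	return false;
}
```

For example, inserting 0, 8 and 16 builds one chain headed at slot 0. Inserting 14 afterwards evicts 8 from slot 6 (its overflow slot) and relinks 8 into the chain for hash 0. A lookup for 7 then stops immediately, because slot 7 holds a key with a different hash.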
This implements a 3-stage merge that is generic over the type of rules, so that it can be repeated for fs and net. It also includes a small refactor to reuse the rbtree search code in ruleset.c. Three passes are needed because the code must do more than calculate the array sizes to allocate. It must also populate the index table before writing out the layers sequentially, because the index is not written in order. Doing it this way means that one rule's layers end where the next rule's layers start. Signed-off-by: Tingmao Wang <m@maowtm.org>
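The three passes can be sketched as follows:

1. Size the arrays.
2. Place each key into its index slot, out of input order.
3. Walk the slots in order and write each rule's layers contiguously, then write the terminating entry.

This is an illustration under stated assumptions, not the patch's merge code. Layers are reduced to plain u32 access masks. Pass 2 uses `key % n` with linear probing as a stand-in for the coalesced hash insertion. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical input rule: a key plus one access mask per layer. */
struct src_rule { uint32_t key; uint32_t num_layers; const uint32_t *access; };
struct index_entry { uint32_t key; uint32_t layer_index; };

struct flat_domain {
	size_t num_rules, num_layers;
	struct index_entry *index; /* num_rules + 1 entries (terminator) */
	uint32_t *layers;          /* num_layers entries */
};

static int build(const struct src_rule *src, size_t n, struct flat_domain *d)
{
	/* Pass 1: count layers so both arrays can be sized up front. */
	size_t total = 0;
	for (size_t i = 0; i < n; i++)
		total += src[i].num_layers;

	d->num_rules = n;
	d->num_layers = total;
	d->index = calloc(n + 1, sizeof(*d->index));
	d->layers = calloc(total ? total : 1, sizeof(*d->layers));
	const struct src_rule **slot_src = calloc(n ? n : 1, sizeof(*slot_src));
	if (!d->index || !d->layers || !slot_src) {
		free(d->index);
		free(d->layers);
		free(slot_src);
		d->index = NULL;
		d->layers = NULL;
		return -1;
	}

	/* Pass 2: place keys into slots (stand-in for hash insertion). */
	for (size_t i = 0; i < n; i++) {
		size_t s = src[i].key % n;
		while (slot_src[s])
			s = (s + 1) % n;
		slot_src[s] = &src[i];
		d->index[s].key = src[i].key;
	}

	/* Pass 3: write layers sequentially in slot order. */
	uint32_t next = 0;
	for (size_t s = 0; s < n; s++) {
		d->index[s].layer_index = next;
		for (uint32_t j = 0; j < slot_src[s]->num_layers; j++)
			d->layers[next++] = slot_src[s]->access[j];
	}
	d->index[n].layer_index = next; /* terminating entry */

	free(slot_src);
	return 0;
}
```

Pass 3 cannot be merged into pass 2, because a rule's position in the layers array depends on the final slot order. That order is only known once every key has been placed.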
We will eventually need a deferred free, just like the one we currently have for the ruleset, so we define it here as well. To minimize the size of the domain struct before the rules array, we allocate the work_struct (currently 72 bytes) separately and keep only a pointer to it in the domain.

This patch triggers another (false positive?) checkpatch error:

ERROR: trailing statements should be on next line
#177: FILE: security/landlock/domain.h:197:
DEFINE_FREE(landlock_put_domain, struct landlock_domain *,
+ if (!IS_ERR_OR_NULL(_T)) landlock_put_domain(_T))

Signed-off-by: Tingmao Wang <m@maowtm.org>
Implement the equivalent of landlock_merge_ruleset, but using the new domain structure. The logic in inherit_domain and landlock_domain_merge_ruleset corresponds to that of inherit_ruleset and merge_ruleset. Once the existing landlock_restrict_self code is switched to use this, those two functions can be removed. Signed-off-by: Tingmao Wang <m@maowtm.org>
- Replace domain in landlock_cred with landlock_domain.
- Replace landlock_merge_ruleset with landlock_domain_merge_ruleset.
- Pull landlock_put_hierarchy out of domain.h. This lets domain.h avoid depending on audit.h. Once audit.h -> cred.h is changed to use the new domain struct, it will need to depend on domain.h instead of ruleset.h.
- Update uses of landlock_ruleset domains to landlock_domain.

checkpatch seems not to like the `layer_mask_t (*const layer_masks)[]` argument:

WARNING: function definition argument 'layer_mask_t' should also have an identifier name
#171: FILE: security/landlock/domain.h:397:
+bool landlock_unmask_layers(const struct landlock_found_rule rule,
WARNING: function definition argument 'layer_mask_t' should also have an identifier name
#176: FILE: security/landlock/domain.h:402:
+access_mask_t

Signed-off-by: Tingmao Wang <m@maowtm.org>
Signed-off-by: Tingmao Wang <m@maowtm.org>
[1] introduces a check that doesn't seem fully correct or necessary, and it breaks the build for a later commit in this series. This patch replaces it with a plain signedness check.

Cc: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/all/5f7ad85243b78427242275b93481cfc7c127764b.1725494372.git.fahimitahera@gmail.com/ [1]
Signed-off-by: Tingmao Wang <m@maowtm.org>
The current hash uses a division to compute the modulo. This can be slow, and it also needlessly loses entropy, since ideally we want to use the most significant bits of the hash:

./include/linux/hash.h:
78 return val * GOLDEN_RATIO_64 >> (64 - bits);
   0x0000000000000956 <+118>: 49 0f af c3 imul %r11,%rax
security/landlock/domain.h:
178 DEFINE_COALESCED_HASH_TABLE(struct landlock_domain_index, dom_hash, key,
   0x000000000000095a <+122>: 48 c1 e8 20 shr $0x20,%rax
   0x000000000000095e <+126>: f7 f6 div %esi
   0x0000000000000960 <+128>: 89 d0 mov %edx,%eax
   0x0000000000000962 <+130>: 49 89 c5 mov %rax,%r13

This commit introduces a hash_bits parameter to the hash table, and uses a folding hash instead of a modulo to constrain the value to table_size.

Benchmark comparison:

> ./parse-bpftrace.py typical-workload-{orig,arraydomain-{hashtable-modhash,hashtable-hashbits}}.log

                      orig  hashtable-modhash  hashtable-hashbits
landlock_overhead avg   34                 34                  34
                  median 35                34                  34
landlock_hook     avg  878                875                 856
                  median 854              850                 831
open_syscall      avg 2517               2532                2485
                  median 2457             2471                2425

Signed-off-by: Tingmao Wang <m@maowtm.org>
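The following sketch contrasts with the `div` above by deriving a slot from the top hash_bits bits of a multiplicative hash. The multiply-and-shift mirrors the quoted line from include/linux/hash.h, and GOLDEN_RATIO_64 is copied from that header. The reduction into [0, table_size) is an assumption about what "folding" means here, not taken from the patch. The sketch picks the smallest hash_bits with 2^hash_bits >= table_size, so the raw hash is below 2 × table_size and one conditional subtraction brings it into range. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_64 0x61C8864680B583EBull /* from include/linux/hash.h */

/* Multiplicative hash keeping the top `bits` bits (1 <= bits <= 32). */
static uint32_t hash_64_bits(uint64_t val, unsigned int bits)
{
	return (uint32_t)(val * GOLDEN_RATIO_64 >> (64 - bits));
}

/* Hypothetical: smallest bits >= 1 such that 2^bits >= table_size. */
static unsigned int bits_for(uint32_t table_size)
{
	unsigned int bits = 1; /* keeps the shift above strictly below 64 */

	while ((1ull << bits) < table_size)
		bits++;
	return bits;
}

/*
 * Assumed folding step: with minimal bits, the hash is < 2 * table_size,
 * so a single conditional subtraction replaces the division.
 */
static uint32_t slot_for(uint64_t key, uint32_t table_size, unsigned int bits)
{
	uint32_t h = hash_64_bits(key, bits);

	return h >= table_size ? h - table_size : h;
}
```

The point of the design is that the multiply and shift already concentrate entropy in the high bits, and the remaining range reduction costs a compare and a subtract instead of a `div`.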
Force-pushed from dffbd3f to 33934ca
We also need to make empty slots work properly. Signed-off-by: Tingmao Wang <m@maowtm.org>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Force-pushed from 33934ca to edb052b
micromaomao pushed a commit that referenced this pull request on Sep 14, 2025:
BUG: kernel NULL pointer dereference, address: 00000000000002ec
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 28 UID: 0 PID: 343 Comm: kworker/28:1 Kdump: loaded Tainted: G OE 6.17.0-rc2+ #9 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_ib_is_sg_need_sync+0x9e/0xd0 [smc]
...
Call Trace:
 <TASK>
 smcr_buf_map_link+0x211/0x2a0 [smc]
 __smc_buf_create+0x522/0x970 [smc]
 smc_buf_create+0x3a/0x110 [smc]
 smc_find_rdma_v2_device_serv+0x18f/0x240 [smc]
 ? smc_vlan_by_tcpsk+0x7e/0xe0 [smc]
 smc_listen_find_device+0x1dd/0x2b0 [smc]
 smc_listen_work+0x30f/0x580 [smc]
 process_one_work+0x18c/0x340
 worker_thread+0x242/0x360
 kthread+0xe7/0x220
 ret_from_fork+0x13a/0x160
 ret_from_fork_asm+0x1a/0x30
 </TASK>

If the software RoCE device is used, ibdev->dma_device is a null pointer. As a result, the problem occurs. Null pointer detection is added to prevent problems.

Fixes: 0ef69e7 ("net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20250828124117.2622624-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
micromaomao pushed a commit that referenced this pull request on Sep 21, 2025:
Steven Rostedt reported a crash with "ftrace=function" kernel command line:

[ 0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
[ 0.160254] #PF: supervisor read access in kernel mode
[ 0.160975] #PF: error_code(0x0000) - not-present page
[ 0.161697] PGD 0 P4D 0
[ 0.162055] Oops: Oops: 0000 [#1] SMP PTI
[ 0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
[ 0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237)
[ 0.166186] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 83 ec 20 8b 05 c9 b6 7e 01 <44> 8b 77 1c 65 4c 8b 2d b5 ea 20 02 4c 89 6c 24 18 41 89 f5 21 f0
[ 0.168811] RSP: 0000:ffffffffb2e03b30 EFLAGS: 00010086
[ 0.169545] RAX: 0000000001fff33f RBX: 0000000000000000 RCX: 0000000000000000
[ 0.170544] RDX: 0000000000002800 RSI: 0000000000002800 RDI: 0000000000000000
[ 0.171554] RBP: ffffffffb2e03b80 R08: 0000000000000004 R09: ffffffffb2e03c90
[ 0.172549] R10: ffffffffb2e03c90 R11: 0000000000000000 R12: 0000000000000000
[ 0.173544] R13: ffffffffb2e03c90 R14: ffffffffb2e03c90 R15: 0000000000000001
[ 0.174542] FS: 0000000000000000(0000) GS:ffff9d2808114000(0000) knlGS:0000000000000000
[ 0.175684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.176486] CR2: 000000000000001c CR3: 000000007264c001 CR4: 00000000000200b0
[ 0.177483] Call Trace:
[ 0.177828] <TASK>
[ 0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2))
[ 0.178692] mas_store_gfp (lib/maple_tree.c:5468)
[ 0.179223] execmem_cache_add_locked (mm/execmem.c:207)
[ 0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475)
[ 0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169)
[ 0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158)
[ 0.181517] execmem_alloc_rw (mm/execmem.c:487)
[ 0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474)
[ 0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182)
[ 0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947)
[ 0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368)
[ 0.184682] ftrace_startup (kernel/trace/ftrace.c:3048)
[ 0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[ 0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717)
[ 0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745)
[ 0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[ 0.187924] function_trace_init (kernel/trace/trace_functions.c:170)
[ 0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349)
[ 0.189088] register_tracer (kernel/trace/trace.c:2391)
[ 0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149)
[ 0.190204] start_kernel (init/main.c:970)
[ 0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307)
[ 0.191381] x86_64_start_kernel (??:?)
[ 0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419)
[ 0.192534] </TASK>
[ 0.192839] Modules linked in:
[ 0.193267] CR2: 000000000000001c
[ 0.193730] ---[ end trace 0000000000000000 ]---

The crash happens because on x86 ftrace allocations from execmem require maple tree to be initialized.

Move maple tree initialization that depends only on slab availability earlier in boot so that it will happen right after mm_core_init().
Link: https://lkml.kernel.org/r/20250824130759.1732736-1-rppt@kernel.org
Fixes: 5d79c2b ("x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250820184743.0302a8b5@gandalf.local.home/
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>