Conversation

micromaomao
Owner

No description provided.

@micromaomao micromaomao force-pushed the landlock-arraydomain branch 5 times, most recently from 1d831a4 to 8c79a5c on June 29, 2025 17:48
@micromaomao micromaomao requested a review from Copilot June 29, 2025 18:08
@micromaomao micromaomao force-pushed the landlock-arraydomain branch from 8c79a5c to e51695f on June 29, 2025 18:18
@micromaomao micromaomao requested a review from Copilot June 29, 2025 18:19
@micromaomao micromaomao force-pushed the landlock-arraydomain branch 2 times, most recently from 27ee217 to a12fcb1 on June 29, 2025 21:28
@micromaomao micromaomao force-pushed the landlock-arraydomain branch from 7ef6233 to 3856771 on July 5, 2025 15:07
@micromaomao micromaomao force-pushed the landlock-arraydomain-base branch from 3a84302 to 86fdfba on July 5, 2025 15:07
@micromaomao micromaomao force-pushed the landlock-arraydomain branch 7 times, most recently from b4da11c to 65ac599 on July 6, 2025 01:48
@micromaomao
Owner Author

???? didn't know you could bind to zero

@micromaomao micromaomao force-pushed the landlock-arraydomain branch from 65ac599 to c7a17a0 on July 6, 2025 02:10
@micromaomao micromaomao requested a review from Copilot July 6, 2025 02:13
@micromaomao micromaomao force-pushed the landlock-arraydomain branch 2 times, most recently from 56b691d to aa13c57 on July 6, 2025 12:16
@micromaomao micromaomao requested a review from Copilot July 6, 2025 12:20
@micromaomao micromaomao force-pushed the landlock-arraydomain branch 8 times, most recently from 2b90af7 to fe48c66 on July 6, 2025 20:16
micromaomao pushed a commit that referenced this pull request Jul 8, 2025
Remove redundant netif_napi_del() call from disconnect path.

A WARN may be triggered in __netif_napi_del_locked() during USB device
disconnect:

  WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350

This happens because netif_napi_del() is called in the disconnect path while
NAPI is still enabled. However, it is not necessary to call netif_napi_del()
explicitly, since unregister_netdev() will handle NAPI teardown automatically
and safely. Removing the redundant call avoids triggering the warning.
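
The shape of the fix is roughly the following (a hedged sketch, not the
actual patch hunk; the surrounding code and variable names in
lan78xx_disconnect() are simplified):

  static void lan78xx_disconnect(struct usb_interface *intf)
  {
      /* ... */
      /* netif_napi_del(&dev->napi);  -- removed: NAPI is still enabled at
       * this point, and unregister_netdev() below already tears NAPI down
       * safely, so the explicit call is redundant. */
      unregister_netdev(dev->net);
      /* ... */
  }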

Full trace:
 lan78xx 1-1:1.0 enu1: Failed to read register index 0x000000c4. ret = -ENODEV
 lan78xx 1-1:1.0 enu1: Failed to set MAC down with error -ENODEV
 lan78xx 1-1:1.0 enu1: Link is Down
 lan78xx 1-1:1.0 enu1: Failed to read register index 0x00000120. ret = -ENODEV
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350
 Modules linked in: flexcan can_dev fuse
 CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.16.0-rc2-00624-ge926949dab03 #9 PREEMPT
 Hardware name: SKOV IMX8MP CPU revC - bd500 (DT)
 Workqueue: usb_hub_wq hub_event
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __netif_napi_del_locked+0x2b4/0x350
 lr : __netif_napi_del_locked+0x7c/0x350
 sp : ffffffc085b673c0
 x29: ffffffc085b673c0 x28: ffffff800b7f2000 x27: ffffff800b7f20d8
 x26: ffffff80110bcf58 x25: ffffff80110bd978 x24: 1ffffff0022179eb
 x23: ffffff80110bc000 x22: ffffff800b7f5000 x21: ffffff80110bc000
 x20: ffffff80110bcf38 x19: ffffff80110bcf28 x18: dfffffc000000000
 x17: ffffffc081578940 x16: ffffffc08284cee0 x15: 0000000000000028
 x14: 0000000000000006 x13: 0000000000040000 x12: ffffffb0022179e8
 x11: 1ffffff0022179e7 x10: ffffffb0022179e7 x9 : dfffffc000000000
 x8 : 0000004ffdde8619 x7 : ffffff80110bcf3f x6 : 0000000000000001
 x5 : ffffff80110bcf38 x4 : ffffff80110bcf38 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : 1ffffff0022179e7 x0 : 0000000000000000
 Call trace:
  __netif_napi_del_locked+0x2b4/0x350 (P)
  lan78xx_disconnect+0xf4/0x360
  usb_unbind_interface+0x158/0x718
  device_remove+0x100/0x150
  device_release_driver_internal+0x308/0x478
  device_release_driver+0x1c/0x30
  bus_remove_device+0x1a8/0x368
  device_del+0x2e0/0x7b0
  usb_disable_device+0x244/0x540
  usb_disconnect+0x220/0x758
  hub_event+0x105c/0x35e0
  process_one_work+0x760/0x17b0
  worker_thread+0x768/0xce8
  kthread+0x3bc/0x690
  ret_from_fork+0x10/0x20
 irq event stamp: 211604
 hardirqs last  enabled at (211603): [<ffffffc0828cc9ec>] _raw_spin_unlock_irqrestore+0x84/0x98
 hardirqs last disabled at (211604): [<ffffffc0828a9a84>] el1_dbg+0x24/0x80
 softirqs last  enabled at (211296): [<ffffffc080095f10>] handle_softirqs+0x820/0xbc8
 softirqs last disabled at (210993): [<ffffffc080010288>] __do_softirq+0x18/0x20
 ---[ end trace 0000000000000000 ]---
 lan78xx 1-1:1.0 enu1: failed to kill vid 0081/0

Fixes: ec4c7e1 ("lan78xx: Introduce NAPI polling support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250627051346.276029-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to reduce the limit from U32_MAX as we use u32 for various
landlock_domain_index or landlock_layer indices.

On Mon, 2 Jun 2025 at 21:50:05 +0200, Mickaël Salaün wrote [1]:
> Correct.  We can either use u64 or reduce the maximum number of rules.
> I think LANDLOCK_MAX_NUM_RULES set to U16_MAX would be much more than
> the worse practical case.  Even if one buggy policy tries to add one
> rule per network port, that will be OK.  We could even reasonably test
> this limit.  We'll need to backport this change but I'm OK with that.

Note that a limit of 2^24 would still leave more than enough room even for
u32 indices, but for future-proofing, and given the agreement with Mickaël
above, this sets the limit to U16_MAX.
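
In code terms this is a one-line constant; a minimal sketch, assuming the
constant sits alongside the other limits in security/landlock/limits.h:

  /* Upper bound on the number of rules per ruleset/domain.  U16_MAX keeps
   * all of the u32 indices comfortably in range. */
  #define LANDLOCK_MAX_NUM_RULES  U16_MAX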

Link: https://lore.kernel.org/all/20250602.uBai6ge5maiw@digikod.net/ [1]

Signed-off-by: Tingmao Wang <m@maowtm.org>
This implements the structure proposed in [1], using a flat array to
store the rules and eventually using hashing to find rules.  The array is
stored in the domain struct itself to avoid extra pointer indirection and
make all the rule data as cache-local as possible.  The non-array part of
the domain struct is also kept reasonably small.  This works well for a
small (10 or 20 rules) ruleset, which is the common case for Landlock
users, and still has reasonable performance for large ones.

This will eventually make landlock_rule/landlock_ruleset only needed for
unmerged rulesets, and thus it doesn't have to store multiple layers etc.
create_rule and insert_rule would also hopefully become much simpler.

Different from the original proposal, the number of layers for each rule is
now deduced from the layer index of the next entry.  To simplify the logic,
a special "terminating index" is placed after each of the two index arrays,
which contains layer_index = num_layers.
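
As a rough illustration of the scheme (the field names below are
assumptions based on this description, not the exact definitions in the
patch):

  struct landlock_domain_index {
      uintptr_t key;       /* inode object pointer or network port */
      u32 layer_index;     /* offset of this rule's first layer */
      /* ... */
  };

  /* The index array carries one extra terminating entry whose layer_index
   * equals num_layers, so the layer count of rule i can be deduced without
   * special-casing the last rule: */
  static u32 rule_num_layers(const struct landlock_domain_index *indices, u32 i)
  {
      return indices[i + 1].layer_index - indices[i].layer_index;
  }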

On reflection, using the name "layer" to refer to individual struct
landlock_layer entries is quite confusing, especially with names like
num_layers - the next version should probably find a better name for it.

Link: https://lore.kernel.org/all/20250526.quec3Dohsheu@digikod.net/ [1]

Signed-off-by: Tingmao Wang <m@maowtm.org>
@micromaomao micromaomao force-pushed the landlock-arraydomain-base branch from 86fdfba to aeceeda on July 12, 2025 15:15
@micromaomao micromaomao force-pushed the landlock-arraydomain branch from e395c59 to dffbd3f on July 12, 2025 15:15
This commit introduces utility functions for handling a (generic) compact
coalesced hash table, which we will use in the next commit.

I decided to make it generic for now but we can make it landlock-specific
if we want.

This should include a randomized unit test - I will add this in the next
version.

Signed-off-by: Tingmao Wang <m@maowtm.org>
This implements a function to search for matching rules using the newly
defined coalesced hashtable, and defines convenience macros for fs and net
respectively, as well as a macro to iterate over the layers of a rule.

Signed-off-by: Tingmao Wang <m@maowtm.org>
This algorithm is a slight alteration of the one on Wikipedia at the time
of writing [1].  The difference is that when we are trying to insert into
a slot that is already being used (whether by an element that doesn't
actually belong there, and is just in a collision chain of a different
hash, or whether it is the head of a chain and thus has the correct hash),
we move the existing element away and insert the new element in its place.
The result is that if there is some element in the hash table with a
certain hash, the slot corresponding to that hash will always be the slot
that starts the collision chain for that hash.  In other words, chains
won't "mix" and if we detect that the hash of the element at the slot
we're targeting is not correct, we know that the hash table does not
contain the hash we want.

[1]: https://en.wikipedia.org/w/index.php?title=Coalesced_hashing&oldid=1214362866
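
As a concrete illustration of this displacement rule (standalone C, not the
kernel code - the series defines the equivalent operations generically via
macros in coalesced_hash.h), the sketch below also shows why the lookup
from the previous commit can stop after a single hash comparison:

  #include <stdbool.h>
  #include <stdint.h>

  struct slot {
      uint32_t key;
      int next;   /* next slot in this chain, or -1 */
      bool used;
  };

  static int hash(uint32_t key, int size)
  {
      return key % size;   /* placeholder hash for the sketch */
  }

  static int find_free(const struct slot *t, int size)
  {
      for (int i = size - 1; i >= 0; i--)
          if (!t[i].used)
              return i;
      return -1;   /* table full */
  }

  /* Invariant: if any element with hash value h is present, slot h holds
   * the head of the collision chain for h. */
  static struct slot *lookup(struct slot *t, int size, uint32_t key)
  {
      int i = hash(key, size);

      if (!t[i].used || hash(t[i].key, size) != i)
          return NULL;   /* wrong chain head => key cannot be present */
      for (; i >= 0; i = t[i].next)
          if (t[i].key == key)
              return &t[i];
      return NULL;
  }

  static bool insert(struct slot *t, int size, uint32_t key)
  {
      int h = hash(key, size);
      int free_slot;

      if (!t[h].used) {
          t[h] = (struct slot){ .key = key, .next = -1, .used = true };
          return true;
      }

      free_slot = find_free(t, size);
      if (free_slot < 0)
          return false;

      /* Move the current occupant out of the way so the new element can
       * sit at its own hash slot. */
      t[free_slot] = t[h];

      if (hash(t[h].key, size) == h) {
          /* Occupant was the head of this hash's chain: the new element
           * takes over as head and links to the displaced occupant. */
          t[h] = (struct slot){ .key = key, .next = free_slot, .used = true };
      } else {
          /* Occupant was squatting here for another hash's chain: repoint
           * its predecessor to its new location, then start a fresh chain
           * for the new element. */
          int p = hash(t[h].key, size);

          while (t[p].next != h)
              p = t[p].next;
          t[p].next = free_slot;
          t[h] = (struct slot){ .key = key, .next = -1, .used = true };
      }
      return true;
  }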

This patch seems to have hit a checkpatch false positive:

	ERROR: need consistent spacing around '*' (ctx:WxV)
	torvalds#281: FILE: security/landlock/coalesced_hash.h:349:
	+               elem_type *table, h_index_t table_size)                       \
	                          ^

	ERROR: need consistent spacing around '*' (ctx:WxV)
	torvalds#288: FILE: security/landlock/coalesced_hash.h:356:
	+               struct h_insert_scratch *scratch, const elem_type *elem)      \
	                                                                  ^

Since this is kind of a niche use case, I will only report it after this
series gets out of RFC (and if the warnings still show up).

Signed-off-by: Tingmao Wang <m@maowtm.org>
This implements a 3-stage merge, generic over the type of rules (so that
it can be repeated for fs and net).  It contains a small refactor to
re-use the rbtree search code in ruleset.c.

3 passes are needed because, aside from calculating the size of the arrays
to allocate, we also need to first populate the index table before we can
write out the layers sequentially, as the index will not be written
in-order.  Doing it this way means that one rule's layers end where the
next rule's layers start.
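
As a toy illustration of the layout the final pass produces (standalone
sketch, not the patch's code - the hash-table population and the actual
copying of struct landlock_layer entries are omitted):

  struct toy_index {
      uint32_t key;
      uint32_t num_layers;    /* known once the earlier passes have run */
      uint32_t layer_start;   /* assigned here, in the final pass */
  };

  /* Hand out layer offsets sequentially while walking the populated index,
   * so each rule's layers occupy [layer_start, layer_start + num_layers)
   * and end exactly where the next rule's layers begin. */
  static void lay_out_layers(struct toy_index *idx, size_t n)
  {
      uint32_t off = 0;

      for (size_t i = 0; i < n; i++) {
          idx[i].layer_start = off;
          off += idx[i].num_layers;
          /* ...copy this rule's layers into its slice here... */
      }
  }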

Signed-off-by: Tingmao Wang <m@maowtm.org>
We will eventually need a deferred free just like we currently have for
the ruleset, so we define it here as well.  To minimize the size of the
domain struct before the rules array, we separately allocate the
work_struct (which is currently 72 bytes) and just keep a pointer in the
domain.
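
For illustration, one possible shape of this arrangement (the struct and
function names below are guesses for the sketch, not the patch's
definitions):

  /* Allocated separately so the fixed part of the domain stays small. */
  struct landlock_domain_free_work {
      struct work_struct work;
      struct landlock_domain *domain;
  };

  struct landlock_domain {
      /* ... small fixed-size header (refcount, sizes, ...) ... */
      struct landlock_domain_free_work *free_work;
      /* ... index and layer arrays follow ... */
  };

  static void domain_free_workfn(struct work_struct *work)
  {
      struct landlock_domain_free_work *fw =
          container_of(work, struct landlock_domain_free_work, work);

      /* ... actually free fw->domain here, then free fw itself ... */
  }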

This patch triggers another (false positive?) checkpatch warning:

	ERROR: trailing statements should be on next line
	torvalds#177: FILE: security/landlock/domain.h:197:
	 DEFINE_FREE(landlock_put_domain, struct landlock_domain *,
	+	    if (!IS_ERR_OR_NULL(_T)) landlock_put_domain(_T))

Signed-off-by: Tingmao Wang <m@maowtm.org>
Implement the equivalent of landlock_merge_ruleset, but using the new
domain structure.  The logic in inherit_domain and
landlock_domain_merge_ruleset mirrors inherit_ruleset and merge_ruleset
respectively.  Once the existing landlock_restrict_self code is switched
over to use this, those two functions can be removed.

Signed-off-by: Tingmao Wang <m@maowtm.org>
- Replace domain in landlock_cred with landlock_domain.
- Replace landlock_merge_ruleset with landlock_domain_merge_ruleset.
- Pull landlock_put_hierarchy out of domain.h.
  This allows domain.h to not depend on audit.h, as audit.h -> cred.h will
  need to depend on domain.h instead of ruleset.h after changing it to use
  the new domain struct.
- Update uses of landlock_ruleset-based domains to landlock_domain

checkpatch seems to not like the `layer_mask_t (*const layer_masks)[]` argument:

	WARNING: function definition argument 'layer_mask_t' should also have an identifier name
	torvalds#171: FILE: security/landlock/domain.h:397:
	+bool landlock_unmask_layers(const struct landlock_found_rule rule,

	WARNING: function definition argument 'layer_mask_t' should also have an identifier name
	torvalds#176: FILE: security/landlock/domain.h:402:
	+access_mask_t

Signed-off-by: Tingmao Wang <m@maowtm.org>
Signed-off-by: Tingmao Wang <m@maowtm.org>
[1] introduces a check which doesn't seem fully correct or necessary, and
which breaks the build for a later commit in this series.  This patch
replaces it with just a signedness check.

Cc: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/all/5f7ad85243b78427242275b93481cfc7c127764b.1725494372.git.fahimitahera@gmail.com/ [1]
Signed-off-by: Tingmao Wang <m@maowtm.org>
The current hashing code uses a division to reduce the hash modulo the
table size, which can be slow and also unnecessarily discards entropy
(ideally we want to use the most significant bits of the hash):

./include/linux/hash.h:
78              return val * GOLDEN_RATIO_64 >> (64 - bits);
   0x0000000000000956 <+118>:   49 0f af c3             imul   %r11,%rax

security/landlock/domain.h:
178     DEFINE_COALESCED_HASH_TABLE(struct landlock_domain_index, dom_hash, key,
   0x000000000000095a <+122>:   48 c1 e8 20             shr    $0x20,%rax
   0x000000000000095e <+126>:   f7 f6                   div    %esi
   0x0000000000000960 <+128>:   89 d0                   mov    %edx,%eax
   0x0000000000000962 <+130>:   49 89 c5                mov    %rax,%r13

This commit introduces a hash_bits parameter to the hash table, and uses a
folding hash instead of a modulo to constrain the value to table_size.
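
Roughly, the change swaps a modulo-based reduction for hash_64()'s
multiply-and-shift (an illustrative before/after, not the patch's exact
code; the folding step that handles table sizes which are not a power of
two is left out):

  /* Before (conceptually): hash, then reduce with '% table_size', which
   * is where the div instruction above comes from. */
  idx = hash_64(key, 32) % table_size;

  /* After: with a hash_bits parameter, the multiply-and-shift inside
   * hash_64() already yields an index in [0, 1 << hash_bits) built from
   * the hash's most significant bits - no division needed. */
  idx = hash_64(key, hash_bits);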

Benchmark comparison:

	> ./parse-bpftrace.py typical-workload-{orig,arraydomain-{hashtable-modhash,hashtable-hashbits}}.log
	  landlock_overhead:    avg = 34        34      34
	                     median = 35        34      34
	  landlock_hook:        avg = 878       875     856
	                     median = 854       850     831
	  open_syscall:         avg = 2517      2532    2485
	                     median = 2457      2471    2425

Signed-off-by: Tingmao Wang <m@maowtm.org>
@micromaomao micromaomao force-pushed the landlock-arraydomain branch from dffbd3f to 33934ca on July 12, 2025 16:22
Need to also make empty slots work properly.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Signed-off-by: Tingmao Wang <m@maowtm.org>
@micromaomao micromaomao force-pushed the landlock-arraydomain branch from 33934ca to edb052b on July 12, 2025 18:01
micromaomao pushed a commit that referenced this pull request Sep 14, 2025
BUG: kernel NULL pointer dereference, address: 00000000000002ec
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 28 UID: 0 PID: 343 Comm: kworker/28:1 Kdump: loaded Tainted: G        OE       6.17.0-rc2+ #9 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_ib_is_sg_need_sync+0x9e/0xd0 [smc]
...
Call Trace:
 <TASK>
 smcr_buf_map_link+0x211/0x2a0 [smc]
 __smc_buf_create+0x522/0x970 [smc]
 smc_buf_create+0x3a/0x110 [smc]
 smc_find_rdma_v2_device_serv+0x18f/0x240 [smc]
 ? smc_vlan_by_tcpsk+0x7e/0xe0 [smc]
 smc_listen_find_device+0x1dd/0x2b0 [smc]
 smc_listen_work+0x30f/0x580 [smc]
 process_one_work+0x18c/0x340
 worker_thread+0x242/0x360
 kthread+0xe7/0x220
 ret_from_fork+0x13a/0x160
 ret_from_fork_asm+0x1a/0x30
 </TASK>

If a software RoCE device is used, ibdev->dma_device is a null pointer,
which leads to the crash above.  Add a null pointer check to prevent this.
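
The check amounts to something like the following (sketch only; the exact
placement and the fallback behaviour inside smc_ib_is_sg_need_sync() are
assumptions):

  /* Software RDMA devices have no DMA device to sync against, so bail
   * out before dereferencing ibdev->dma_device.  (The return value here
   * is assumed for the sketch.) */
  if (!ibdev->dma_device)
      return false;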

Fixes: 0ef69e7 ("net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://patch.msgid.link/20250828124117.2622624-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
micromaomao pushed a commit that referenced this pull request Sep 21, 2025
Steven Rostedt reported a crash with "ftrace=function" kernel command
line:

[    0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
[    0.160254] #PF: supervisor read access in kernel mode
[    0.160975] #PF: error_code(0x0000) - not-present page
[    0.161697] PGD 0 P4D 0
[    0.162055] Oops: Oops: 0000 [#1] SMP PTI
[    0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
[    0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237)
[ 0.166186] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 83 ec 20 8b 05 c9 b6 7e 01 <44> 8b 77 1c 65 4c 8b 2d b5 ea 20 02 4c 89 6c 24 18 41 89 f5 21 f0
[    0.168811] RSP: 0000:ffffffffb2e03b30 EFLAGS: 00010086
[    0.169545] RAX: 0000000001fff33f RBX: 0000000000000000 RCX: 0000000000000000
[    0.170544] RDX: 0000000000002800 RSI: 0000000000002800 RDI: 0000000000000000
[    0.171554] RBP: ffffffffb2e03b80 R08: 0000000000000004 R09: ffffffffb2e03c90
[    0.172549] R10: ffffffffb2e03c90 R11: 0000000000000000 R12: 0000000000000000
[    0.173544] R13: ffffffffb2e03c90 R14: ffffffffb2e03c90 R15: 0000000000000001
[    0.174542] FS:  0000000000000000(0000) GS:ffff9d2808114000(0000) knlGS:0000000000000000
[    0.175684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.176486] CR2: 000000000000001c CR3: 000000007264c001 CR4: 00000000000200b0
[    0.177483] Call Trace:
[    0.177828]  <TASK>
[    0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2))
[    0.178692] mas_store_gfp (lib/maple_tree.c:5468)
[    0.179223] execmem_cache_add_locked (mm/execmem.c:207)
[    0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475)
[    0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169)
[    0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158)
[    0.181517] execmem_alloc_rw (mm/execmem.c:487)
[    0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474)
[    0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182)
[    0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947)
[    0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368)
[    0.184682] ftrace_startup (kernel/trace/ftrace.c:3048)
[    0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717)
[    0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745)
[    0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.187924] function_trace_init (kernel/trace/trace_functions.c:170)
[    0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349)
[    0.189088] register_tracer (kernel/trace/trace.c:2391)
[    0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149)
[    0.190204] start_kernel (init/main.c:970)
[    0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307)
[    0.191381] x86_64_start_kernel (??:?)
[    0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419)
[    0.192534]  </TASK>
[    0.192839] Modules linked in:
[    0.193267] CR2: 000000000000001c
[    0.193730] ---[ end trace 0000000000000000 ]---

The crash happens because, on x86, ftrace allocations from execmem require
the maple tree to be initialized.

Move maple tree initialization that depends only on slab availability
earlier in boot so that it will happen right after mm_core_init().
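
The shape of the change, roughly (placement shown is illustrative, based on
the description above):

  start_kernel(void)
  {
      ...
      mm_core_init();
      maple_tree_init();      /* needs only slab, so it can run this early;
                               * must happen before early ftrace code
                               * allocates from execmem */
      ...
      early_trace_init();     /* the "ftrace=function" path in the trace */
      ...
  }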

Link: https://lkml.kernel.org/r/20250824130759.1732736-1-rppt@kernel.org
Fixes: 5d79c2b ("x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250820184743.0302a8b5@gandalf.local.home/
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>