Description
I am seeing a boot failure with ppc64_guest_defconfig after llvm/llvm-project@7763119 (which also causes #2070), but the change that resolves that issue does not fix it.
$ make -skj"$(nproc)" ARCH=powerpc LLVM=1 mrproper ppc64_guest_defconfig vmlinux
$ qemu-system-ppc64 \
-display none \
-nodefaults \
-cpu power8 \
-machine pseries \
-vga none \
-kernel vmlinux \
-initrd rootfs.cpio \
-m 1G \
-serial mon:stdio
...
[ 0.000000][ T0] Linux version 6.14.0-rc3-00012-g2408a807bfc3 (nathan@ax162) (ClangBuiltLinux clang version 21.0.0git (https://github.com/llvm/llvm-project.git 7763119c6eb0976e4836f81c9876c49a36d46d73), ClangBuiltLinux LLD 21.0.0 (https://github.com/llvm/llvm-project.git 7763119c6eb0976e4836f81c9876c49a36d46d73)) #1 SMP Tue Feb 18 12:35:22 MST 2025
...
[ 0.000000][ T0] Kernel command line:
[ 0.000000][ T0] printk: log buffer data + meta data: 262144 + 917504 = 1179648 bytes
[ 0.000000][ T0] Dentry cache hash table entries: 131072 (order: 4, 1048576 bytes, linear)
[ 0.000000][ T0] Inode-cache hash table entries: 65536 (order: 3, 524288 bytes, linear)
[ 0.000000][ T0] Fallback order for Node 0: 0
[ 0.000000][ T0] Built 1 zonelists, mobility grouping off. Total pages: 0
[ 0.000000][ T0] Policy zone: Normal
[ 0.000000][ T0] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[ 0.000000][ T0] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000][ T0] ftrace: allocating 47180 entries in 12 pages
[ 0.000000][ T0] ftrace: allocated 12 pages with 2 groups
[ 0.000000][ T0] rcu: Hierarchical RCU implementation.
[ 0.000000][ T0] rcu: RCU event tracing is enabled.
[ 0.000000][ T0] rcu: RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=1.
[ 0.000000][ T0] Rude variant of Tasks RCU enabled.
[ 0.000000][ T0] Tracing variant of Tasks RCU enabled.
[ 0.000000][ T0] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000][ T0] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000][ T0] RCU Tasks Rude: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=1.
[ 0.000000][ T0] RCU Tasks Trace: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=1.
[ 0.000000][ T0] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
At the parent change (f6e3d33c009c), the kernel boots without issue.
[ 0.000000][ T0] Linux version 6.14.0-rc3-00012-g2408a807bfc3 (nathan@ax162) (ClangBuiltLinux clang version 21.0.0git (https://github.com/llvm/llvm-project.git f6e3d33c009cada0437c11d3fd1beace74c5dcfa), ClangBuiltLinux LLD 21.0.0 (https://github.com/llvm/llvm-project.git f6e3d33c009cada0437c11d3fd1beace74c5dcfa)) #1 SMP Tue Feb 18 12:33:45 MST 2025
...
[ 0.000000][ T0] Kernel command line:
[ 0.000000][ T0] printk: log buffer data + meta data: 262144 + 917504 = 1179648 bytes
[ 0.000000][ T0] Dentry cache hash table entries: 131072 (order: 4, 1048576 bytes, linear)
[ 0.000000][ T0] Inode-cache hash table entries: 65536 (order: 3, 524288 bytes, linear)
[ 0.000000][ T0] Fallback order for Node 0: 0
[ 0.000000][ T0] Built 1 zonelists, mobility grouping on. Total pages: 16384
[ 0.000000][ T0] Policy zone: Normal
[ 0.000000][ T0] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[ 0.000000][ T0] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000][ T0] ftrace: allocating 47180 entries in 12 pages
[ 0.000000][ T0] ftrace: allocated 12 pages with 2 groups
[ 0.000000][ T0] rcu: Hierarchical RCU implementation.
[ 0.000000][ T0] rcu: RCU event tracing is enabled.
[ 0.000000][ T0] rcu: RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=1.
[ 0.000000][ T0] Rude variant of Tasks RCU enabled.
[ 0.000000][ T0] Tracing variant of Tasks RCU enabled.
[ 0.000000][ T0] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000][ T0] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000][ T0] RCU Tasks Rude: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=1.
[ 0.000000][ T0] RCU Tasks Trace: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=1.
[ 0.000000][ T0] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[ 0.000000][ T0] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.000233][ T0] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x761537d007, max_idle_ns: 440795202126 ns
[ 0.000899][ T0] clocksource: timebase mult[1f40000] shift[24] registered
[ 0.006915][ T0] Console: colour dummy device 80x25
[ 0.007805][ T0] printk: legacy console [hvc0] enabled
[ 0.007805][ T0] printk: legacy console [hvc0] enabled
[ 0.008247][ T0] printk: legacy bootconsole [udbg0] disabled
[ 0.008247][ T0] printk: legacy bootconsole [udbg0] disabled
...
Based on a little bit of gdb debugging, it seems like the kernel gets into lib/maple_tree.c via the IRQ subsystem but does not come back. Linking lib/maple_tree.o from a tree built with the good compiler into a tree built with the bad compiler does allow the boot to get further, but it still never reaches userspace, so that is probably not the only translation unit with a problem.
Diffing the disassembly of lib/maple_tree.o between the good and bad revisions, I see code generation changes in three functions (everything else seems to be fallout, such as different addresses caused by the larger code size from these changes):
mab_mas_cp()
@@ -66,15 +66,16 @@ e9 08 ff f8 ld 8, -8(8)
78 a7 06 20 clrldi 7, 5, 56
7c 85 23 78 mr 5, 4
7c 04 40 40 cmplw 4, 8
-41 81 00 08 bt 1, 0xe770 <mab_mas_cp+0x100>
+41 81 00 08 bt 1, 0xe790 <mab_mas_cp+0x100>
7d 05 43 78 mr 5, 8
-7d 44 38 50 sub 10, 7, 4
-78 88 1f 48 rldic 8, 4, 3, 29
3b 20 00 00 li 25, 0
+7c e4 38 10 subc 7, 7, 4
+78 88 1f 48 rldic 8, 4, 3, 29
+7d 39 01 94 addze 9, 25
+2c 09 ff ff cmpwi 9, -1
39 20 00 00 li 9, 0
-7c 2a 38 40 cmpld 10, 7
-41 81 00 08 bt 1, 0xe78c <mab_mas_cp+0x11c>
-7d 49 53 78 mr 9, 10
+40 82 00 08 bf 2, 0xe7b0 <mab_mas_cp+0x120>
+7c e9 3b 78 mr 9, 7
7f c7 f3 78 mr 7, 30
3a c5 00 01 addi 22, 5, 1
38 a4 ff ff addi 5, 4, -1
mas_alloc_cyclic()
@@ -82,26 +82,28 @@ f8 7e 00 18 std 3, 24(30)
e8 7e 00 18 ld 3, 24(30)
78 64 07 a0 clrldi 4, 3, 62
28 24 00 02 cmpldi 4, 2
-41 82 00 40 bt 2, 0xbf8 <mas_alloc_cyclic+0x178>
+41 82 00 48 bt 2, 0xc00 <mas_alloc_cyclic+0x180>
3b 80 00 00 li 28, 0
-48 00 00 3c b 0xbfc <mas_alloc_cyclic+0x17c>
+48 00 00 44 b 0xc04 <mas_alloc_cyclic+0x184>
e8 7e 00 08 ld 3, 8(30)
+38 80 00 00 li 4, 0
f8 7b 00 00 std 3, 0(27)
-38 63 00 01 addi 3, 3, 1
-28 23 00 00 cmpldi 3, 0
+30 63 00 01 addic 3, 3, 1
+7c 84 01 94 addze 4, 4
f8 7d 00 00 std 3, 0(29)
-40 82 00 14 bf 2, 0xbec <mas_alloc_cyclic+0x16c>
+28 04 00 01 cmplwi 4, 1
+40 82 00 14 bf 2, 0xbf4 <mas_alloc_cyclic+0x174>
e8 7e 00 00 ld 3, 0(30)
80 83 00 04 lwz 4, 4(3)
60 84 08 00 ori 4, 4, 2048
90 83 00 04 stw 4, 4(3)
7f c3 f3 78 mr 3, 30
-48 00 00 01 bl 0xbf0 <mas_alloc_cyclic+0x170>
-48 00 00 18 b 0xc0c <mas_alloc_cyclic+0x18c>
+48 00 00 01 bl 0xbf8 <mas_alloc_cyclic+0x178>
+48 00 00 18 b 0xc14 <mas_alloc_cyclic+0x194>
78 7c f0 82 rldicl 28, 3, 62, 2
38 80 c0 05 li 4, -16379
7c 23 20 40 cmpld 3, 4
-41 81 00 08 bt 1, 0xc0c <mas_alloc_cyclic+0x18c>
+41 81 00 08 bt 1, 0xc14 <mas_alloc_cyclic+0x194>
3b 80 00 00 li 28, 0
7f 83 07 b4 extsw 3, 28
38 21 00 70 addi 1, 1, 112
mas_wr_spanning_store()
@@ -75,10 +75,12 @@ e8 7e 00 00 ld 3, 0(30)
48 00 00 01 bl 0xb080 <mas_wr_spanning_store+0x110>
60 00 00 00 nop
e8 61 00 80 ld 3, 128(1)
-38 83 00 01 addi 4, 3, 1
+30 83 00 01 addic 4, 3, 1
+38 60 00 00 li 3, 0
+7c a3 01 94 addze 5, 3
38 60 ff ff li 3, -1
-28 24 00 00 cmpldi 4, 0
-41 82 00 0c bt 2, 0xb0a4 <mas_wr_spanning_store+0x134>
+28 05 00 00 cmplwi 5, 0
+40 82 00 0c bf 2, 0xb0ac <mas_wr_spanning_store+0x13c>
7c 83 23 78 mr 3, 4
f8 81 00 80 std 4, 128(1)
38 80 ff ff li 4, -1
I am not really familiar with PowerPC assembly, so I am not sure whether these are expected transformations. I am also not sure how to go about getting a smaller reproducer at this point.
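To make the mab_mas_cp() hunk easier to reason about, here is a rough C model of the two instruction sequences from that diff. It is only a sketch and rests on assumptions that someone who knows the backend should confirm: that subc sets the carry bit (CA) to 1 when no borrow occurs, that addze adds CA to its source register, and that "bf 2" branches when the EQ bit of cr0 is clear. The function names good_seq(), bad_seq(), and check_models() are made up for illustration, and the variables simply mirror the register numbers in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the sequence emitted by the good compiler. */
uint64_t good_seq(uint64_t r7, uint64_t r4)
{
	uint64_t r10 = r7 - r4;	/* sub 10, 7, 4 */
	uint64_t r9 = 0;	/* li 9, 0 */

	if (!(r10 > r7))	/* cmpld 10, 7 ; bt 1 skips the mr when greater */
		r9 = r10;	/* mr 9, 10 */
	return r9;
}

/* Model of the sequence emitted by the bad compiler. */
uint64_t bad_seq(uint64_t r7, uint64_t r4)
{
	uint64_t r25 = 0;			/* li 25, 0 */
	uint64_t ca = (r7 >= r4) ? 1 : 0;	/* assumed: CA = 1 when no borrow */
	uint64_t r9;
	int eq;

	r7 = r7 - r4;			/* subc 7, 7, 4 */
	r9 = r25 + ca;			/* addze 9, 25 */
	eq = ((int32_t)r9 == -1);	/* cmpwi 9, -1 */
	r9 = 0;				/* li 9, 0 */
	if (eq)				/* bf 2 skips the mr when not equal */
		r9 = r7;		/* mr 9, 7 */
	return r9;
}

/* Compare the two models on one non-wrapping and one wrapping input. */
void check_models(void)
{
	assert(good_seq(7, 4) == 3);	/* no wrap: difference is kept */
	assert(good_seq(4, 7) == 0);	/* wrap: result is clamped to 0 */
	assert(bad_seq(7, 4) == 0);	/* model never takes the mr path */
	assert(bad_seq(4, 7) == 0);
}
```

If this reading is right, the addze result can only be 0 or 1, so the comparison against -1 never matches and the new sequence always produces 0, whereas the old one produced the difference whenever the subtraction did not wrap. By the same kind of modeling, the addic/addze changes in mas_alloc_cyclic() and mas_wr_spanning_store() look equivalent to the old code, but I have not verified any of this on real hardware or in QEMU.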