Description
We are seeing a subtle, occasional miscompilation on ARM-M using nightly-2021-04-23 in rust-toolchain. It is difficult to elicit and reproduce, since subtle changes to the layout of the code will cause the compiler to make decisions that either do or do not trigger the bug. It appears to have something to do with stack frame maintenance in outlined functions. We are definitely observing it on thumbv8m.main-none-eabihf, but it's subtle enough that we may also be getting it on thumbv7em-none-eabihf and just haven't noticed it yet.
As of somewhat recently (late April?), output at opt-level = "z" has started including outlined functions that look like this (actual example):
000211e6 <OUTLINED_FUNCTION_2>:
211e6: f84d ed08 str.w lr, [sp, #-8]!
211ea: e9cd 5007 strd r5, r0, [sp, #28]
211ee: 4620 mov r0, r4
211f0: f8ad 6034 strh.w r6, [sp, #52] ; 0x34
211f4: e9cd 1209 strd r1, r2, [sp, #36] ; 0x24
211f8: f7fe ff33 bl 20062 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
211fc: 2800 cmp r0, #0
211fe: f85d eb08 ldr.w lr, [sp], #8
21202: 4770 bx lr
Now, note that the instructions at 0x211e6 and 0x211fe are setting up and tearing down a temporary stack frame, respectively. This will become important in a bit.
It appears that the stack frame offsets used in instructions while this temporary stack frame exists are not being updated to reflect its existence. Stack variables updated within the outlined function above are being deposited 8 bytes off where they should be.
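To make the offset arithmetic concrete, here is a minimal sketch in Rust. It assumes the immediates in the outlined body were computed for the caller's frame, before the extra 8-byte push of lr; the function names are hypothetical and only model the address calculation, not anything in the compiler.

```rust
/// Address, as an offset from S (the stack pointer on entry to the
/// outlined function), reached by a `[sp, #imm]` access after `sp`
/// has been moved by `sp_adjust` bytes.
fn effective_offset(imm: i32, sp_adjust: i32) -> i32 {
    sp_adjust + imm
}

/// Immediate that would be needed to reach `target` (an offset from S)
/// once `sp` has been moved by `sp_adjust` bytes.
fn corrected_imm(target: i32, sp_adjust: i32) -> i32 {
    target - sp_adjust
}

fn main() {
    // `str.w lr, [sp, #-8]!` moves sp down by 8 bytes.
    let sp_adjust = -8;

    // `strd r5, r0, [sp, #28]` as emitted in OUTLINED_FUNCTION_2.
    println!("emitted #28 reaches S + {}", effective_offset(28, sp_adjust));

    // Assumption: the store was meant to land at S + 28.
    println!("reaching S + 28 would need #{}", corrected_imm(28, sp_adjust));
}
```

Under that assumption, the emitted immediate lands 8 bytes below the intended slot, which matches the displacement described above.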
I do not currently have a compact repro case, and the code in question has not yet been published (though I could arrange to publish it if it would help; we intend to open source it). Here are two execution traces of programs showing correct behavior vs corrupt behavior. Both traces set up arguments to a syscall, which uses struct return and deposits a struct onto the stack; the routines then shuffle the results around before calling a library function. It is during the shuffling that things go awry.
In this working trace I have called the struct return buffer in the stack frame R and another related-but-separate buffer B. I've omitted instructions that don't contribute by control flow or value-dominating the registers at the end. S refers to the value of the stack pointer on entry to the trace.
2006e: 4606 mov r6, r0 ; r6 = B
20070: a801 add r0, sp, #4 ; r0 = R = S + 4
20072: 460d mov r5, r1
20074: 9000 str r0, [sp, #0]
20076: 4630 mov r0, r6
20078: 2102 movs r1, #2
2007a: 2200 movs r2, #0
2007c: 2300 movs r3, #0
2007e: f001 f8bc bl 211fa <sys_recv_stub>
20082: 2800 cmp r0, #0
...
2009c: 9803 ldr r0, [sp, #12] ; r0 = [S + 12] = [R + 8]
...
200a2: e9dd 1204 ldrd r1, r2, [sp, #16]
; r1 = [S + 16] = [R + 12]
; r2 = [S + 20] = [R + 16]
...
200b0: f001 f89c bl 211ec <OUTLINED_FUNCTION_1>
; (following call)
...
211f0: e9cd 1203 strd r1, r2, [sp, #12]
; [S + 12] = r1 = [R + 12]
; [S + 16] = r2 = [R + 16]
211f4: e9cd 6001 strd r6, r0, [sp, #4]
; [S + 4] = r6 = B
; [S + 8] = [R + 8]
211f8: 4770 bx lr
; (returns)
200b4: a801 add r0, sp, #4 ; r0 = S + 4 = R
200b6: f000 f875 bl 201a4 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
; function invoked with r0 = R = S + 4
; words [S+4], [S+8], [S+12], [S+16] initialized
; everything's good
Now, here is the non-working trace with the same sort of annotations. Note that while the function at the end is still called with one argument R (stack frame plus 28), the actual struct being passed is deposited starting 8 bytes lower, at stack frame plus 20:
200e4: ac07 add r4, sp, #28 ; r4 = S + 28 = R
...
20106: ad06 add r5, sp, #24 ; r5 = S + 24 = B
...
2010a: 4628 mov r0, r5 ; r0 = S + 24
2010c: 2102 movs r1, #2
2010e: 2200 movs r2, #0
20110: 2300 movs r3, #0
20112: 9400 str r4, [sp, #0] ; stack arg = r4 = S + 28 = R
20114: f001 f876 bl 21204 <sys_recv_stub>
20118: 2800 cmp r0, #0
...
20136: e9dd 120a ldrd r1, r2, [sp, #40]
; r1 = [S + 40] = [R + 12]
; r2 = [S + 44] = [R + 16]
...
20144: f001 f84f bl 211e6 <OUTLINED_FUNCTION_2>
; (following call)
211e6: f84d ed08 str.w lr, [sp, #-8]! ; sp = S - 8 <---- stack frame adjust
211ea: e9cd 5007 strd r5, r0, [sp, #28]
; [S + 20] = B
; [S + 24] = r0 = (known to be zero from CFG, omitted)
211ee: 4620 mov r0, r4 ; r0 = r4 = R, set above, before call
...
211f4: e9cd 1209 strd r1, r2, [sp, #36]
; [S + 28] = r1 = [R + 12]
; [S + 32] = r2 = [R + 16]
211f8: f7fe ff33 bl 20062 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
; function invoked with r0 = S + 28
; actual struct written at: S+20.
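The effect on the callee can be replayed with a small model of the stack. This is only a sketch: it assumes the struct is four words laid out as in the working trace (B, r0, r1, r2), uses symbolic labels instead of real values, and marks any word the outlined function did not write as "stale". The helper names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Model of Thumb-2 `strd a, b, [sp, #imm]`: store two consecutive words.
/// Keys are byte offsets relative to S, the stack pointer before the call.
fn strd(stack: &mut BTreeMap<i32, &'static str>, sp: i32, imm: i32, a: &'static str, b: &'static str) {
    stack.insert(sp + imm, a);
    stack.insert(sp + imm + 4, b);
}

/// Replay the two `strd` stores from OUTLINED_FUNCTION_2 with `sp` at the
/// given offset from S, then read the four words starting at R = S + 28.
fn callee_view(sp: i32) -> Vec<&'static str> {
    let mut stack = BTreeMap::new();
    strd(&mut stack, sp, 28, "B", "r0");
    strd(&mut stack, sp, 36, "r1", "r2");

    let r = 28; // pointer passed to the callee in r0
    (0..4)
        .map(|i| *stack.get(&(r + 4 * i)).unwrap_or(&"stale"))
        .collect()
}

fn main() {
    // sp = S: what the offsets appear to have been computed for.
    println!("no frame adjust:   {:?}", callee_view(0));
    // sp = S - 8: what actually happens after `str.w lr, [sp, #-8]!`.
    println!("with frame adjust: {:?}", callee_view(-8));
}
```

In this model the callee reading at R sees the last two words of the struct in the first two slots, followed by whatever was previously on the stack.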
Additional notes:
- We started seeing this after bumping rust-toolchain from nightly-2020-12-29 to nightly-2021-04-23, so the behavior was introduced somewhere between those points. (@luqmana points out that this likely includes the LLVM 11-12 transition.)
- While we have seen this on ARMv7/8-M, that's because that's the architecture we're using -- it might affect other platforms, not sure.
- We build mostly at opt-level = "z" but this may or may not be specific to that opt level.
- This was found by @labbott. Thanks also to @bcantrill for helping reduce the behavior.
Meta
rustc --version --verbose:
rustc 1.53.0-nightly (7f4afdf02 2021-04-22)
binary: rustc
commit-hash: 7f4afdf0255600306bf67432da722c7b5d2cbf82
commit-date: 2021-04-22
host: x86_64-unknown-linux-gnu
release: 1.53.0-nightly
LLVM version: 12.0.0