Miscompilation on ARM-M with nightly-2021-04-23 #85351

Closed
@cbiffle

We are seeing an occasional miscompilation on ARM-M using nightly-2021-04-23 in rust-toolchain. It is difficult to reproduce, since small changes to the layout of the code cause the compiler to make decisions that either do or do not trigger the bug. It appears to involve stack frame maintenance in outlined functions. We are definitely observing it on thumbv8m.main-none-eabihf, but it is subtle enough that we may also be hitting it on thumbv7em-none-eabihf without having noticed yet.

As of somewhat recently (late April?) output at opt-level = "z" has started including outlined functions that look like this (actual example):

000211e6 <OUTLINED_FUNCTION_2>:
   211e6:	f84d ed08 	str.w	lr, [sp, #-8]!
   211ea:	e9cd 5007 	strd	r5, r0, [sp, #28]
   211ee:	4620      	mov	r0, r4
   211f0:	f8ad 6034 	strh.w	r6, [sp, #52]	; 0x34
   211f4:	e9cd 1209 	strd	r1, r2, [sp, #36]	; 0x24
   211f8:	f7fe ff33 	bl	20062 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
   211fc:	2800      	cmp	r0, #0
   211fe:	f85d eb08 	ldr.w	lr, [sp], #8
   21202:	4770      	bx	lr

Note that the instructions at 0x211e6 and 0x211fe set up and tear down a temporary 8-byte stack frame, respectively: the first pushes lr while lowering sp by 8, and the second pops lr and restores sp. This will become important in a bit.

It appears that the sp-relative offsets used by instructions inside the outlined function are not being adjusted to account for this temporary stack frame. As a result, stack variables written within the outlined function above land 8 bytes below where they should be.
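
The arithmetic behind that 8-byte displacement can be sketched in Rust. This is a minimal model, not the compiler's logic: the base address `s` and the helper names `effective_address` and `rebased_offset` are hypothetical, and only the 8-byte adjustment and the offset 28 are taken from the listing above.

```rust
/// Bytes by which `str.w lr, [sp, #-8]!` lowers sp on entry to the outlined function.
const FRAME_ADJUST: u32 = 8;

/// Address that an `[sp, #imm]` operand resolves to.
fn effective_address(sp: u32, imm: u32) -> u32 {
    sp.wrapping_add(imm)
}

/// Offset the outlined body would need so that it still reaches the caller's slot.
fn rebased_offset(caller_offset: u32) -> u32 {
    caller_offset + FRAME_ADJUST
}

fn main() {
    let s: u32 = 0x2000_1000; // hypothetical value of sp in the caller
    let sp_outlined = s - FRAME_ADJUST; // sp while lr is saved on the stack

    // Slot the caller meant to address with `[sp, #28]`.
    let intended = effective_address(s, 28);
    // Slot actually hit when the same offset is used after sp has moved.
    let actual = effective_address(sp_outlined, 28);
    // Slot hit if the offset had been rebased for the temporary frame.
    let fixed = effective_address(sp_outlined, rebased_offset(28));

    println!("intended S+{}", intended - s);
    println!("actual   S+{}", actual - s);
    println!("rebased  S+{}", fixed - s);

    assert_eq!(intended - actual, FRAME_ADJUST);
    assert_eq!(fixed, intended);
}
```

Under these assumptions the stale offset resolves to S+20 rather than S+28, which is the displacement described above.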

I do not currently have a compact repro case, and the code in question has not yet been published (we intend to open source it, and I could arrange to publish it if it would help). Here are two execution traces of programs showing correct behavior vs corrupt behavior. Both traces set up arguments to a syscall, which uses struct return and deposits a struct onto the stack; the routines then shuffle the results around before calling a library function. It is during the shuffling that things go awry.

In this working trace I have called the struct return buffer in the stack frame R and another related-but-separate buffer B. I've omitted instructions that affect neither the control flow nor the values in the registers at the end. S refers to the value of the stack pointer on entry to the trace.

   2006e:	4606      	mov	r6, r0  ; r6 = B
   20070:	a801      	add	r0, sp, #4  ; r0 = R = S + 4
   20072:	460d      	mov	r5, r1
   20074:	9000      	str	r0, [sp, #0]
   20076:	4630      	mov	r0, r6
   20078:	2102      	movs	r1, #2
   2007a:	2200      	movs	r2, #0
   2007c:	2300      	movs	r3, #0
   2007e:	f001 f8bc 	bl	211fa <sys_recv_stub>
   20082:	2800      	cmp	r0, #0
    ...
   2009c:	9803      	ldr	r0, [sp, #12] ; r0 = [S + 12] = [R + 8]
    ...
   200a2:	e9dd 1204 	ldrd	r1, r2, [sp, #16]
    ; r1 = [S + 16] = [R + 12]
    ; r2 = [S + 20] = [R + 16]
    ...
   200b0:	f001 f89c 	bl	211ec <OUTLINED_FUNCTION_1>
    ; (following call)
    ...
   211f0:	e9cd 1203 	strd	r1, r2, [sp, #12]
    ; [S + 12] = r1 = [R + 12]
    ; [S + 16] = r2 = [R + 16]
   211f4:	e9cd 6001 	strd	r6, r0, [sp, #4]
    ; [S + 4] = r6 = B
    ; [S + 8] = r0 = [R + 8]
   211f8:	4770      	bx	lr
    ; (returns)
   200b4:	a801      	add	r0, sp, #4  ; r0 = S + 4 = R
   200b6:	f000 f875 	bl	201a4 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
    ; function invoked with r0 = R = S + 4
    ; words [S+4], [S+8], [S+12], [S+16] initialized
    ; everything's good

Now, here is the non-working trace with the same sort of annotations. Note that while the function at the end is still called with one argument R (S + 28), the actual struct being passed is deposited starting 8 bytes lower, at S + 20:

   200e4:	ac07      	add	r4, sp, #28 ; r4 = S + 28 = R
    ...
   20106:	ad06      	add	r5, sp, #24 ; r5 = S + 24 = B
    ...
   2010a:	4628      	mov	r0, r5  ; r0 = S + 24
   2010c:	2102      	movs	r1, #2
   2010e:	2200      	movs	r2, #0
   20110:	2300      	movs	r3, #0
   20112:	9400      	str	r4, [sp, #0]  ; stack arg = r4 = S + 28 = R
   20114:	f001 f876 	bl	21204 <sys_recv_stub>
   20118:	2800      	cmp	r0, #0
    ...
   20136:	e9dd 120a 	ldrd	r1, r2, [sp, #40]
    ; r1 = [S + 40] = [R + 12]
    ; r2 = [S + 44] = [R + 16]
    ...
   20144:	f001 f84f 	bl	211e6 <OUTLINED_FUNCTION_2>
    ; (following call)
   211e6:	f84d ed08 	str.w	lr, [sp, #-8]!  ; sp = S - 8 <---- stack frame adjust
   211ea:	e9cd 5007 	strd	r5, r0, [sp, #28]
    ; [S + 20] = B
    ; [S + 24] = r0 = (known to be zero from CFG, omitted)
   211ee:	4620      	mov	r0, r4  ; r0 = r4 = R, set above, before call
    ...
   211f4:	e9cd 1209 	strd	r1, r2, [sp, #36]
    ; [S + 28] = r1 = [R + 12]
    ; [S + 32] = r2 = [R + 16]
   211f8:	f7fe ff33 	bl	20062 <_ZN7userlib2hl7Message5fixed17h5f2a9abf3d25035aE>
    ; function invoked with r0 = S + 28
    ; actual struct written at: S+20.
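
To make the mismatch in the trace above concrete, the following Rust sketch replays the two `strd` stores from OUTLINED_FUNCTION_2 using offsets relative to S. It is an illustrative model only: `replay_outlined_stores` is a hypothetical helper, and the `frame_adjust = 0` case shows where the same offsets would land if sp had not moved; it is not output from any compiler.

```rust
use std::collections::BTreeMap;

/// Offset of the struct-return buffer R relative to S in the failing trace.
const R: i32 = 28;

/// Replays the two `strd` stores and returns a map from slot offset
/// (relative to S) to the register that was stored there.
fn replay_outlined_stores(frame_adjust: i32) -> BTreeMap<i32, &'static str> {
    // sp relative to S once `str.w lr, [sp, #-8]!` has executed.
    let sp = -frame_adjust;
    let mut slots = BTreeMap::new();
    // strd r5, r0, [sp, #28]
    slots.insert(sp + 28, "r5");
    slots.insert(sp + 32, "r0");
    // strd r1, r2, [sp, #36]
    slots.insert(sp + 36, "r1");
    slots.insert(sp + 40, "r2");
    slots
}

fn main() {
    for adjust in [0, 8] {
        let slots = replay_outlined_stores(adjust);
        let first = *slots.keys().next().expect("stores were recorded");
        println!(
            "frame adjust {}: struct starts at S+{}, callee reads from S+{}",
            adjust, first, R
        );
        for (offset, reg) in &slots {
            println!("  [S+{}] = {}", offset, reg);
        }
    }
}
```

With an adjustment of 8 the first store lands at S+20, matching the annotations in the trace, while the callee is still handed S+28 and therefore sees r1 where it expects the first word of the struct.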

Additional notes:

  • We started seeing this after bumping rust-toolchain from nightly-2020-12-29 to nightly-2021-04-23, so the behavior was introduced somewhere between those points. (@luqmana points out that this range likely includes the LLVM 11 to 12 transition.)
  • We have only seen this on ARM-M because that is the architecture we're using; it might affect other platforms, but we're not sure.
  • We build mostly at opt-level = "z" but this may or may not be specific to that opt level.
  • This was found by @labbott. Thanks also to @bcantrill for helping reduce the behavior.

Meta

rustc --version --verbose:

rustc 1.53.0-nightly (7f4afdf02 2021-04-22)
binary: rustc
commit-hash: 7f4afdf0255600306bf67432da722c7b5d2cbf82
commit-date: 2021-04-22
host: x86_64-unknown-linux-gnu
release: 1.53.0-nightly
LLVM version: 12.0.0

Labels: C-bug, E-needs-mcve, ICEBreaker-LLVM, P-critical, T-compiler, regression-from-stable-to-stable
