Skip to content

[codegen] Unnecessary panicking branches in resumption of infinite generator (stored in static variable) #66100

Closed
@japaric

Description

@japaric

Tested with nightly-2019-11-04

Observed behavior

The following no_std program which stores a generator in a static variable and
then resumes it (no indirection / dynamic-dispatch involved):

#![feature(generator_trait)]
#![feature(generators)]
#![feature(never_type)]
#![feature(type_alias_impl_trait)]
#![no_main]
#![no_std]

use core::{mem::MaybeUninit, ops::Generator, pin::Pin};
use cortex_m_rt::{entry, exception};
use panic_halt as _;

type A = impl Generator<Yield = (), Return = !>;

fn foo() -> A {
    || loop {
        yield
    }
}

static mut X: MaybeUninit<A> = MaybeUninit::uninit();

#[entry]
fn main() -> ! {
    unsafe {
        X.as_mut_ptr().write(foo());
    }

    loop {}
}

#[exception]
fn SysTick() {
    unsafe {
        Pin::new_unchecked(&mut *X.as_mut_ptr()).resume();
    }
}

when fully optimized for the thumbv7m-none-eabi target (panic = abort)
produces the following machine code for the SysTick function:

00000400 <SysTick>:
 400:   f240 0000       movw    r0, #0
 404:   f2c2 0000       movt    r0, #8192       ; 0x2000
 408:   6801            ldr     r1, [r0, #0]
 40a:   e8df f001       tbb     [pc, r1]
 40e:   0502            .short  0x0502
 410:   020d            .short  0x020d
 412:   2103            movs    r1, #3
 414:   6001            str     r1, [r0, #0]
 416:   4770            bx      lr
 418:   f240 5030       movw    r0, #1328       ; 0x530
 41c:   2122            movs    r1, #34 ; 0x22
 41e:   f2c0 0000       movt    r0, #0
 422:   f000 f80c       bl      43e <core::panicking::panic> ; <- 1
 426:   defe            udf     #254    ; 0xfe
 428:   f240 5000       movw    r0, #1280       ; 0x500
 42c:   2121            movs    r1, #33 ; 0x21
 42e:   f2c0 0000       movt    r0, #0
 432:   f000 f804       bl      43e <core::panicking::panic> ; <- 2
 436:   defe            udf     #254    ; 0xfe

This function resumes the generator stored in the static variable X. The
generator is extremely simple and contains no explicit panicking branches yet
the machine code contains 2 panicking branches which are not necessary for
memory safety (more on this later).

If the program is slightly modified like this (semantics are not changed):

@@ -7,7 +7,7 @@

 use core::{mem::MaybeUninit, ops::Generator, pin::Pin};
 use cortex_m_rt::{entry, exception};
-use panic_halt as _;
+// use panic_halt as _; // "source-inlined" as the `panic_halt` function at the bottom

 type A = impl Generator<Yield = (), Return = !>;

@@ -34,3 +34,8 @@
         Pin::new_unchecked(&mut *X.as_mut_ptr()).resume();
     }
 }
+
+#[panic_handler]
+fn panic_halt(_: &core::panic::PanicInfo) -> ! {
+    loop {}
+}

Then SysTick is optimized to the following machine code, which is free of
panicking branches:

00000400 <SysTick>:
 400:   f240 0000       movw    r0, #0
 404:   2103            movs    r1, #3
 406:   f2c2 0000       movt    r0, #8192       ; 0x2000
 40a:   6001            str     r1, [r0, #0]
 40c:   4770            bx      lr

Expected behavior

This second machine code (the one with 5 instructions) is what one would expect
from the first version of the program. Changing the shape / complexity of unrelated code
should not introduce panicking branches.

Cause

The Generator.resume function decides which part of the generator body to
execute based on the value of a discriminant. As of nightly-2019-11-04 the code
generated for the resume function contains 2 "built-in" panicking branches,
which are present in all generators:

; Function Attrs: nofree norecurse nounwind
define void @SysTick() unnamed_addr #1 {
start:
  %_3.i = load i32, i32* bitcast (<{ [4 x i8] }>* @_ZN3app1X17h48a9105f802c5ed4E to i32*), align 4
  switch i32 %_3.i, label %bb6.i [
    i32 0, label %"_ZN3app3foo28_$u7b$$u7b$closure$u7d$$u7d$17hcc44d33a27bb7de8E.exit"
    i32 1, label %panic1.i ; <- 1
    i32 2, label %panic.i  ; <- 2
    i32 3, label %"_ZN3app3foo28_$u7b$$u7b$closure$u7d$$u7d$17hcc44d33a27bb7de8E.exit"
  ]
; ..
}

The first built-in panicking branch, which corresponds to discriminant 1, is
used to panic a generator that has been resume-d beyond completion -- a
generator that has reached a return statement has reached completion.

The second built-in branch, which corresponds to discriminant 2, is used to
panic a generator that has been resume-d after panicking -- I'm not sure when
this can occur in practice but looking at the compiler codegen unwinding is required to
reach this "poisoned" state.

However, neither of these situations can occur in the example. The discriminant
of the generator can never become 1 because it does not contain a return
statement in its body. Its discriminant can not become 2 either because the
target has been configured for panic = abort meaning that unwinding is not
supported at the codegen level (UnwindResume functions are never generated). This
can be confirmed by scanning the optimized LLVM IR produced by the compiler: the
discriminant is stored in static memory (.bss) and the only values written to
this memory are 0 and 3.

Possible fixes (?)

Naively, I would improve the MIR codegen by:

  • Not producing the discriminant == 1 branch if the Return type of the
    generator is ! (or, I guess, any other empty type like enum Void {}). This
    type indicates that the generator does not contain a return statement.

  • Not producing the discriminant == 2 branch if the target has set its panic
    option to abort. Alternatively, one could check if the unwind-related
    Resume function is generated and omit the branch if it's not generated.
    Relevant code:

// Poison the generator when it unwinds
for block in body.basic_blocks_mut() {
let source_info = block.terminator().source_info;
if let &TerminatorKind::Resume = &block.terminator().kind {
block.statements.push(
transform.set_discr(VariantIdx::new(POISONED), source_info));
}
}

Does these seem like sensible fixes?


STR

Steps to produce the above machine code. This is not a minified repro because a
certain complexity is required to get the inferior codegen -- trivial programs
get properly optimized.

$ rustup target add thumbv7m-none-eabi
$ rustup component add llvm-tools-preview
$ cargo new --bin app && cd app
$ # or edit Cargo.toml
$ cargo add cortex-m-rt --vers '=0.6.10'
$ cargo add panic-halt --vers '=0.2.0'
$ cat <<EOF >>Cargo.toml
[profile.release]
codegen-units = 1
lto = true
EOF
$ mkdir .cargo
$ cat <<EOF >.cargo/config
[target.thumbv7m-none-eabi]
rustflags = ["-C", "link-arg=-Tlink.x"]
[build]
target = "thumbv7m-none-eabi"
EOF
$ cat <<EOF >memory.x
MEMORY
{
  FLASH : ORIGIN = 0x00000000, LENGTH = 256K
  RAM : ORIGIN = 0x20000000, LENGTH = 64K
}
EOF
$ $EDITOR src/main.rs # enter code shown at the beginning of this issue
$ cargo build --release
$ # or use `arm-none-eabi-objdump` from the `binutils-arm-none-eabi` package (Debian)
$ $(find $(rustc --print sysroot) -name llvm-objdump) -Cd -triple=thumbv7m-none-eabi target/thumbv7m-none-eabi/release/app

Metadata

Metadata

Labels

A-async-awaitArea: Async & AwaitA-codegenArea: Code generationA-coroutinesArea: CoroutinesAsyncAwait-TriagedAsync-await issues that have been triaged during a working group meeting.C-enhancementCategory: An issue proposing an enhancement or a PR with one.F-coroutines`#![feature(coroutines)]`F-type_alias_impl_trait`#[feature(type_alias_impl_trait)]`I-heavyIssue: Problems and improvements with respect to binary size of generated code.I-slowIssue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.WG-embeddedWorking group: Embedded systems

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions