Inlined function duplication across complex branches when extern "Rust" is used with LTO and opt-level="s" #102295

Open
@cr1901

Context

The example code linked and described here is an MCVE. See the Background For "Real" Applications section for details.

  • Consider a Rust binary which calls a function free(f) within its main(). free() takes a closure f with a branch (?) as input, and in turn calls f and then a function called release().
  • The Rust binary has a feature called use-extern-cs. When disabled, the bodies of both free() and release() are provided by an external crate called critical. When enabled, the free() function is provided by the main binary instead of critical, and the release() function is declared as extern "Rust" in the main binary's source file.
  • Within the critical crate, the release() function may or may not be marked as #[inline]. This is controlled by the critical/inline feature.
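The setup described above can be modelled on a hosted target with a small sketch. It is not the MCVE's actual source: the `RELEASE_CALLS` counter, the `release_calls` helper, and the `decrement_then_double` caller are hypothetical names introduced here, the counter stands in for the body of `release()`, and using the `?` operator for the branch is an assumption about what the closure looks like.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter standing in for the body of `release()`.
/// In the MCVE the body is a run of nops, so each emitted copy is easy to spot.
static RELEASE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for `release()`. Whether it carries `#[inline]` is what the
/// `critical/inline` feature toggles in the issue; no attribute is used here.
fn release() {
    RELEASE_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical helper to read the counter.
fn release_calls() -> usize {
    RELEASE_CALLS.load(Ordering::SeqCst)
}

/// Calls the closure `f`, then `release()`, as `free(f)` is described above.
fn free<R>(f: impl FnOnce() -> R) -> R {
    let r = f();
    release();
    r
}

/// Hypothetical caller whose closure contains a `?` branch. The early-return
/// path and the fall-through path both continue into the one `release()` call
/// written in `free`.
fn decrement_then_double(x: u32) -> Option<u32> {
    free(|| {
        let y = x.checked_sub(1)?;
        Some(y * 2)
    })
}
```

At the source level there is exactly one `release()` call per `free()` call, whichever side of the branch is taken. The issue is about the optimizer emitting the body of `release` once per side of that branch in the generated code.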

Instructions

  1. If testing msp430, make sure the msp430-elf-gcc toolchain is installed. Optionally install the just command runner for convenience.

  2. git clone https://github.com/cr1901/msp430-size. Use commit b8ef905 specifically.

    Despite the name of the repo, this code works for thumbv6m-none-eabi as well; the behavior appears to be arch-agnostic.

  3. Make sure a nightly Rust toolchain is installed (for -Zbuild-std=core).

  4. Run the following command:

    cargo +nightly rustc --manifest-path=./test-cases/Cargo.toml --target=$TARGET --release -Zbuild-std=core --example=critical --features=$FEATURES -- --emit=obj=target/$TARGET/release/examples/critical.o,llvm-ir=target/$TARGET/release/examples/critical.ll,asm=target/$TARGET/release/examples/critical.s

    where:

    • $TARGET: either msp430-none-elf or thumbv6m-none-eabi.
    • $FEATURES: empty, use-extern-cs, critical/inline, or use-extern-cs,critical/inline
  5. Examine the emitted LLVM IR and assembly files, and disassemble the object/ELF file with objdump. Look for a series of ten nops appearing once or multiple times. Each nop sled represents a call to release.
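Counting the nop sleds by eye gets tedious across four feature combinations. A minimal sketch of a helper for the emitted `.s` file follows. The function name `count_nop_sleds` is hypothetical, and it assumes the mnemonic is the first token on its line and is spelled `nop`. That does not hold for objdump output, which prefixes addresses and encodings.

```rust
/// Counts nop sleds in an assembly listing.
///
/// A sled is `sled_len` consecutive lines whose first token is `nop`.
/// A longer unbroken run counts as `run / sled_len` sleds, so two copies of
/// the body placed back to back are still counted twice.
fn count_nop_sleds(asm: &str, sled_len: usize) -> usize {
    assert!(sled_len > 0, "sled_len must be non-zero");
    let mut sleds = 0;
    let mut run = 0;
    for line in asm.lines() {
        if line.split_whitespace().next() == Some("nop") {
            run += 1;
        } else {
            // Any other line (label, instruction, directive, blank) ends the run.
            sleds += run / sled_len;
            run = 0;
        }
    }
    // Account for a run that reaches the end of the listing.
    sleds + run / sled_len
}
```

Reading `critical.s` into a string with `std::fs::read_to_string` and calling `count_nop_sleds(&text, 10)` would then report one sled for the expected behavior and more than one when the body of release has been duplicated.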

Expected Behavior

The body of release appears once for the single call to free(), regardless of which combinations of features are enabled (including none).

Actual Behavior

The body of release appears twice in the single call to free() for all combinations of features, except for --features=critical/inline.

Other Hints

  • Sometimes I don't need the #[inline] attribute to prevent release's body from being duplicated. However, I could not translate this behavior well from my real application to the MCVE. One way that I found works is to remove the extern "Rust" fn release() declaration, and paste the critical::internal::release() impl directly in the main source file.
  • The extern "Rust" declaration seems to prevent #[inline] hints from working at all.
  • If rustc decides to duplicate release, sometimes rustc will inline one call of release into free, but not the other.
  • release duplication appears in the LLVM files emitted by rustc.

Background For "Real" Applications

The embedded Rust community has started to standardize around a pluggable critical-section crate. The critical-section crate by necessity marks some functions as extern "Rust" and defers to other crates to define them. Specifically, the critical_section::free(f) function takes a closure f() and calls in order (args omitted):

  1. extern "Rust" acquire()
  2. f()
  3. extern "Rust" release()
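The call order above can be sketched in a single file. This is not the critical-section crate's source: the symbol names `demo_cs_acquire` and `demo_cs_release`, the `cs` and `provider` modules, and the `EVENTS` log with its `record`/`events` helpers are all hypothetical, and the two modules stand in for two separate crates. The block uses the `unsafe extern` and `#[unsafe(no_mangle)]` spellings, which assume Rust 1.82 or later; the issue predates them and writes plain extern "Rust".

```rust
use std::sync::Mutex;

/// Hypothetical event log used only to observe the call order.
static EVENTS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

fn record(event: &'static str) {
    EVENTS.lock().unwrap().push(event);
}

fn events() -> Vec<&'static str> {
    EVENTS.lock().unwrap().clone()
}

/// Stands in for the crate that declares, but does not define, the hooks.
mod cs {
    unsafe extern "Rust" {
        // Resolved by symbol name at link time, so this module never sees
        // the bodies in the source.
        fn demo_cs_acquire();
        fn demo_cs_release();
    }

    /// Runs `f` between the two hooks: acquire, then `f`, then release.
    pub fn free<R>(f: impl FnOnce() -> R) -> R {
        // SAFETY: both symbols are defined in `provider` with matching signatures.
        unsafe { demo_cs_acquire() };
        let r = f();
        unsafe { demo_cs_release() };
        r
    }
}

/// Stands in for the crate that supplies the implementation.
mod provider {
    #[unsafe(no_mangle)]
    pub fn demo_cs_acquire() {
        super::record("acquire");
    }

    #[unsafe(no_mangle)]
    pub fn demo_cs_release() {
        super::record("release");
    }
}
```

Calling `cs::free(|| record("f"))` leaves `events()` holding acquire, f, release in that order. Because `free` only sees declarations, any inlining of the hook bodies into it has to happen after the crates are combined, which is why the issue pairs this pattern with LTO.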

The crate doesn't define any new functionality for embedded Rust applications; it rather changes how existing functionality (critical sections) is implemented. In principle, the crate should be drop-in to existing embedded Rust applications.

When I transitioned some embedded Rust code to use the critical-section crate, I noticed marked size increases in the .text section (1992 bytes => 2048+ bytes, which no longer fits) due to new overhead from how critical_section::free(f) is inlined in my main application's functions. Specifically, if the closure f passed to critical_section::free(f) has a sufficiently complex branch, rustc will duplicate the body of release across both sides of the branch, even when lto="fat" and opt-level="s".

Calling critical_section::free() is essential for sharing non-atomic data between interrupts/threads in a bare-metal application. To minimize interrupt latency and maximize the amount of work that can be done, the size/speed overhead of these calls should be kept as small as possible. I don't understand why Rust is unable to inline calls to critical_section::free(f) without duplicating the body of release (when lto="fat" and codegen-units=1 are enabled), no matter which of the following scenarios applies:

  1. acquire(), release(), and free() are all provided inline by the main binary.
  2. acquire(), release(), and free() are all provided by the same crate (via use statements, no extern "Rust").
  3. free() is provided by one crate (via use), acquire() and release() are provided by another (via use).
  4. free() is provided by one crate (via use), extern "Rust" acquire() and extern "Rust" release() are provided by another crate.

For the MCVE, the body of release is exaggerated; the actual size difference will vary depending on the application. From my own testing, real thumbv6m-none-eabi applications have the duplication, but on average are affected less than msp430-none-elf.

Metadata

    Labels

    • A-LTO: Area: Link-time optimization (LTO)
    • C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
    • I-heavy: Issue: Problems and improvements with respect to binary size of generated code.
    • O-Arm: Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state
    • O-msp430
    • T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.