Description
Context
The example code linked and described here is an MCVE. See the Background For "Real" Applications section for details.
- Consider a Rust binary which calls a function `free(f)` within its `main()`. `free()` takes a closure `f` with a branch (`?`) as input, and in turn calls `f` and then a function called `release()`.
- The Rust binary has a feature called `use-extern-cs`. When disabled, the bodies of both `free()` and `release()` are provided by an external crate called `critical`. When enabled, the `free()` function is provided by the main binary instead of `critical`, and the `release()` function is marked as `extern "Rust"` in the main binary's source file.
- Within the `critical` crate, the `release()` function may or may not be marked as `#[inline]`. This is controlled by the `critical/inline` feature.
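To make the shape described above concrete, here is a minimal host-side sketch. It is not the code from the repository: `checked_halve`, `run`, and the call counter are hypothetical stand-ins, and the MCVE's `release()` is a sled of `nop`s rather than a counter increment.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts calls so the sketch can be checked on a host machine.
// Assumption: the MCVE's `release()` emits ten `nop`s instead.
static RELEASE_CALLS: AtomicUsize = AtomicUsize::new(0);

fn release() {
    RELEASE_CALLS.fetch_add(1, Ordering::SeqCst);
}

// Calls the closure, then `release()` exactly once, on every path.
fn free<R>(f: impl FnOnce() -> R) -> R {
    let result = f();
    release();
    result
}

// Hypothetical fallible helper so the closure has something to `?` on.
fn checked_halve(x: u32) -> Option<u32> {
    if x % 2 == 0 { Some(x / 2) } else { None }
}

// Stand-in for the body of `main()`: a single call to `free()` whose
// closure contains a branch introduced by `?`.
fn run(x: u32) -> Option<u32> {
    free(|| {
        let half = checked_halve(x)?; // early return on `None`
        Some(half + 1)
    })
}
```

At the source level there is exactly one call site for `release()`, which is why a single copy of its body is the expected result after inlining.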
Instructions
- If testing `msp430`, make sure the `msp430-elf-gcc` toolchain is installed. Optionally install `just` for convenience.
- `git clone https://github.com/cr1901/msp430-size`. Use commit b8ef905 specifically. Despite the name of the repo, this code works for `thumbv6m-none-eabi` as well; the behavior appears to be arch-agnostic.
- Make sure a nightly Rust toolchain is installed (for `-Zbuild-std=core`).
- Run the following command:
cargo +nightly rustc --manifest-path=./test-cases/Cargo.toml --target=$TARGET --release -Zbuild-std=core --example=critical --features=$FEATURES -- --emit=obj=target/$TARGET/release/examples/critical.o,llvm-ir=target/$TARGET/release/examples/critical.ll,asm=target/$TARGET/release/examples/critical.s
where:
  - `$TARGET`: either `msp430-none-elf` or `thumbv6m-none-eabi`.
  - `$FEATURES`: empty, `use-extern-cs`, `critical/inline`, or `use-extern-cs,critical/inline`.
- Examine the output LLVM, assembly, and object/ELF files with `objdump` and look for a series of ten `nop`s once or multiple times. Each `nop` sled represents a call to `release`.
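Counting sleds by eye is error-prone, so a small helper like the following can scan the emitted `.s` file. This is a rough sketch under assumptions that are not taken from the repository: each instruction sits on its own line, the mnemonic is spelled `nop`, and lines starting with `;`, `@`, or `//` are comments. `count_nop_sleds` is a hypothetical name.

```rust
/// Counts how many full sleds of `sled_len` consecutive `nop`s appear in
/// an assembly listing. A run of 20 `nop`s counts as two sleds of ten.
fn count_nop_sleds(asm: &str, sled_len: usize) -> usize {
    if sled_len == 0 {
        return 0;
    }
    let mut sleds = 0;
    let mut run = 0;
    for line in asm.lines() {
        let line = line.trim();
        // Assumption: blank lines and comment-only lines do not break a run.
        if line.is_empty()
            || line.starts_with(';')
            || line.starts_with('@')
            || line.starts_with("//")
        {
            continue;
        }
        if line.split_whitespace().next() == Some("nop") {
            run += 1;
        } else {
            // Any other instruction, label, or directive ends the run.
            sleds += run / sled_len;
            run = 0;
        }
    }
    sleds + run / sled_len
}
```

For the MCVE, feeding the contents of `critical.s` to `count_nop_sleds(&text, 10)` should give 1 in the expected case and 2 when the body of `release` has been duplicated.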
Expected Behavior
The body of `release` appears once for the single call to `free()`, regardless of which combination of features is enabled (including none).
Actual Behavior
The body of `release` appears twice in the single call to `free()` for all combinations of features, except for `--features=critical/inline`.
Other Hints
- Sometimes I don't need the `#[inline]` attribute to prevent `release`'s body from being duplicated. However, I could not translate this behavior well from my real application to the MCVE. One way that I found works is to remove the `extern "Rust" fn release()` declaration, and paste the `critical::internal::release()` impl directly in the main source file.
- The `extern "Rust"` declaration seems to prevent `#[inline]` hints from working at all.
- If `rustc` decides to duplicate `release`, sometimes `rustc` will inline one call of `release` into `free`, but not the other.
- `release` duplication appears in the LLVM files emitted by `rustc`.
Background For "Real" Applications
The embedded Rust community has started to standardize around a pluggable `critical-section` crate. The `critical-section` crate by necessity marks some functions as `extern "Rust"` and defers to other crates to define them. Specifically, the `critical_section::free(f)` function takes a closure `f()` and calls in order (args omitted):
- `extern "Rust" acquire()`
- `f()`
- `extern "Rust" release()`
The crate doesn't define any new functionality for embedded Rust applications; it rather changes how existing functionality (critical sections) is implemented. In principle, the crate should be drop-in to existing embedded Rust applications.
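The declare-here, define-elsewhere pattern can be sketched as below. This is an illustration, not the `critical-section` crate's source: the symbol names `_sketch_cs_acquire` and `_sketch_cs_release`, the `provider` module (standing in for the other crate), and the counters are all hypothetical. It uses the `unsafe extern` and `#[unsafe(no_mangle)]` spellings, which need Rust 1.82 or newer; older toolchains write `extern "Rust"` and `#[no_mangle]` instead.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static ACQUIRES: AtomicUsize = AtomicUsize::new(0);
static RELEASES: AtomicUsize = AtomicUsize::new(0);

// Declarations only: the bodies are resolved by symbol name at link time.
// Hypothetical names; arguments are omitted, as in the description above.
unsafe extern "Rust" {
    fn _sketch_cs_acquire();
    fn _sketch_cs_release();
}

// Runs `f` between the externally provided acquire and release.
pub fn free<R>(f: impl FnOnce() -> R) -> R {
    unsafe { _sketch_cs_acquire() };
    let result = f();
    unsafe { _sketch_cs_release() };
    result
}

// Stand-in for the separate crate that supplies the implementations.
// It lives in its own module so the names do not clash with the
// declarations above.
mod provider {
    use super::{ACQUIRES, RELEASES};
    use std::sync::atomic::Ordering;

    #[unsafe(no_mangle)]
    pub fn _sketch_cs_acquire() {
        ACQUIRES.fetch_add(1, Ordering::SeqCst);
    }

    #[unsafe(no_mangle)]
    pub fn _sketch_cs_release() {
        RELEASES.fetch_add(1, Ordering::SeqCst);
    }
}
```

Because `free()` only sees declarations, the compiler has no body for `acquire` or `release` available when it compiles `free()` on its own; any inlining has to happen later, for example during LTO.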
When I transitioned some embedded Rust code to use the `critical-section` crate, I noticed marked size increases in the `.text` section (1992 bytes => 2048+ bytes, which no longer fits) due to new overhead from how `critical_section::free(f)` is inlined in my main application's functions. Specifically, if the closure `f` passed to `critical_section::free(f)` has a sufficiently complex branch, `rustc` will duplicate the body of `release` across both sides of the branch, even when `lto="fat"` and `opt-level="s"`.
Calling `critical_section::free()` is essential for sharing non-atomic data between interrupts/threads in a bare-metal application. To minimize interrupt latency and maximize the amount of work that can be done, the size/speed overhead of these calls should be kept as small as possible. I don't understand why Rust is unable to inline calls to `critical_section::free(f)` without duplicating the body of `release` (when `lto="fat"` and `codegen-units=1` are enabled), regardless of which of the following scenarios applies:
- `acquire()`, `release()`, and `free()` are all provided inline by the main binary.
- `acquire()`, `release()`, and `free()` are all provided by the same crate (via `use` statements, no `extern "Rust"`).
- `free()` is provided by one crate (via `use`), `acquire()` and `release()` are provided by another (via `use`).
- `free()` is provided by one crate (via `use`), `extern "Rust" acquire()` and `extern "Rust" release()` are provided by another crate.
For the MCVE, the body of `release` is exaggerated; the actual size difference will vary depending on the application. From my own testing, real `thumbv6m-none-eabi` applications have the duplication, but on average are affected less than `msp430-none-elf` applications.