Description
Since ThinLTO became default in Rust my bootloader went from 17 KiB to about 42 KiB in size, which puts it over the 32 KiB limit. To get my bootloader to fit in this limit I've been able to strip out all of core::fmt
by not having any parameters to my panic_fmt
routine and this allows LTO to effectively delete all construction of the core::fmt
parameters to panic_fmt
and thus save huge amounts of code.
Since ThinLTO became default it seems that certain functions using volatile memory or inline assembly struggle to be optimized fully, however this is all I've found that replicates the issue, perhaps more common use of Rust could also trigger it.
For the tiny test case (included at the end of this issue) historically (from at least 2016 to November 2017) produced a .text section of ~100 bytes, and a total virtual program size of ~500 bytes. Since ThinLTO became default this tiny program grows to ~2400 bytes of .text and ~3200 bytes overall.
While this 2 KiB increase seems small, keep in mind it is for an application that is trivial and only has one thing using core::fmt
(the bounds check panic on the array index). When a program grows and more panic come about these changes to be a much larger increase, in my case over 20 KiB.
While I'm writing an RFC to address this issue more permanently by proposing an optional (developer opt-in) panic routine which does not rely on core::fmt
(probably just a &'static str
for the filename and u32
for the line for bare bones debugging where nothing more can be afforded), I wanted to open this issue just in case it's a more fundamental optimization issue that could be fixed.
The core::intrinsics::abort()
is critical in the panic_fmt
for the code size increase to occur. Other things that work here as well are statements like asm!("nop")
or really any assembly statement. Further, assembly statements that are included in functions that are called from panic_fmt
also cause the optimization problem to occur. This makes workarounds impossible in my case as my panic dumps to a serial port which ultimately requires inline assembly to access. So for my bootloader I have resorted to using an old version of Rust.
What I find particularly strange is that while I've defined my panic_fmt
to not take any parameters this doesn't seem to be a big enough hint to Rust/LLVM that they are not used and thus all construction of these parameters can be DCE'd.
For a size listing of this test program using all nightly builds of Rust in the past 1.5 years see: https://gist.github.com/gamozolabs/d72c05e2b200f02397f1495cf33c2674
We can see from this that the size increase occured between rustc 1.24.0-nightly (f9b0897c5 2017-12-02)
and rustc 1.24.0-nightly (1956d5535 2017-12-03)
which happens to be when ThinLTO went default.
I've tried a few different tests using various panic_fmt
declarations, with parameters, without, as extern, as pub, etc, and they seem to all be affected by this issue.
Here's an example basic piece of code to be built with nightly-x86_64-pc-windows-msvc
using args -C lto -C panic=abort -O
:
Here is the disassembly difference between this same code but one with ThinLTO and one with full LTO: https://gist.github.com/gamozolabs/41b2059a55d5e0b60d7251dd8a31c055
#![no_std]
#![feature(lang_items, start, core_intrinsics)]
#[lang = "panic_fmt"]
#[no_mangle]
pub extern fn rust_begin_panic(_msg: core::fmt::Arguments,
_file: &'static str,
_line: u32,
_column: u32) -> ! {
unsafe { core::intrinsics::abort() }
}
// These functions are used by the compiler, but not
// for a bare-bones hello world. These are normally
// provided by libstd.
#[lang = "eh_personality"]
#[no_mangle]
pub extern fn rust_eh_personality() {
}
// This function may be needed based on the compilation target.
#[lang = "eh_unwind_resume"]
#[no_mangle]
pub extern fn rust_eh_unwind_resume() {
}
#[start]
#[no_mangle]
#[allow(non_snake_case)]
pub fn mainCRTStartup(argc: isize, _argv: *const *const u8) -> isize
{
let foo = [1, 2, 3, 4];
foo[argc as usize]
}