Skip to content

Always inline atomic primitives #68155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

therealprof
Copy link
Contributor

Atomic primitves are supposed to control the way the processor interacts
with data but they're not supposed to have any noticeable on program
control flow, but they do. Without turning them into #inline(always)
they will ultimately end up in dev builds which has a severe impact on
performance and binary size especially on embedded platforms.

Signed-off-by: Daniel Egger daniel@eggers-club.de

Atomic primitves are supposed to control the way the processor interacts
with data but they're not supposed to have any noticeable on program
control flow, but they do. Without turning them into `#inline(always)`
they will ultimately end up in dev builds which has a severe impact on
performance and binary size especially on embedded platforms.

Signed-off-by: Daniel Egger <daniel@eggers-club.de>
@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @KodrAus (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 12, 2020
@therealprof
Copy link
Contributor Author

Let me expand a bit further on the specific problem I'm seeing. Consider one of the simplest programs one can write for an embedded target:

#![no_main]
#![no_std]

use panic_halt as _;
use cortex_m_rt::entry;

#[entry]
fn main() -> ! {
    loop {
        continue;
    }
}

Compiling in dev mode for a thumbv6m-none-eabi target will result in a binary with the following functions:

Analyzing target/thumbv6m-none-eabi/debug/examples/empty

File  .text Size       Crate Name
0.1%  11.3%  74B cortex_m_rt r0::init_data
0.0%   9.8%  64B cortex_m_rt core::sync::atomic::compiler_fence
0.0%   9.8%  64B  panic_halt core::sync::atomic::compiler_fence
0.0%   8.6%  56B cortex_m_rt r0::zero_bss
0.0%   7.3%  48B cortex_m_rt Reset
0.0%   6.4%  42B cortex_m_rt core::ptr::read
0.0%   5.5%  36B         std core::panicking::panic
0.0%   4.3%  28B   [Unknown] __aeabi_memcpy4
0.0%   4.3%  28B         std core::panicking::panic_fmt
0.0%   4.0%  26B cortex_m_rt core::intrinsics::copy_nonoverlapping
0.0%   3.4%  22B cortex_m_rt core::ptr::<impl *const T>::offset
0.0%   3.4%  22B cortex_m_rt core::ptr::<impl *mut T>::offset
0.0%   3.1%  20B cortex_m_rt HardFault_
0.0%   3.1%  20B  panic_halt rust_begin_unwind
0.0%   2.8%  18B cortex_m_rt DefaultHandler_
0.0%   2.8%  18B   [Unknown] __aeabi_memcpy
0.0%   2.4%  16B cortex_m_rt core::ptr::write
0.0%   2.4%  16B cortex_m_rt core::ptr::write_volatile
0.0%   2.4%  16B         std <T as core::any::Any>::type_id
0.0%   1.8%  12B cortex_m_rt core::mem::zeroed
0.0%   0.6%   4B   [Unknown] main
0.0%   0.3%   2B cortex_m_rt DefaultPreInit
0.0%   0.3%   2B         std core::ptr::real_drop_in_place
0.5% 100.0% 654B             .text section size, the file size is 127.6KiB

As can be seen core::sync::atomic::compiler_fence is included even twice, although there's a clear statement in the function that says:

compiler_fence does not emit any machine code

@hanna-kruppe
Copy link
Contributor

This indicates that -O0 codegen of these functions is bad (unsurprising), but I don't see any evidence that these inline(always) additions make it better. In fact, I expect they'll do the opposite. Without any optimizations, the inlining does not actually enable further simplfications. For example, the methods that take an Ordering and match on it (such as compiler_fence) will most likely still do that matching at runtime even if the ordering parameter is a compile time constant. Only now all that code is duplicated at every call site, instead of a few times (presumably, once per crate or codegen unit that uses them, due to the #[inline] attribute).

@therealprof
Copy link
Contributor Author

@rkruppe In my experience #[inline(always)] helps quite a bit. At the very least it gets rid of the function calls which do come with a substantial overhead, especially on embedded targets.

For the compiler_fence case there're actually quite a few surprising facts coming together:

  • The absence of optimisation (libcore/libstd should always be optimized, even for debug builds, right?)
  • The duplicate emission of the same object code for it and that it even occurs in the linked binary, despite using LTO
  • The horrific code generated for it

Somehow it should be possible to stay true to the promise that these primitives only affect the compiler but do not change the program flow.

@therealprof
Copy link
Contributor Author

For reference, here's a disassembly of generated binary: https://gist.github.com/therealprof/fd82057b9b2f0a9f440a91833700af50

@therealprof
Copy link
Contributor Author

@rkruppe I've been looking into the code generated by the #[inline(always)] version and you're right that this actually makes the situation worse since it will inline it in more than just two places so I'm going to withdraw this change; it still is a problem that I feel should be adressed rather sooner than later since it's a part of the big problem that embedded Rust is very debugging unfriendly.

@therealprof
Copy link
Contributor Author

Just another data point regarding the compiler_fence, getting rid of the panic! in the match helps tremendously (inline or not). Another option would be to make the different fences directly available using specialised functions instead of hoping that the compiler will optimise out.

@therealprof
Copy link
Contributor Author

This is with the panic! replaced by a unit type. Please note that the .text size is only part of the picture, e.g. the strings for the panic message(s) are not included here and also substantial.

Analyzing target/thumbv6m-none-eabi/debug/examples/empty

File  .text Size       Crate Name
0.0%  15.6%  74B cortex_m_rt r0::init_data
0.0%  11.8%  56B cortex_m_rt r0::zero_bss
0.0%  10.1%  48B cortex_m_rt Reset
0.0%   9.3%  44B cortex_m_rt core::sync::atomic::compiler_fence
0.0%   8.9%  42B cortex_m_rt core::ptr::read
0.0%   5.9%  28B         std __aeabi_memcpy4
0.0%   5.5%  26B cortex_m_rt core::intrinsics::copy_nonoverlapping
0.0%   4.6%  22B cortex_m_rt core::ptr::const_ptr::<impl *const T>::offset
0.0%   4.6%  22B cortex_m_rt core::ptr::mut_ptr::<impl *mut T>::offset
0.0%   4.2%  20B cortex_m_rt HardFault_
0.0%   3.8%  18B cortex_m_rt DefaultHandler_
0.0%   3.8%  18B         std __aeabi_memcpy
0.0%   3.4%  16B cortex_m_rt core::ptr::write
0.0%   3.4%  16B cortex_m_rt core::ptr::write_volatile
0.0%   2.5%  12B cortex_m_rt core::mem::zeroed
0.0%   1.3%   6B   [Unknown] main
0.0%   0.8%   4B       empty empty::__cortex_m_rt_main
0.0%   0.4%   2B cortex_m_rt DefaultPreInit
0.1% 100.0% 474B             .text section size, the file size is 322.8KiB

@therealprof
Copy link
Contributor Author

Closing this non-actionable PR for discussion in #68208.

@therealprof therealprof deleted the always-inline-atomic-ops branch January 14, 2020 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants