Skip to content

derive(Debug) creates a fair bit of reduntant inlining work for the llvm optimizer #43843

Closed
@oyvindln

Description

@oyvindln

After discussing it on the forums I bit I decided to look into what the optimizer spends time on, and inlining seems to be a significant part of it. There seems to be a lot of room for improvement here.
Tested by compiling syntex in rustc-benchmarks (though the inlining time seem quite significant in other crates as well.).

I counted which functions were inlined the most, and fmt::debug_tuple and to a lesser extent fmt::debug_struct, which are emitted when automatically deriving Debug, and which simply wrap two other functions inside fmt::builder show up a lot. Maybe derive(debug) could output the wrapped functions instead, or the equivalent code, so llvm doesn't have to spend time inlining them (unless this causes problems for debug info). fmt::arguments::new_v1 is also inlined a lot, and also plays a role in derive(Debug). Maybe something could be done about that one as well, as it's simply a function that creates a struct but doesn't do anything else.

Output from time-llvm-passes Using `RUSTFLAGS="-Z time-llvm-passes" cargo +nightly build --release 2> llvm_time.txt` in the syntex-0.42.2 directory in rustc-benchmarks.

It's a bit hard to say which crate is which, as the output from llvm and cargo isn't synced, though these are the bottom three (excluding the benchmark dummy) in reverse order:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 6.3800 seconds (6.2624 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.7920 ( 14.1%)   0.0960 ( 12.8%)   0.8880 ( 13.9%)   0.8489 ( 13.6%)  Function Integration/Inlining
   0.5400 (  9.6%)   0.0760 ( 10.2%)   0.6160 (  9.7%)   0.5365 (  8.6%)  X86 DAG->DAG Instruction Selection
   0.3080 (  5.5%)   0.0280 (  3.7%)   0.3360 (  5.3%)   0.3031 (  4.8%)  Combine redundant instructions
   0.2400 (  4.3%)   0.0080 (  1.1%)   0.2480 (  3.9%)   0.2227 (  3.6%)  Dead Store Elimination
   0.2040 (  3.6%)   0.0120 (  1.6%)   0.2160 (  3.4%)   0.2147 (  3.4%)  SROA
...
  Total Execution Time: 96.6840 seconds (96.6925 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   9.5480 ( 10.4%)   0.5440 ( 10.3%)  10.0920 ( 10.4%)  10.1854 ( 10.5%)  Function Integration/Inlining
   6.9000 (  7.5%)   0.7120 ( 13.5%)   7.6120 (  7.9%)   7.7547 (  8.0%)  X86 DAG->DAG Instruction Selection
   5.8280 (  6.4%)   0.1120 (  2.1%)   5.9400 (  6.1%)   6.0239 (  6.2%)  Loop Invariant Code Motion
   4.5640 (  5.0%)   0.1680 (  3.2%)   4.7320 (  4.9%)   4.8229 (  5.0%)  Global Value Numbering
   3.8240 (  4.2%)   0.1920 (  3.6%)   4.0160 (  4.2%)   4.1534 (  4.3%)  Combine redundant instructions
...
  Total Execution Time: 5.4880 seconds (5.5137 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.4040 (  8.3%)   0.0680 ( 10.9%)   0.4720 (  8.6%)   0.4675 (  8.5%)  Function Integration/Inlining
   0.3000 (  6.2%)   0.0720 ( 11.5%)   0.3720 (  6.8%)   0.3340 (  6.1%)  X86 DAG->DAG Instruction Selection
   0.2720 (  5.6%)   0.0320 (  5.1%)   0.3040 (  5.5%)   0.2744 (  5.0%)  Global Value Numbering
   0.2440 (  5.0%)   0.0120 (  1.9%)   0.2560 (  4.7%)   0.2738 (  5.0%)  Loop Invariant Code Motion
   0.2280 (  4.7%)   0.0120 (  1.9%)   0.2400 (  4.4%)   0.2527 (  4.6%)  Combine redundant instructions
...
Most inlined functions: Running `RUSTFLAGS="-C remark=inline -g" cargo +nightly build --release 2> inlined.txt`, and filtering the output using this: ``` grep '.*[[:alnum:]]\{5\} inlined into' inlined.txt | sed 's/.*[0-9]: //g; s/d into.*/d into/' | sort | uniq -c | sort -gr > unique_inlines.txt ``` (Yeah it's a bit of a monstrosity, and takes a long time, maybe I could look into making some tool analyze this if it turns out to be useful.) gives:
   1865 _ZN4core3ptr13drop_in_place17hffb7ed55089717f8E inlined into
    917 _ZN50_$LT$T$u20$as$u20$core..convert..From$LT$T$GT$$GT$4from17h99c56b7d2123957eE inlined into
    677 _ZN4core3fmt9Arguments6new_v117h0a32116e8ade096aE inlined into
    549 _ZN4core3ptr13drop_in_place17h8905708a4bea3261E inlined into
    517 _ZN4core3fmt9Formatter11debug_tuple17h312acfb009988478E inlined into
    484 _ZN50_$LT$T$u20$as$u20$core..convert..From$LT$T$GT$$GT$4from17hf6536c6dc214691bE inlined into
    438 _ZN5alloc9allocator6Layout4size17h278602cb6317171dE inlined into
    402 _ZN63_$LT$alloc..allocator..Layout$u20$as$u20$core..clone..Clone$GT$5clone17hba86f6db001903cdE inlined into
    381 _ZN61_$LT$alloc..heap..Heap$u20$as$u20$alloc..allocator..Alloc$GT$7dealloc17h3db9762b4fdfd21eE inlined into
    344 _ZN5alloc4heap15exchange_malloc17he9800f033a6330e4E inlined into
    341 _ZN4core3ptr13drop_in_place17hb5325e51b98a161aE inlined into
...
    82 _ZN4core3fmt9Formatter12debug_struct17h5cfb920e90696799E inlined into
...
rustc --version ``` rustc 1.21.0-nightly (f774bce 2017-08-12) ```

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions