Description
I'm filing this here for tracking even though this is entirely an LLVM issue.
Consider this program:
use std::env;

fn main() {
    if env::args().any(|s| s == "Hello, world!") {
        println!("Said hello!");
    }
}
The closure here takes a `String` argument. Looking at the DWARF for this closure:
< 7><0x00001c02 GOFF=0x00001c02> DW_TAG_inlined_subroutine
DW_AT_abstract_origin 0x00001b5a<.debug_info+0x00001b5a>
DW_AT_low_pc 0x13d94
DW_AT_high_pc <offset-from-lowpc>78
DW_AT_call_file 0x0000000f /rustc/636d7ff91b9847d6d43c7bbe023568828f6e3246/library/core/src/iter/traits/iterator.rs
DW_AT_call_line 0x00000af7
DW_AT_call_column 0x00000014
< 8><0x00001c17 GOFF=0x00001c17> DW_TAG_formal_parameter
DW_AT_location <loclist at .debug_loc+0x000006c8>
[ 0]<base-address 0x13d30>
[ 1]<offset-pair 0x64, 0xb0> [0x13d94, 0x13de0]DW_OP_reg4 DW_OP_piece 8 DW_OP_reg5 DW_OP_piece 8
DW_AT_abstract_origin 0x00001b71<.debug_info+0x00001b71>
We can see that only 16 bytes of the 24-byte `String` are accounted for in the `DW_AT_location` for `s`. If we build with `-Cllvm-args="--disable-peephole"` we instead get:
< 7><0x00001c02 GOFF=0x00001c02> DW_TAG_inlined_subroutine
DW_AT_abstract_origin 0x00001b5a<.debug_info+0x00001b5a>
DW_AT_low_pc 0x13d99
DW_AT_high_pc <offset-from-lowpc>73
DW_AT_call_file 0x0000000f /rustc/636d7ff91b9847d6d43c7bbe023568828f6e3246/library/core/src/iter/traits/iterator.rs
DW_AT_call_line 0x00000af7
DW_AT_call_column 0x00000014
< 8><0x00001c17 GOFF=0x00001c17> DW_TAG_formal_parameter
DW_AT_location <loclist at .debug_loc+0x00000716>
[ 0]<base-address 0x13d30>
[ 1]<offset-pair 0x69, 0x6d> [0x13d99, 0x13d9d]DW_OP_reg4 DW_OP_piece 8 DW_OP_reg5 DW_OP_piece 8 DW_OP_reg0 DW_OP_piece 8
[ 2]<offset-pair 0x6d, 0xb0> [0x13d9d, 0x13de0]DW_OP_reg4 DW_OP_piece 8 DW_OP_reg5 DW_OP_piece 8
DW_AT_abstract_origin 0x00001b71<.debug_info+0x00001b71>
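To make the byte accounting in the two location lists explicit, here is a minimal sketch that sums the `DW_OP_piece` operands in a textual location expression and compares the total against `size_of::<String>()`. The expression strings are copied from the dumps in this issue; the helper name `bytes_described` is hypothetical, and the 24-byte size assumes a 64-bit target.

```rust
use std::mem::size_of;

/// Sums the operand of every `DW_OP_piece N` in a textual location
/// expression. Hypothetical helper, written only for this illustration.
fn bytes_described(expr: &str) -> usize {
    let mut tokens = expr.split_whitespace();
    let mut total = 0;
    while let Some(tok) = tokens.next() {
        if tok == "DW_OP_piece" {
            // The operand of DW_OP_piece is the size of the piece in bytes.
            if let Some(n) = tokens.next().and_then(|s| s.parse::<usize>().ok()) {
                total += n;
            }
        }
    }
    total
}

fn main() {
    // Location expressions for `s`, copied from the two dumps.
    let with_peephole = "DW_OP_reg4 DW_OP_piece 8 DW_OP_reg5 DW_OP_piece 8";
    let without_peephole =
        "DW_OP_reg4 DW_OP_piece 8 DW_OP_reg5 DW_OP_piece 8 DW_OP_reg0 DW_OP_piece 8";

    println!("size_of::<String>()  = {}", size_of::<String>());
    println!("peephole enabled     = {} bytes described", bytes_described(with_peephole));
    println!("peephole disabled    = {} bytes described", bytes_described(without_peephole));
}
```

On a 64-bit target the default build describes 8 bytes fewer than the size of `String`, while the first entry of the `--disable-peephole` location list covers the whole value.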
With the peephole pass disabled, all 24 bytes of the `String` are accounted for, at least at the beginning of the closure. It's clear that some form of SROA has split the `String` into its constituent components to fit in registers. It's instructive to look at the assembly for the `--disable-peephole` version first. Annotations mine.
13d4a: 4c 8d 74 24 30 lea 0x30(%rsp),%r14 # `String` outparam address
[snip]
13d79: 4c 89 f7 mov %r14,%rdi # Outparam address is moved to the first argument slot
[snip]
13d7f: 41 ff d4 call *%r12 # Call a function that will allocate the `String`
13d82: 48 8b 74 24 30 mov 0x30(%rsp),%rsi # Move `String`'s `RawVec`'s cap into %rsi
13d87: 48 89 f0 mov %rsi,%rax
13d8a: 48 f7 d8 neg %rax
13d8d: 70 53 jo 13de2 <_ZN3tmp4main17h1b868ff7e3686bfaE+0xb2>
13d8f: 48 8b 7c 24 38 mov 0x38(%rsp),%rdi # Move `String`'s `RawVec`'s ptr into %rdi
13d94: 48 8b 44 24 40 mov 0x40(%rsp),%rax # Move `String`'s len into %rax
13d99: 48 83 e8 0d sub $0xd,%rax # Subtract the length of "Hello, world!" from %rax
13d9d: 75 31 jne 13dd0 <_ZN3tmp4main17h50c4fe792b52af98E+0xa0> # If `String` isn't the same length as "Hello, world!", jump to the deallocation.
[snip partially inlined string comparison]
13dc0: 75 16 jne 13dd8 <_ZN3tmp4main17h50c4fe792b52af98E+0xa8> # Jump to deallocation
13dc2: eb ac jmp 13d70 <_ZN3tmp4main17h50c4fe792b52af98E+0x40> # Compare strings for real
13dc4: 66 66 66 2e 0f 1f 84 data16 data16 cs nopw 0x0(%rax,%rax,1)
13dcb: 00 00 00 00 00
13dd0: 45 31 ed xor %r13d,%r13d
13dd3: 48 85 f6 test %rsi,%rsi
13dd6: 74 98 je 13d70 <_ZN3tmp4main17h50c4fe792b52af98E+0x40>
13dd8: ba 01 00 00 00 mov $0x1,%edx
13ddd: 41 ff d7 call *%r15 # Free the string buffer.
13de0: eb 8e jmp 13d70 <_ZN3tmp4main17h50c4fe792b52af98E+0x40>
With LLVM's peephole optimization pass enabled, the assembly is almost the same, except that the compiler fuses the instructions at 13d94/13d99. That "load to a register and compare" sequence (the `sub` is used solely to set ZF for the branch; the actual result is thrown away) is replaced by a direct `cmpq $0xd,0x40(%rsp)`. The peephole pass doesn't make any effort to fix up the debug info, though, so the 8 bytes of `String`'s `len` are simply lost.
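As a rough illustration of why the fold is valid for code generation, the sketch below models both instruction sequences in plain Rust. It is not LLVM's implementation; the function names are hypothetical, and the constant 13 is the length of "Hello, world!" (the `$0xd` in the assembly).

```rust
/// Models `mov 0x40(%rsp),%rax; sub $0xd,%rax; jne ...`:
/// the length is loaded, the difference is computed, and only
/// the "is it zero?" answer is used.
fn differs_via_sub(len: usize) -> bool {
    let diff = len.wrapping_sub(13); // result feeds the zero test only
    diff != 0
}

/// Models `cmpq $0xd,0x40(%rsp); jne ...`:
/// the length is compared in place and never occupies a register.
fn differs_via_cmp(len: usize) -> bool {
    len != 13
}

fn main() {
    for len in [0usize, 12, 13, 14, usize::MAX] {
        assert_eq!(differs_via_sub(len), differs_via_cmp(len));
    }
    println!("sub+jne and cmp+jne take the same branch for every sampled length");
}
```

The two forms branch identically, which is why the pass may fuse them. The difference that matters here is that, in the fused form, `len` is never loaded into `%rax`, so a location description that points at that register no longer has a value to describe.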
@rustbot label: +A-debuginfo