Description
Bugzilla Link | 37358 |
Resolution | FIXED |
Resolved on | Jan 16, 2019 13:21 |
Version | trunk |
OS | Windows NT |
Blocks | #38454 |
Attachments | failing IR |
CC | @chandlerc,@topperc,@davidbolvansky,@echristo,@efriedma-quic,@gnzlbg,@zmodem,@hfinkel,@cuviper,@RKSimon,@nikic,@rotateright,@tstellar |
Fixed by commit(s) | r351296 |
Extended Description
We have a bug reported against rust-lang/rust (rust-lang/rust#50154) where LLVM at opt-level=3 miscompiles code by promoting arguments passed by reference to pass-by-value. The attached IR shows the difference in the resulting function signature:
$ opt -O2 tmp.ll -S | grep '^define.*m256'
define internal fastcc void @_mm256_cmpgt_epi16(<4 x i64>* nocapture, <4 x i64>* nocapture readonly %a, <4 x i64>* nocapture readonly %b) unnamed_addr #2 {
$ opt -O3 tmp.ll -S | grep '^define.*m256'
define internal fastcc void @_mm256_cmpgt_epi16(<4 x i64>* nocapture, <4 x i64> %a.val, <4 x i64> %b.val) unnamed_addr #2 {
Note that at opt-level=2 the two arguments to this function are still passed by reference, but at opt-level=3 they are promoted to being passed by value. Here the target function, _mm256_cmpgt_epi16, has the "avx2" feature enabled, while the caller, baseline, has no extra target features enabled (that is, "avx2" is not available to it). Because the two functions are compiled with different target features, they can disagree on how a <4 x i64> value is passed, so passing by value creates an ABI mismatch at codegen time and the optimized IR produces invalid results.
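For readers without the attachment, the shape of the code involved can be sketched in Rust roughly as follows. This is a hypothetical reduction written for illustration, not the code from rust-lang/rust#50154 or the attached IR: the wrapper name cmpgt_epi16, the array-based signature of baseline, and the runtime feature check are all assumptions. It only shows the pattern of a callee with "avx2" enabled taking its vector operands by reference from a caller that has no extra target features.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_cmpgt_epi16};

// Callee: compiled with "avx2" enabled. Its 256-bit operands arrive by
// reference, which is the form argument promotion may rewrite to by-value.
// (Hypothetical wrapper; the name is an assumption for this sketch.)
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn cmpgt_epi16(out: &mut __m256i, a: &__m256i, b: &__m256i) {
    *out = _mm256_cmpgt_epi16(*a, *b);
}

// Caller: no extra target features enabled, mirroring `baseline` in the report.
// Returns None when AVX2 is not available at runtime.
#[cfg(target_arch = "x86_64")]
pub fn baseline(a: [i16; 16], b: [i16; 16]) -> Option<[i16; 16]> {
    if !is_x86_feature_detected!("avx2") {
        return None;
    }
    unsafe {
        // [i16; 16] and __m256i are both 32 bytes, so the transmutes are size-correct.
        let va: __m256i = std::mem::transmute(a);
        let vb: __m256i = std::mem::transmute(b);
        let mut out: __m256i = std::mem::transmute([0i16; 16]);
        cmpgt_epi16(&mut out, &va, &vb);
        Some(std::mem::transmute::<__m256i, [i16; 16]>(out))
    }
}

// Fallback so the sketch still compiles on non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn baseline(_a: [i16; 16], _b: [i16; 16]) -> Option<[i16; 16]> {
    None
}
```

With a correct compiler, each lane of the result is all ones (-1 as i16) where the lane of a is greater than the lane of b, and zero otherwise. Whether this particular reduction triggers the miscompile depends on inlining and optimization decisions, so it should be read as an illustration of the caller/callee feature mismatch rather than a guaranteed reproducer.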
Using opt-bisect-limit I found that this happens during the "Promote 'by reference' arguments to scalars on SCC" pass. Are we correct in thinking that this optimization shouldn't happen? Or is this a valid optimization that we'll need to work around on rustc's end?