This is a code quality issue that has been affecting some graphics workloads recently. The LLPC frontend tends to insert freeze instructions between cmp and conditional br instructions, to avoid undefined behavior if the condition is undef or poison. InstCombine then moves the freeze instructions into places where they interfere with optimizations such as FMA formation.
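The freeze motion described above can be illustrated with a toy expression tree. This is a simplified sketch, not InstCombine's code. It assumes the rule that a freeze is pushed through an instruction that cannot itself create poison onto that instruction's single possibly-poison operand. It treats arguments as possibly poison and constants as not, and it ignores use counts and fast-math flags. The helper fma_candidates is hypothetical, a stand-in for a backend check that an fadd is fed directly by an fmul. The IR shape (fmul, fadd with 1.0, fcmp against 0.0) is guessed from the ISA in this report and is not the contents of r.txt.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Toy IR value: an argument, a constant, or an operation on other nodes."""
    op: str                            # "arg", "const", "fmul", "fadd", "fcmp", "freeze"
    operands: Tuple["Node", ...] = ()
    name: str = ""                     # used only for "arg" and "const"

    def __str__(self) -> str:
        if self.op in ("arg", "const"):
            return self.name
        return f"{self.op}({', '.join(str(o) for o in self.operands)})"


# Assumption: without fast-math flags these ops propagate poison but never create it.
NON_POISON_CREATING = {"fmul", "fadd", "fcmp"}


def maybe_poison(n: Node) -> bool:
    # Assumption: constants are never poison; arguments and computed values may be.
    return n.op != "const"


def push_freeze(n: Node) -> Node:
    """Simplified model: move freeze(op(...)) onto op's single maybe-poison operand."""
    if n.op != "freeze":
        return n
    inner = n.operands[0]
    if inner.op not in NON_POISON_CREATING:
        return n
    risky = [i for i, o in enumerate(inner.operands) if maybe_poison(o)]
    if len(risky) != 1:
        return n  # zero or several maybe-poison operands: the freeze stays put
    ops = list(inner.operands)
    ops[risky[0]] = push_freeze(Node("freeze", (ops[risky[0]],)))
    return Node(inner.op, tuple(ops), inner.name)


def fma_candidates(n: Node) -> int:
    """Hypothetical stand-in for FMA contraction: count fadds fed directly by an fmul."""
    here = 1 if n.op == "fadd" and any(o.op == "fmul" for o in n.operands) else 0
    return here + sum(fma_candidates(o) for o in n.operands)


a, b = Node("arg", name="a"), Node("arg", name="b")
one, zero = Node("const", name="1.0"), Node("const", name="0.0")
# freeze inserted between the compare and the conditional branch
cond = Node("freeze", (Node("fcmp", (Node("fadd", (Node("fmul", (a, b)), one)), zero)),))
print(cond, fma_candidates(cond))      # freeze(fcmp(fadd(fmul(a, b), 1.0), 0.0)) 1
pushed = push_freeze(cond)
print(pushed, fma_candidates(pushed))  # fcmp(fadd(freeze(fmul(a, b)), 1.0), 0.0) 0
```

In this model the freeze that guarded the branch condition ends up between the fmul and the fadd, so the fadd no longer has an fmul operand to contract with. That is consistent with the separate v_mul_f32 and v_add_f32 seen after InstCombine.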
With this test case (r.txt), llc produces the following ISA, which includes a v_fma_f32 instruction:
$ llc -mtriple=amdgcn -mcpu=gfx1010 r.txt -o -
...
main: ; @main
; %bb.0: ; %bb
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_fma_f32 v0, v0, v1, 1.0
v_cmp_lt_f32_e32 vcc_lo, 0, v0
v_cndmask_b32_e64 v0, 0, 1, vcc_lo
s_setpc_b64 s[30:31]
But after running the test case through InstCombine, I get separate v_mul_f32 and v_add_f32 instructions instead:
$ opt -passes=instcombine r.txt -o - | llc -mtriple=amdgcn -mcpu=gfx1010
...
main: ; @main
; %bb.0: ; %bb
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_mul_f32_e32 v0, v0, v1
v_add_f32_e32 v0, 1.0, v0
v_cmp_lt_f32_e32 vcc_lo, 0, v0
v_cndmask_b32_e64 v0, 0, 1, vcc_lo
s_setpc_b64 s[30:31]