[AMDGPU] InstCombine moving freeze instructions breaks FMA formation #141622

@jayfoad

This is a code quality issue that has been affecting some graphics workloads recently. The LLPC frontend tends to insert freeze instructions between cmp and conditional br instructions, to avoid undefined behavior if the condition is undef or poison. Then InstCombine moves the freeze instructions into places where they interfere with optimizations like FMA formation.
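For illustration, the pattern described above might look roughly like the following in LLVM IR. This is a hypothetical sketch, not the attached test case (r.txt): the function name, value names, and exact instruction mix are assumptions, and the freeze placement shown in the comments is one plausible outcome rather than a confirmed InstCombine result.

```llvm
; Hypothetical input: a contractable multiply-add feeds a compare,
; and the compare result is frozen before it is used.
define i32 @sketch(float %a, float %b) {
bb:
  %mul = fmul contract float %a, %b
  %add = fadd contract float %mul, 1.0
  %cmp = fcmp ogt float %add, 0.0
  %frozen = freeze i1 %cmp          ; freeze sits after the compare
  %res = zext i1 %frozen to i32
  ret i32 %res
}

; Assumed shape after the freeze is moved up the operand chain:
;
;   %mul = fmul contract float %a, %b
;   %mul.fr = freeze float %mul     ; freeze now separates fmul from fadd
;   %add = fadd contract float %mul.fr, 1.0
;   %cmp = fcmp ogt float %add, 0.0
;
; With a freeze between the fmul and the fadd, the backend no longer
; sees a direct fadd(fmul(a, b), c) pattern to fuse into an FMA.
```

In a shape like this, the fadd no longer takes the fmul as a direct operand, which would be consistent with the separate v_mul_f32 and v_add_f32 instructions in the second listing below.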

With this test case I get this ISA including a v_fma_f32 instruction:

$ llc -mtriple=amdgcn -mcpu=gfx1010 r.txt -o -
...
main:                                   ; @main
; %bb.0:                                ; %bb
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_fma_f32 v0, v0, v1, 1.0
	v_cmp_lt_f32_e32 vcc_lo, 0, v0
	v_cndmask_b32_e64 v0, 0, 1, vcc_lo
	s_setpc_b64 s[30:31]

But after running it through InstCombine, I get separate v_mul_f32 and v_add_f32 instructions:

$ opt -passes=instcombine r.txt -o - | llc -mtriple=amdgcn -mcpu=gfx1010
...
main:                                   ; @main
; %bb.0:                                ; %bb
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_mul_f32_e32 v0, v0, v1
	v_add_f32_e32 v0, 1.0, v0
	v_cmp_lt_f32_e32 vcc_lo, 0, v0
	v_cndmask_b32_e64 v0, 0, 1, vcc_lo
	s_setpc_b64 s[30:31]
