Description
Please consider this minimal LLVM IR:
```llvm
define half @f(half %x) {
  %canonicalized = call half @llvm.canonicalize.f16(half %x)
  ret half %canonicalized
}
```
Run it with `llc -mtriple=amdgcn` and we get:
```asm
f:                                      ; @f
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	s_setpc_b64 s[30:31]
```
The `canonicalize` operation has been optimised away entirely, which is incorrect: canonicalize has observable effects (at minimum, quieting signaling NaNs), so it cannot simply be dropped.
The reason for this is that during ISel we get:
```
t0: ch,glue = EntryToken
t2: f32,ch = CopyFromReg # D:1 t0, Register:f32 %0
t4: f16 = fp_round # D:1 t2, TargetConstant:i64<1>
t5: f16 = fcanonicalize # D:1 t4
t6: f32 = fp_extend # D:1 t5
t8: ch,glue = CopyToReg # D:1 t0, Register:f32 $vgpr0, t6
t9: ch = RET_GLUE # D:1 t8, Register:f32 $vgpr0, t8:1
```
Here, `fcanonicalize` is optimised away because `SITargetLowering::isCanonicalized` determines that `fp_round` is guaranteed to return an already-canonicalised result, so no further work is needed.
That then leaves us with `fp_extend (fp_round x, /*strict=*/1)`, which is optimised to a no-op.
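That last step is the extend-of-round fold in `DAGCombiner::visitFP_EXTEND`; a simplified sketch of it (again a paraphrase, not the exact source) looks roughly like:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Paraphrased sketch of the relevant fold in DAGCombiner::visitFP_EXTEND,
// NOT the exact source. The trailing TargetConstant of 1 on fp_round is a
// promise that the rounding did not change the value, so once the
// fcanonicalize between the two nodes is gone, the pair cancels out.
static SDValue foldExtendOfRoundSketch(SDValue Ext) {
  SDValue Rnd = Ext.getOperand(0);
  if (Rnd.getOpcode() == ISD::FP_ROUND && Rnd.getConstantOperandVal(1) == 1) {
    SDValue In = Rnd.getOperand(0);
    if (In.getValueType() == Ext.getValueType())
      return In; // fp_extend (fp_round x, 1) --> x
  }
  return SDValue(); // no fold applies
}
```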
This prevents another optimisation (#80520) from going in, because it makes this problem show up in more cases than it currently does, and, sadly, I struggle to find a good way of ensuring we get correct code for this case without also making codegen for other tests worse.