
AMDGPU: Wrong code for fcanonicalize #82937

Closed
@hvdijk


Please consider this minimal LLVM IR:

define half @f(half %x) {
  %canonicalized = call half @llvm.canonicalize.f16(half %x)
  ret half %canonicalized
}

declare half @llvm.canonicalize.f16(half)

Running it through llc -mtriple=amdgcn gives:

f:                                      ; @f
        s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
        s_setpc_b64 s[30:31]

The canonicalize operation has been entirely optimised away.
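To make the consequence concrete, here is a minimal Python sketch of what llvm.canonicalize has to do with a signaling NaN, compared with the empty function body above. It assumes the IEEE-754 binary16 bit layout and that denormals are not flushed, so only NaN quieting is modelled. The helper names (canonicalize_f16, miscompiled_f) are hypothetical, and keeping the NaN payload is a choice this sketch makes rather than a requirement taken from the LangRef.

```python
# Sketch: llvm.canonicalize on an IEEE-754 binary16 value, limited to NaN
# quieting. Assumption: denormals are preserved (no flushing), so every
# non-NaN value passes through unchanged.

F16_EXP_MASK = 0x7C00   # 5 exponent bits
F16_FRAC_MASK = 0x03FF  # 10 fraction bits
F16_QUIET_BIT = 0x0200  # most significant fraction bit marks a quiet NaN


def is_nan_f16(bits: int) -> bool:
    # NaN: exponent all ones and a nonzero fraction.
    return (bits & F16_EXP_MASK) == F16_EXP_MASK and (bits & F16_FRAC_MASK) != 0


def is_snan_f16(bits: int) -> bool:
    # Signaling NaN: a NaN whose quiet bit is clear.
    return is_nan_f16(bits) and not (bits & F16_QUIET_BIT)


def canonicalize_f16(bits: int) -> int:
    """Quiet a signaling NaN by setting the quiet bit; keep everything else.

    Keeping the payload is this sketch's choice (hypothetical helper).
    """
    if is_snan_f16(bits):
        return bits | F16_QUIET_BIT
    return bits


def miscompiled_f(bits: int) -> int:
    """Models the emitted code for @f: the argument is returned untouched."""
    return bits


snan = 0x7C01  # exponent all ones, quiet bit clear, nonzero payload
print(hex(canonicalize_f16(snan)))  # 0x7e01
print(hex(miscompiled_f(snan)))     # 0x7c01, still signaling
```

Because the generated code contains no instructions at all, any signaling NaN passed to @f comes back still signaling.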

The reason is visible in the SelectionDAG built during ISel:

  t0: ch,glue = EntryToken
          t2: f32,ch = CopyFromReg # D:1 t0, Register:f32 %0
        t4: f16 = fp_round # D:1 t2, TargetConstant:i64<1>
      t5: f16 = fcanonicalize # D:1 t4
    t6: f32 = fp_extend # D:1 t5
  t8: ch,glue = CopyToReg # D:1 t0, Register:f32 $vgpr0, t6
  t9: ch = RET_GLUE # D:1 t8, Register:f32 $vgpr0, t8:1

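Here, fcanonicalize is optimised away because SITargetLowering::isCanonicalized determines that fp_round is guaranteed to return an already-canonicalised result, so no work is needed. That leaves fp_extend (fp_round x, 1), where the flag 1 marks the rounding as known not to change the value, and that pair is folded to x, so the whole sequence becomes a no-op. Each fold is sound on its own; the problem is that the first one relies on the fp_round actually being executed, and the second one removes it.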
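The interaction can be reproduced in a toy rewriter. In this Python sketch, Node, is_canonicalized and combine are hypothetical, simplified stand-ins for a SelectionDAG node, SITargetLowering::isCanonicalized and the DAG combiner, not LLVM's actual code. It applies the two folds bottom-up to the DAG from the dump above and checks what is left.

```python
# Toy model of two DAG folds whose composition deletes a canonicalize.
# Node names mirror the ISel dump; the rules are simplified stand-ins.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    op: str
    operands: Tuple["Node", ...] = ()
    flag: int = 0  # for fp_round: 1 means "known not to change the value"


def is_canonicalized(n: Node) -> bool:
    # Hypothetical stand-in for SITargetLowering::isCanonicalized:
    # fp_round is treated as always producing a canonical result.
    return n.op in ("fp_round", "fcanonicalize")


def combine(n: Node, fold_canonicalize: bool = True) -> Node:
    # Combine operands first (bottom-up), then try the two folds on n.
    ops = tuple(combine(o, fold_canonicalize) for o in n.operands)
    n = Node(n.op, ops, n.flag)
    # Fold 1: fcanonicalize x -> x when x is already canonical.
    if fold_canonicalize and n.op == "fcanonicalize" and is_canonicalized(ops[0]):
        return ops[0]
    # Fold 2: fp_extend (fp_round x, 1) -> x.
    if n.op == "fp_extend" and ops[0].op == "fp_round" and ops[0].flag == 1:
        return ops[0].operands[0]
    return n


x = Node("CopyFromReg")
dag = Node("fp_extend", (Node("fcanonicalize", (Node("fp_round", (x,), flag=1),)),))

print(combine(dag) == x)  # True: nothing canonicalizes x any more
print(combine(dag, fold_canonicalize=False).operands[0].op)  # fcanonicalize
```

With the canonicalize fold disabled, fp_extend sees fcanonicalize rather than fp_round as its operand, so the second fold cannot fire; the miscompile needs both rules to apply in sequence.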

This blocks another optimisation (#80520) from going in, because that change makes this problem show up in more cases than it currently does. Sadly, I have not found a good way to guarantee correct code for this case without also making codegen worse for other tests.

