Closed
Description
I came across this LLVM IR, which the AArch64 backend currently refuses to lower:
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
define bfloat @kernel_sum_reduce(bfloat %0, bfloat %1) {
  %3 = fadd bfloat %0, %1
  ret bfloat %3
}
To reproduce (no target triple is specified because I am testing on an AArch64 host):
$ llc --mattr=+bf16 bfloat_add.ll
LLVM ERROR: Cannot select: t5: bf16 = fadd t2, t4
  t2: bf16,ch = CopyFromReg t0, Register:bf16 %0
    t1: bf16 = Register %0
  t4: bf16,ch = CopyFromReg t0, Register:bf16 %1
    t3: bf16 = Register %1
Tested using ToT: 82c820b95cf7. This is not a problem for the X86 backend. I haven't checked other backends.
Short analysis
There is no fadd for bfloats (aka bf16) on AArch64: A64 -- SIMD and Floating-point Instructions. The backend could choose to transform bfloats to floats, but currently it does not.
IIUC, Clang wouldn't produce this code to begin with. I extracted it from MLIR's sparse_sum_bf16.mlir (CC @d0k), which is failing for me on AArch64 (and this looks like the root cause).