Commit 7fd1ba5

[libclc][amdgpu] Implement native_exp via builtin
This came up during a discussion on #129679 and has been split out as a preparatory commit.

An example of the AMDGPU codegen is:

    define <2 x float> @_Z10native_expDv2_f(<2 x float> %x) {
      %0 = extractelement <2 x float> %x, i64 0
      %mul.i4 = fmul afn float %0, 0x3FF7154760000000
      %1 = tail call afn float @llvm.amdgcn.exp2.f32(float %mul.i4)
      %vecinit = insertelement <2 x float> poison, float %1, i64 0
      %2 = extractelement <2 x float> %x, i64 1
      %mul.i = fmul afn float %2, 0x3FF7154760000000
      %3 = tail call afn float @llvm.amdgcn.exp2.f32(float %mul.i)
      %vecinit2 = insertelement <2 x float> %vecinit, float %3, i64 1
      ret <2 x float> %vecinit2
    }
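
The multiplier 0x3FF7154760000000 in the codegen example is not explained in the commit message. It appears to be log2(e) rounded to single precision, which LLVM IR prints as the bit pattern of the equivalent double. A minimal host-side C sketch that checks this reading follows; the names double_from_bits, LOG2E_F, and ir_constant_matches_log2e are hypothetical and are not part of libclc.

```c
#include <stdint.h>
#include <string.h>

/* log2(e) rounded to single precision; assumed to match OpenCL's M_LOG2E_F. */
static const float LOG2E_F = 1.44269504088896340736f;

/* Reinterpret a 64-bit pattern as an IEEE-754 double (hypothetical helper). */
double double_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Returns 1 when the constant printed in the IR equals log2(e) as a float
   widened to double, 0 otherwise. */
int ir_constant_matches_log2e(void) {
    return double_from_bits(0x3FF7154760000000ULL) == (double)LOG2E_F;
}
```

Under that reading, each fmul in the IR is the val * M_LOG2E_F from the source, applied once per vector lane.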
1 parent b52977b commit 7fd1ba5

2 files changed: 6 additions and 14 deletions

libclc/amdgpu/lib/math/native_exp.cl

Lines changed: 6 additions & 3 deletions
@@ -7,7 +7,10 @@
 //===----------------------------------------------------------------------===//
 
 #include <clc/clc.h>
+#include <clc/clcmacro.h>
 
-#define __CLC_BODY <native_exp.inc>
-#define __FLOAT_ONLY
-#include <clc/math/gentype.inc>
+_CLC_OVERLOAD _CLC_DEF float native_exp(float val) {
+  return __builtin_amdgcn_exp2f(val * M_LOG2E_F);
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, native_exp, float)

libclc/amdgpu/lib/math/native_exp.inc

Lines changed: 0 additions & 11 deletions
This file was deleted.
