
[libclc][amdgpu] Implement native_exp2 via AMD builtin #133696


Merged: 1 commit into llvm:main, Mar 31, 2025

Conversation


@frasercrmck frasercrmck commented Mar 31, 2025

This came up during a discussion on #129679, and has been split out from that PR as a preparatory commit.

An example of the AMDGPU codegen is:

define <2 x float> @_Z10native_expDv2_f(<2 x float> %val) {
  %mul = fmul afn <2 x float> %val, splat (float 0x3FF7154760000000)
  %0 = extractelement <2 x float> %mul, i64 0
  %1 = tail call float @llvm.amdgcn.exp2.f32(float %0)
  %vecinit.i = insertelement <2 x float> poison, float %1, i64 0
  %2 = extractelement <2 x float> %mul, i64 1
  %3 = tail call float @llvm.amdgcn.exp2.f32(float %2)
  %vecinit2.i = insertelement <2 x float> %vecinit.i, float %3, i64 1
  ret <2 x float> %vecinit2.i
}

define <2 x float> @_Z11native_exp2Dv2_f(<2 x float> %x) {
  %0 = extractelement <2 x float> %x, i64 0
  %1 = tail call float @llvm.amdgcn.exp2.f32(float %0)
  %vecinit = insertelement <2 x float> poison, float %1, i64 0
  %2 = extractelement <2 x float> %x, i64 1
  %3 = tail call float @llvm.amdgcn.exp2.f32(float %2)
  %vecinit2 = insertelement <2 x float> %vecinit, float %3, i64 1
  ret <2 x float> %vecinit2
}

@frasercrmck frasercrmck added the libclc (libclc OpenCL library) label Mar 31, 2025
@frasercrmck frasercrmck requested a review from arsenm March 31, 2025 10:43
@frasercrmck (Contributor Author) commented:

Just realised I missed the brief - native_exp2 should use the builtin, native_exp should continue to call native_exp2.

@frasercrmck frasercrmck force-pushed the libclc-amdgcn-native-exp2f branch from 7fd1ba5 to b927766 on March 31, 2025 10:51
@frasercrmck frasercrmck changed the title [libclc][amdgpu] Implement native_exp via builtin [libclc][amdgpu] Implement native_exp via AMD builtin Mar 31, 2025
@frasercrmck frasercrmck changed the title [libclc][amdgpu] Implement native_exp via AMD builtin [libclc][amdgpu] Implement native_exp2 via AMD builtin Mar 31, 2025
@frasercrmck (Contributor Author) commented:

> Just realised I missed the brief - native_exp2 should use the builtin, native_exp should continue to call native_exp2.

Done

@frasercrmck frasercrmck merged commit 3fd0eaa into llvm:main Mar 31, 2025
11 checks passed
@frasercrmck frasercrmck deleted the libclc-amdgcn-native-exp2f branch March 31, 2025 15:54
Labels: backend:AMDGPU, libclc (libclc OpenCL library)
3 participants