@koparasy
Contributor

koparasy commented Nov 4, 2025

No description provided.

koparasy changed the title "Initial lowering to LLVM-IR for device code" to "[CIR][HIP] Lower Device CIR to LLVM IR" on Nov 4, 2025
@koparasy
Contributor Author

koparasy commented Nov 5, 2025

@bcardosolopes this code interleaves allocas with address space casts. Is this allowed, or should I place the allocas at the function entry point?

@@ -0,0 +1,72 @@
//===- AMDGPU.cpp - TargetInfo for AMDGPU
//-----------------------------------===//
Member

80-cols

Contributor Author

Nice catch! Fixed

@@ -0,0 +1,19 @@
#include "../Inputs/cuda.h"
Member

nit: pass -I ../Inputs in the invocation below and just use #include "cuda.h"

Contributor Author

Fixed

@@ -0,0 +1,19 @@
#include "../Inputs/cuda.h"
Member

Same comment as above.

Contributor Author

Fixed

j = i;
}

// CIR: cir.global "private" internal dso_local addrspace(offload_local) @_ZZ2fnvE1j : !s32i
Member

Please also add LLVM checks for the lowering here, and add OGCG checks to track classic codegen alongside it (see clang/test/CIR/CodeGen/CUDA/cuda-builtin-vars.cu for an example). Same for the other tests!

Contributor Author

Fixed

@bcardosolopes
Member

> @bcardosolopes this code interleaves allocas with address space casts. Is this allowed, or should I place the allocas at the function entry point?

The allocas should already be in the function entry BB; are you seeing anything different? What does the final LLVM IR look like? We should try to emit IR that is as close as possible to classic codegen, but intermingling the allocas and casts within the function entry BB doesn't seem too problematic.

@koparasy
Contributor Author

koparasy commented Nov 7, 2025

> > @bcardosolopes this code interleaves allocas with address space casts. Is this allowed, or should I place the allocas at the function entry point?

> The allocas should already be in the function entry BB; are you seeing anything different? What does the final LLVM IR look like? We should try to emit IR that is as close as possible to classic codegen, but intermingling the allocas and casts within the function entry BB doesn't seem too problematic.

Here is the device code that ClangIR generates for a device function:

define dso_local void @_Z9device_fnPidf(ptr %0, double %1, float %2) #0 {
  %4 = alloca ptr, i64 1, align 8, addrspace(5)
  %5 = addrspacecast ptr addrspace(5) %4 to ptr
  %6 = alloca double, i64 1, align 8, addrspace(5)
  %7 = addrspacecast ptr addrspace(5) %6 to ptr
  %8 = alloca float, i64 1, align 4, addrspace(5)
  %9 = addrspacecast ptr addrspace(5) %8 to ptr
  store ptr %0, ptr %5, align 8
  store double %1, ptr %7, align 8
  store float %2, ptr %9, align 4
  ret void
}

Here is the same function from OG (classic codegen):

define dso_local void @_Z9device_fnPidf(ptr noundef %a, double noundef %b, float noundef %c) #0 {
entry:
  %a.addr = alloca ptr, align 8, addrspace(5)
  %b.addr = alloca double, align 8, addrspace(5)
  %c.addr = alloca float, align 4, addrspace(5)
  %a.addr.ascast = addrspacecast ptr addrspace(5) %a.addr to ptr
  %b.addr.ascast = addrspacecast ptr addrspace(5) %b.addr to ptr
  %c.addr.ascast = addrspacecast ptr addrspace(5) %c.addr to ptr
  store ptr %a, ptr %a.addr.ascast, align 8
  store double %b, ptr %b.addr.ascast, align 8
  store float %c, ptr %c.addr.ascast, align 4
  ret void
}

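As far as I can tell, the allocas and casts are equivalent in both versions: ClangIR emits each addrspacecast right after its alloca, while OG emits all allocas first, but both keep them in the entry block, so both should be fine.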

2 participants