Description
This issue tracks progress/roadmap for what needs to be done to support codegen for targets like AMDGPUs. Personally, I am working on AMDGPU codegen as it would be used for HSA. Specifically, I am aiming for the `amdgcn-amd-amdhsa-amdgiz` LLVM target. Note that I'm still learning, so this issue will likely change as guided by experience.
Here are the pieces that will be needed to make this work to an MVP level (i.e. not providing access to most GPU-specific stuff):
- Initialize the LLVM target machine (Initialize LLVM's AMDGPU target machine, if available. #51548).
- Teach the LLVM codegen backend to be mindful of target-machine-imposed address spaces (PR: Make `librustc_codegen_llvm` aware of LLVM address spaces. #51576). E.g. allocas are in address space 5 for the target triple I mentioned above.
- Add the `amdgpu-kernel` ABI (PR: Add the `amdgpu-kernel` ABI. #52032).
- Add a mechanism to delegate virtual function calls (meaning calls by pointer value) to runtime libraries.
- Required metadata??
The address space changes are pretty general. However, in order not to require sweeping changes to how Rust is codegenned for LLVM, any target must support a flat address space, i.e. an address space which is a superset of all the others.
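The flat-address-space rule can be pictured with a small sketch (plain illustrative Rust, not `rustc` code; the numbers follow the convention mentioned above, where allocas live in address space 5 and 0 is the flat/generic space):

```rust
// Address spaces as plain numbers, as LLVM sees them.
#[derive(Clone, Copy, PartialEq, Debug)]
struct AddrSpace(u32);

const FLAT: AddrSpace = AddrSpace(0); // generic space, superset of all others
const ALLOCA: AddrSpace = AddrSpace(5); // private/stack space on this target

/// Would codegen need to insert an `addrspacecast` to pass a pointer
/// in space `from` to code expecting space `to`?
fn needs_cast(from: AddrSpace, to: AddrSpace) -> bool {
    from != to
}

fn main() {
    // Handing a pointer to an alloca to code that expects a flat pointer
    // requires a cast on this target; flat-to-flat does not.
    assert!(needs_cast(ALLOCA, FLAT));
    assert!(!needs_cast(FLAT, FLAT));
}
```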
`amdgpu-kernel` requires its return type to be `void`. There are two ways I see to do this:

- compile-time checks (somewhere in `rustc`), i.e. disallow any return type except `!` and `()`;
- rewriting returns to use an `sret`-like style: promote the return to be an indirect first argument of the function.
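The second option can be pictured as a hand-written analogue in plain Rust (this is only an illustration of the `sret`-like rewrite, not what rustc would actually emit):

```rust
// What the kernel author conceptually writes:
fn compute() -> u32 {
    42
}

// The sret-like rewrite: the return value becomes an indirect first
// argument (an out-pointer), so the function itself returns `()`,
// which lowers to `void`.
unsafe fn compute_sret(out: *mut u32) {
    *out = compute();
}

fn main() {
    let mut slot = 0u32;
    unsafe { compute_sret(&mut slot) };
    assert_eq!(slot, 42);
}
```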
As I recall, Rust inserts wrapper functions for functions declared with an `extern "abi"`, which then call the real Rust-ABI function. My current implementation went with the magical rewriting, but I think forcing the user to acknowledge this with an error is better long term.
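That wrapper scheme can be sketched as follows (names are made up for illustration; this is a hand-written analogue of what the compiler generates, not rustc's actual output):

```rust
// The "real" function, using the ordinary Rust ABI.
fn real_rust_abi_fn(x: u32) -> u32 {
    x + 1
}

// The shim with the foreign ABI. It only adapts the calling
// convention and forwards to the Rust-ABI function.
#[no_mangle]
pub extern "C" fn shim(x: u32) -> u32 {
    real_rust_abi_fn(x)
}

fn main() {
    assert_eq!(shim(41), 42);
}
```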
Privately, I've made it to errors stemming from #4 on general Rust code (i.e. `std`/`core` code). See this repo/crate. Regarding virtual function calls: in principle, it's possible to support them completely GPU-side if using HSA. `amdgpu-kernel`s have access to two different `hsa_queue_t`s (one for the host and one for the device), set up by the GPU's hardware command processor. When a virtual call is encountered, the trick is to have the GPU write to its own `hsa_queue_t` and then wait on the completion signal. Foreign functions can also be supported in this way, by writing to the host `hsa_queue_t` instead.
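The enqueue-and-wait dispatch scheme can be modeled with a toy producer/consumer (this is not the real HSA API; channels stand in for the `hsa_queue_t` and its completion signal):

```rust
use std::sync::mpsc;
use std::thread;

// A "packet" describing the delegated call, plus a completion channel
// standing in for the HSA completion signal.
struct Packet {
    arg: u32,
    done: mpsc::Sender<u32>,
}

// The "GPU" side: write a packet to the queue, then block on the
// completion signal, as described above.
fn delegated_call(queue: &mpsc::Sender<Packet>, arg: u32) -> u32 {
    let (done_tx, done_rx) = mpsc::channel();
    queue.send(Packet { arg, done: done_tx }).unwrap();
    done_rx.recv().unwrap() // wait on the completion signal
}

fn main() {
    let (queue_tx, queue_rx) = mpsc::channel::<Packet>();
    // The runtime/host side: service one packet and raise its signal.
    let runtime = thread::spawn(move || {
        let p = queue_rx.recv().unwrap();
        p.done.send(p.arg * 2).unwrap();
    });
    assert_eq!(delegated_call(&queue_tx, 21), 42);
    runtime.join().unwrap();
}
```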
Post-MVP
TBD (TODO). Discuss?