Open
Description
Here's a suggestion for an update to the tracking issue to include concerns. Partially copied for japaric's original post and added concerns from and links to relevant issues.
If you have the possibility you should take a look @RDambrosio016
Feature gate #![feature(abi_ptx)]
This ABI is intended to be used when generating code for device (GPU) targets like nvptx64-nvidia-cuda
. It is used to generate kernels ("global functions") that work as an entry point from host (cpu) code. Functions that do not use the "ptx-kernel" ABI are "device functions" and only callable from kernels and device functions. Device functions are specifically not usable from host (cpu) code.
Public API
The following code
#![no_std]
#![feature(abi_ptx)]
#[no_mangle]
pub extern "ptx-kernel" fn foo() {}
Produces
.version 3.2
.target sm_30
.address_size 64
// .globl foo
.visible .entry foo()
{
ret;
}
Steps / History
- Fix broken passing of kernel arguments (Fix codegen bug in "ptx-kernel" abi related to arg passing #94703)
- Replace
PassMode::Direct
with something else (nvptx "ptx-kernel" ABI (feature: abi_ptx) uses PassMode::Direct for Aggregates #117271) - Re-enable ptx CI tests to avoid future breakage (Re-enable nvptx tests #96842)
- Emit error for kernels with return value other than
()
- Emit error if a kernel is called directly.
- Fix the problem where Rust generates types the LLVM PTX cannot select (NVPTX: "LLVM ERROR: Cannot select" when returning struct with 3byte size from "device function" #97174)
- Resolve unresolved questions
- Create an RFC that specifies the safe way to use this abi (I assume this will be required @pnkfelix?)
- Document feature (https://doc.rust-lang.org/reference/items/external-blocks.html#abi)
- Stabilization PR
Unresolved Questions
- Resolve what kind of stability guarantees can be made about the generated ptx.
- The ABI of kernels have been previously changed for a major version bump and the ptx-interoperability doc is still outdated.
- PTX is an ISA with many versions. The newest is major version 7. Do we need to reserve the possibility of breaking things when moving to a new major version?
- Figure out what llvm does in relations to the
nvptx64-nvidia-cuda
target and the__global__
modifier.
- What kind of types should be allowed to use as arguments in kernels. Should it be a hard error to use these types or only a warning (Global and device kernels are unsound rust-cuda/wg#11)
- The most important part is to find a minimal but useful subset of Rust types that can be used in kernels. raw pointers, primitive types and
#[repr(C)]
types seems like a good start (no slices, tuples, references, etc). - Using mutable references is almost certain UB except for a few unusable special cases (spawning a single thread only)
- There are many convenient types in Rust which do not have a stable ABI (
&[T]
,(T, U)
, etc). Are there some types that do not have a stable representation but can be relied on having an identical representation for sequential compilation with a given rustc version? If so are there any way we could pass them safely between host and device code compiled with the same rustc version?
- The most important part is to find a minimal but useful subset of Rust types that can be used in kernels. raw pointers, primitive types and
- This unstable feature is one of the last stoppers to using
nvptx64-nvidia-cuda
on stable Rust. The target seems to still have a few bugs (NVPTX backend metabug #38789). Should this feature be kept unstable to avoid usage ofnvptx64-nvidia-cuda
until it has been verified to be usable. - How should shared be supported? Is it necessary to do that from the go?
Notes
- It is not possible to emulate kernels with
#[naked]
functions as the.entry
directive needs to be emited for nvptx kernels.
Metadata
Metadata
Assignees
Labels
Blocker: Implemented in the nightly compiler and unstable.Category: An issue tracking the progress of sth. like the implementation of an RFCTarget: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.htmlStatus: There are blocking design concerns.Status: It's hard to tell what's been done and what hasn't! Someone should do some investigation.Relevant to the compiler team, which will review and decide on the PR/issue.