
Overview about existing solutions and approaches #2

Open
denzp opened this issue Nov 11, 2018 · 9 comments

@denzp (Member) commented Nov 11, 2018

Let's collect information about existing solutions, their architecture, and their weak and strong sides. Hopefully, this will help us get an overview of the current state and the next steps that have to be taken to improve the CUDA experience. It will also let us identify crucial components that can be shared between the different approaches.

I'm going to post an overview of the ptx-linker and ptx-builder approach here in the next few days.

@gnzlbg (Contributor) commented Nov 11, 2018

Intrinsics-wise, pretty much every framework uses link_llvm_intrinsics to import the CUDA intrinsics. @japaric's nvptx exposes these, but what most people don't seem to be aware of is that the CUDA intrinsics are available on nightly (as long as nothing breaks) under core::arch::nvptx::*. These sometimes break because we currently aren't testing them, so that's one of the first things I'd like to solve.
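For reference, here is roughly what using those intrinsics looks like in a device crate. This is a minimal sketch assuming the current nightly feature gates (`abi_ptx`, `stdsimd`); the kernel name and signature are illustrative, and the whole thing may break exactly as described above:

```rust
#![no_std]
#![feature(abi_ptx, stdsimd)]

use core::arch::nvptx;

// Element-wise vector add; the kernel name and pointer layout are
// illustrative, not taken from any of the crates discussed here.
#[no_mangle]
pub unsafe extern "ptx-kernel" fn add(a: *const f32, b: *const f32, out: *mut f32) {
    // Global thread index computed from the CUDA builtins.
    let i = (nvptx::_block_idx_x() * nvptx::_block_dim_x() + nvptx::_thread_idx_x()) as isize;
    *out.offset(i) = *a.offset(i) + *b.offset(i);
}

// Device crates are `no_std`, so they must supply a panic handler.
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { nvptx::trap() }
}
```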

@termoshtt (Member)

I've developed the rust-accel/nvptx crate to build/link/compile a Rust crate into PTX assembly, based on an accel-nvptx toolchain with my Rust fork.

But I found it too much work to handle by myself 😰

@denzp (Member, Author) commented Nov 13, 2018

Some details about the current ptx-builder stack, which consists of several projects (a rough sketch of the build.rs pattern follows the list):

  • ptx-linker - the core linker that composes several compiled device crates into a single PTX assembly. It works directly with Rust's own libllvm, which I think is good because end users don't need to worry about consistency between the system LLVM and Rust's LLVM versions. It also makes some "adjustments" to the modules, like LTO and fixing symbol names (notably, it does this at a later stage than accel).
  • ptx-builder - a build.rs helper that technically allows single-source CUDA crates (in practice, it sometimes just makes the code much uglier).
  • ptx-support - an auxiliary crate with nice-to-have extras: a cuda_printf macro and a proper error handler. It also exposes high-level wrappers for core::arch::nvptx::*.
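For context, the single-source pattern looks roughly like this. This is a minimal sketch of what ptx-builder automates, not its actual API; the `kernel/` directory layout, the output path, and the `KERNEL_PTX_PATH` variable are assumptions, and it presumes ptx-linker is installed as the target's linker:

```rust
// build.rs: build the device crate for the NVPTX target, then expose
// the resulting PTX to the host crate. ptx-builder handles target
// setup, error reporting, and rebuild tracking far more robustly.
use std::process::Command;

fn main() {
    let status = Command::new("cargo")
        .args(&["build", "--release", "--target", "nvptx64-nvidia-cuda"])
        .current_dir("kernel")
        .status()
        .expect("failed to run cargo for the device crate");
    assert!(status.success(), "device crate failed to build");

    // The host crate can then embed the assembly with:
    //   static PTX: &str = include_str!(env!("KERNEL_PTX_PATH"));
    let ptx = std::env::current_dir()
        .unwrap()
        .join("kernel/target/nvptx64-nvidia-cuda/release/kernel.ptx");
    println!("cargo:rustc-env=KERNEL_PTX_PATH={}", ptx.display());
    println!("cargo:rerun-if-changed=kernel/src");
}
```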

@denzp (Member, Author) commented Nov 13, 2018

I think it would be nice if both accel and ptx-builder could leverage the ptx-support crate.
The desired API still needs to be discussed, because the Context is currently just a draft.

Also, in the long term, I'd be happy if we could somehow merge ptx-linker and rust-accel/nvptx, since they have very similar use cases, even though the approaches are a bit different.

@termoshtt (Member)

> I'd be happy if we somehow merge ptx-linker and rust-accel/nvptx since they have very similar use cases

I totally agree. I'd like to offload the build/link part of accel. My motivation for creating rust-accel/nvptx was to link libcore into the PTX crate for core::slice, but I have not completed it yet.

@gnzlbg (Contributor) commented Nov 15, 2018

So what are the pros and cons of resolving linking in rustc vs the ptx-linker approach? What does wasm do here?

@termoshtt (Member)

> So what are the pros and cons of resolving linking in rustc vs the ptx-linker approach?

Focusing on the linking issue, there are two points we need to consider (a sketch of the first option follows the comparison below):

  • Whether to link in LLVM bitcode (IR) or in PTX asm (or cubin)
    • bitcode is linked by llvm-link (i.e. without the CUDA toolchain)
    • PTX asm is linked by nvcc (or another CUDA toolchain tool)
  • Who links them: rustc or ptx-linker.

The lowering process (LLVM bitcode -> PTX using llc) follows from this choice.

Link in LLVM bitcode

Pros

  • Can be linked with the LLVM toolchain alone (without the CUDA toolchain)
  • A global optimization pass over the linked LLVM bitcode

Cons

  • Large difference from the existing workflow, e.g. for x86
    • We need a lowering phase after the link phase

Link in PTX

Pros

  • Similar to the existing workflow

Cons

  • The linkers supported by rustc (i.e. ld and lld) cannot link PTX
  • Needs the CUDA toolchain
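To make the first option concrete, here is a rough sketch of the bitcode pipeline driven from Rust. The tools are the stock LLVM binaries, but the `sm_35` target and the file names are assumptions for illustration:

```rust
use std::process::Command;

// "Link in LLVM bitcode": merge per-crate bitcode with llvm-link,
// then lower the merged module to PTX in a single llc invocation.
fn link_bitcode_then_lower(crates: &[&str]) {
    // llvm-link merges the per-crate bitcode; no CUDA toolchain involved.
    let status = Command::new("llvm-link")
        .args(crates)
        .args(&["-o", "merged.bc"])
        .status()
        .expect("failed to spawn llvm-link");
    assert!(status.success());

    // The lowering phase runs after linking, which is the extra step
    // that differs from the usual x86 workflow noted in the cons above.
    let status = Command::new("llc")
        .args(&["-march=nvptx64", "-mcpu=sm_35", "merged.bc", "-o", "kernel.ptx"])
        .status()
        .expect("failed to spawn llc");
    assert!(status.success());
}

fn main() {
    link_bitcode_then_lower(&["deps.bc", "kernel.bc"]);
}
```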

@denzp (Member, Author) commented Jan 8, 2019

@termoshtt How do you think PTX linking could be implemented? We somehow need to emit the assembly with --emit asm for each crate. I see it happening either via cargo support or via a custom rustc wrapper.
The good point of ptx-linker is that it integrates seamlessly into the existing implementation.

Also, we could always implement an "alternative linker" that has the same CLI but does the job with the help of the CUDA toolchain (lowering the input bitcode into PTX assembly per crate and then linking the results).
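A very rough sketch of what that alternative linker's core loop could look like; the file naming, the `sm_35` target, and the final CUDA-toolchain link step are all assumptions:

```rust
use std::process::Command;

// Hypothetical "alternative linker": keep ptx-linker's CLI surface,
// but lower each input crate's bitcode to PTX separately and leave
// the final link to the CUDA toolchain.
fn main() {
    for (i, bitcode) in std::env::args().skip(1).enumerate() {
        let out = format!("crate{}.ptx", i);
        let status = Command::new("llc")
            .args(&["-march=nvptx64", "-mcpu=sm_35", &bitcode, "-o", &out])
            .status()
            .expect("failed to spawn llc");
        assert!(status.success(), "lowering {} failed", bitcode);
    }
    // The per-crate PTX files would then be combined into a single PTX
    // (or cubin) by a CUDA toolchain step, which is exactly the open
    // question in this thread.
}
```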

@termoshtt (Member)

> The good point of ptx-linker is that it integrates seamlessly into the existing implementation.

Yes, this is a big merit, and we should use it until rustc gains that functionality.

> We somehow need to emit the assembly with --emit asm for each crate.

The problem with the xargo-like approach is that rustc skips the actual link phase. We should fix rustc to link the compiled PTX into a single PTX (or cubin) using an "alternative linker" based on the CUDA toolchain. Since rustc already depends on external linkers (ld and link.exe) on the system, this is not a critical problem. I think ptx-linker can become that "alternative linker" with a little modification, and I had started hacking on rustc in rust-accel/rust#7 (but I made little progress :<)

IMO, this linking issue is unavoidable if we are aiming to make nvptx a tier-2 target.
