Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/doc/rustc-dev-guide/rust-version
Original file line number Diff line number Diff line change
@@ -1 +1 @@
425a9c0a0e365c0b8c6cfd00c2ded83a73bed9a0
a1dbb443527bd126452875eb5d5860c1d001d761
1 change: 1 addition & 0 deletions src/doc/rustc-dev-guide/src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,7 @@
- [The `rustdoc-json` test suite](./rustdoc-internals/rustdoc-json-test-suite.md)
- [GPU offload internals](./offload/internals.md)
- [Installation](./offload/installation.md)
- [Usage](./offload/usage.md)
- [Autodiff internals](./autodiff/internals.md)
- [Installation](./autodiff/installation.md)
- [How to debug](./autodiff/debugging.md)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ Note that this documentation mostly covers user-facing information. See

### Overview

- Stage 0: the pre-compiled compiler
- Stage 0: the pre-compiled compiler and standard library
- Stage 1: from current code, by an earlier compiler
- Stage 2: the truly current compiler
- Stage 3: the same-result test
Expand Down Expand Up @@ -192,7 +192,7 @@ include, but are not limited to:
artifacts'). If you're working on the standard library, this is normally the
test command you want.
- `./x build --stage 0` means to build with the stage0 `rustc`.
- `./x doc --stage 0` means to document using the stage0 `rustdoc`.
- `./x doc --stage 1` means to document using the stage0 `rustdoc`.

#### Examples of what *not* to do

Expand All @@ -211,7 +211,7 @@ include, but are not limited to:
In short, _stage 0 uses the `stage0` compiler to create `stage0` artifacts which
will later be uplifted to be the stage1 compiler_.

In each stage, two major steps are performed:
In each stage besides 0, two major steps are performed:

1. `std` is compiled by the stage N compiler.
2. That `std` is linked to programs built by the stage N compiler, including the
Expand Down
Binary file modified src/doc/rustc-dev-guide/src/img/coverage-branch-counting-01.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/dataflow-graphviz-example.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/github-cli.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/github-whitespace-changes.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/llvm-cov-show-01.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/other-peoples-commits.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/rustbot-submodules.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/submodule-conflicts.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/wpa-initial-memory.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified src/doc/rustc-dev-guide/src/img/wpa-stack.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
56 changes: 21 additions & 35 deletions src/doc/rustc-dev-guide/src/macro-expansion.md
Original file line number Diff line number Diff line change
Expand Up @@ -517,8 +517,9 @@ We use these items in macro parser:
are about to ask the MBE parser to parse. We will consume the raw stream of
tokens and output a binding of metavariables to corresponding token trees.
The parsing session can be used to report parser errors.
- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match
the token stream against. They're converted from token trees before matching.
- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match the token stream
against. They're converted from the original token trees in the macro's definition before
matching.

[`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html

Expand All @@ -544,41 +545,26 @@ The full interface is defined [here][code_parse_int].
The macro parser does pretty much exactly the same as a normal regex parser
with one exception: in order to parse different types of metavariables, such as
`ident`, `block`, `expr`, etc., the macro parser must call back to the normal
Rust parser. Both the definition and invocation of macros are parsed using
the parser in a process which is non-intuitively self-referential.

The code to parse macro _definitions_ is in
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the
pattern for matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In
other words, a `macro_rules` definition should have in its body at least one
occurrence of a token tree followed by `=>` followed by another token tree.
When the compiler comes to a `macro_rules` definition, it uses this pattern to
match the two token trees per the rules of the definition of the macro, _thereby
utilizing the macro parser itself_. In our example definition, the
metavariable `$lhs` would match the patterns of both arms: `(print
$mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the
bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar);
println!("{}", $mvar); }`. The parser keeps this knowledge around for when it
needs to expand a macro invocation.

When the compiler comes to a macro invocation, it parses that invocation using
a NFA-based macro parser described above. However, the matcher variable
used is the first token tree (`$lhs`) extracted from the arms of the macro
_definition_. Using our example, we would try to match the token stream `print
foo` from the invocation against the matchers `print $mvar:ident` and `print
twice $mvar:ident` that we previously extracted from the definition. The
algorithm is exactly the same, but when the macro parser comes to a place in the
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an `ident` token,
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
proceeds in parsing as normal. Also, note that exactly one of the matchers from
the various arms should match the invocation; if there is more than one match,
the parse is ambiguous, while if there are no matches at all, there is a syntax
Rust parser.

The code to parse macro definitions is in [`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr].
For more information about the macro parser's implementation, see the comments in
[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

Using our example, we would try to match the token stream `print foo` from the invocation against
the matchers `print $mvar:ident` and `print twice $mvar:ident` that we previously extracted from the
rules in the macro definition. When the macro parser comes to a place in the current matcher where
it needs to match a _non-terminal_ (e.g. `$mvar:ident`), it calls back to the normal Rust parser to
get the contents of that non-terminal. In this case, the Rust parser would look for an `ident`
token, which it finds (`foo`) and returns to the macro parser. Then, the macro parser continues
parsing.

Note that exactly one of the matchers from the various rules should match the invocation; if there is
more than one match, the parse is ambiguous, while if there are no matches at all, there is a syntax
error.

For more information about the macro parser's implementation, see the comments
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
Assuming exactly one rule matches, macro expansion will then *transcribe* the right-hand side of the
rule, substituting the values of any matches it captured when matching against the left-hand side.

## Procedural Macros

Expand Down
29 changes: 1 addition & 28 deletions src/doc/rustc-dev-guide/src/offload/installation.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Installation

In the future, `std::offload` should become available in nightly builds for users. For now, everyone still needs to build rustc from source.
`std::offload` is partly available in nightly builds for users. For now, everyone however still needs to build rustc from source to use all features of it.

## Build instructions

Expand Down Expand Up @@ -42,30 +42,3 @@ run
```
./x test --stage 1 tests/codegen-llvm/gpu_offload
```

## Usage
It is important to use a clang compiler build on the same llvm as rustc. Just calling clang without the full path will likely use your system clang, which probably will be incompatible.
```
/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc --edition=2024 --crate-type cdylib src/main.rs --emit=llvm-ir -O -C lto=fat -Cpanic=abort -Zoffload=Enable
/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/llvm/bin/clang++ -fopenmp --offload-arch=native -g -O3 main.ll -o main -save-temps
LIBOMPTARGET_INFO=-1 ./main
```
The first step will generate a `main.ll` file, which has enough instructions to cause the offload runtime to move data to and from a gpu.
The second step will use clang as the compilation driver to compile our IR file down to a working binary. Only a very small Rust subset will work out of the box here, unless
you use features like build-std, which are not covered by this guide. Look at the codegen test to get a feeling for how to write a working example.
In the last step you can run your binary, if all went well you will see a data transfer being reported:
```
omptarget device 0 info: Entering OpenMP data region with being_mapper at unknown:0:0 with 1 arguments:
omptarget device 0 info: tofrom(unknown)[1024]
omptarget device 0 info: Creating new map entry with HstPtrBase=0x00007fffffff9540, HstPtrBegin=0x00007fffffff9540, TgtAllocBegin=0x0000155547200000, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=1, HoldRefCount=0, Name=unknown
omptarget device 0 info: Copying data from host to device, HstPtr=0x00007fffffff9540, TgtPtr=0x0000155547200000, Size=1024, Name=unknown
omptarget device 0 info: OpenMP Host-Device pointer mappings after block at unknown:0:0:
omptarget device 0 info: Host Ptr Target Ptr Size (B) DynRefCount HoldRefCount Declaration
omptarget device 0 info: 0x00007fffffff9540 0x0000155547200000 1024 1 0 unknown at unknown:0:0
// some other output
omptarget device 0 info: Exiting OpenMP data region with end_mapper at unknown:0:0 with 1 arguments:
omptarget device 0 info: tofrom(unknown)[1024]
omptarget device 0 info: Mapping exists with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=0 (decremented, delayed deletion), HoldRefCount=0
omptarget device 0 info: Copying data from device to host, TgtPtr=0x0000155547200000, HstPtr=0x00007fffffff9540, Size=1024, Name=unknown
omptarget device 0 info: Removing map entry with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, Name=unknown
```
112 changes: 112 additions & 0 deletions src/doc/rustc-dev-guide/src/offload/usage.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
# Usage

This feature is work-in-progress, and not ready for usage. The instructions here are for contributors, or people interested in following the latest progress.
We currently work on launching the following Rust kernel on the GPU. To follow along, copy it to a `src/lib.rs` file.

```rust
#![feature(abi_gpu_kernel)]
#![no_std]

#[cfg(target_os = "linux")]
extern crate libc;
#[cfg(target_os = "linux")]
use libc::c_char;

use core::mem;

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
loop {}
}

#[cfg(target_os = "linux")]
#[unsafe(no_mangle)]
#[inline(never)]
fn main() {
let array_c: *mut [f64; 256] =
unsafe { libc::calloc(256, (mem::size_of::<f64>()) as libc::size_t) as *mut [f64; 256] };
let output = c"The first element is zero %f\n";
let output2 = c"The first element is NOT zero %f\n";
let output3 = c"The second element is %f\n";
unsafe {
let val: *const c_char = if (*array_c)[0] < 0.1 {
output.as_ptr()
} else {
output2.as_ptr()
};
libc::printf(val, (*array_c)[0]);
}

unsafe {
kernel_1(array_c);
}
core::hint::black_box(&array_c);
unsafe {
let val: *const c_char = if (*array_c)[0] < 0.1 {
output.as_ptr()
} else {
output2.as_ptr()
};
libc::printf(val, (*array_c)[0]);
libc::printf(output3.as_ptr(), (*array_c)[1]);
}
}

#[cfg(target_os = "linux")]
unsafe extern "C" {
pub fn kernel_1(array_b: *mut [f64; 256]);
}
```

## Compile instructions
It is important to use a clang compiler build on the same llvm as rustc. Just calling clang without the full path will likely use your system clang, which probably will be incompatible. So either substitute clang/lld invocations below with absolute path, or set your `PATH` accordingly.

First we generate the host (cpu) code. The first build is just to compile libc, take note of the hashed path. Then we call rustc directly to build our host code, while providing the libc artifact to rustc.
```
cargo +offload build -r -v
rustc +offload --edition 2024 src/lib.rs -g --crate-type cdylib -C opt-level=3 -C panic=abort -C lto=fat -L dependency=/absolute_path_to/target/release/deps --extern libc=/absolute_path_to/target/release/deps/liblibc-<HASH>.rlib --emit=llvm-bc,llvm-ir -Zoffload=Enable -Zunstable-options
```

Now we generate the device code. Replace the target-cpu with the right code for your gpu.
```
RUSTFLAGS="-Ctarget-cpu=gfx90a --emit=llvm-bc,llvm-ir" cargo +offload build -Zunstable-options -r -v --target amdgcn-amd-amdhsa -Zbuild-std=core
```

Now find the <libname>.ll under target/amdgcn-amd-amdhsa folder and copy it to a device.ll file (or adjust the file names below).
If you work on an NVIDIA or Intel gpu, please adjust the names acordingly and open an issue to share your results (either if you succeed or fail).
First we compile our .ll files (good for manual inspections) to .bc files and clean up leftover artifacts. The cleanup is important, otherwise caching might interfere on following runs.
```
opt lib.ll -o lib.bc
opt device.ll -o device.bc
rm *.o
rm bare.amdgcn.gfx90a.img*
```

```
clang-offload-packager" "-o" "host.out" "--image=file=device.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"

clang-21" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-S" "-save-temps=cwd" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "lib.rs" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-tune-cpu" "generic" "-resource-dir" "/<ABSOLUTE_PATH_TO>/rust/build/x86_64-unknown-linux-gnu/llvm/lib/clang/21" "-ferror-limit" "19" "-fopenmp" "-fopenmp-offload-mandatory" "-fgnuc-version=4.2.1" "-fskip-odr-check-in-gmf" "-fembed-offload-object=host.out" "-fopenmp-targets=amdgcn-amd-amdhsa" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "host.s" "-x" "ir" "lib.bc"

clang-21" "-cc1as" "-triple" "x86_64-unknown-linux-gnu" "-filetype" "obj" "-main-file-name" "lib.rs" "-target-cpu" "x86-64" "-mrelocation-model" "pic" "-o" "host.o" "host.s"

clang-linker-wrapper" "--should-extract=gfx90a" "--device-compiler=amdgcn-amd-amdhsa=-g" "--device-compiler=amdgcn-amd-amdhsa=-save-temps=cwd" "--device-linker=amdgcn-amd-amdhsa=-lompdevice" "--host-triple=x86_64-unknown-linux-gnu" "--save-temps" "--linker-path=/ABSOlUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/lld/bin/ld.lld" "--hash-style=gnu" "--eh-frame-hdr" "-m" "elf_x86_64" "-pie" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" "-o" "bare" "/lib/../lib64/Scrt1.o" "/lib/../lib64/crti.o" "/ABSOLUTE_PATH_TO/crtbeginS.o" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/bin/../lib/x86_64-unknown-linux-gnu" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/lib/clang/21/lib/x86_64-unknown-linux-gnu" "-L/lib/../lib64" "-L/usr/lib64" "-L/lib" "-L/usr/lib" "host.o" "-lstdc++" "-lm" "-lomp" "-lomptarget" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/lib" "-lgcc_s" "-lgcc" "-lpthread" "-lc" "-lgcc_s" "-lgcc" "/ABSOLUTE_PATH_TO/crtendS.o" "/lib/../lib64/crtn.o"
```

Especially for the last command I recommend to not fix the paths, but rather just re-generate them by copying a bare-mode openmp example and compiling it with your clang. By adding `-###` to your clang invocation, you can see the invidual steps.
```
myclang++ -fuse-ld=lld -O3 -fopenmp -fopenmp-offload-mandatory --offload-arch=gfx90a omp_bare.cpp -o main -###
```

In the final step, you can now run your binary

```
./main
The first element is zero 0.000000
The first element is NOT zero 21.000000
The second element is 0.000000
```

To receive more information about the memory transfer, you can enable info printing with
```
LIBOMPTARGET_INFO=-1 ./main
```
Binary file modified src/doc/rustc-dev-guide/src/queries/example-0.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Loading