|  | 
|  | 1 | +# Backend Agnostic Codegen | 
|  | 2 | + | 
|  | 3 | +In the future, it would be nice to allow other codegen backends (e.g. | 
|  | 4 | +[Cranelift][cranelift]). To this end, `librustc_codegen_ssa` provides an | 
|  | 5 | +abstract interface for all backends to implement. | 
|  | 6 | + | 
|  | 7 | +> The following is a copy/paste of a README from the rust-lang/rust repo. | 
|  | 8 | +> Please submit a PR if it needs updating. | 
|  | 9 | + | 
|  | 10 | +# Refactoring of `rustc_codegen_llvm` | 
|  | 11 | +by Denis Merigoux, October 23rd 2018 | 
|  | 12 | + | 
|  | 13 | +## State of the code before the refactoring | 
|  | 14 | + | 
|  | 15 | +All the code related to the compilation of MIR into LLVM IR was contained | 
|  | 16 | +inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most | 
|  | 17 | +important elements: | 
|  | 18 | +* the `back` folder (7,800 LOC) implements the mechanisms for creating the | 
|  | 19 | +  different object files and archives through LLVM, but also the communication | 
|  | 20 | +  mechanisms for parallel code generation; | 
|  | 21 | +* the `debuginfo` (3,200 LOC) folder contains all code that passes debug | 
|  | 22 | +  information down to LLVM; | 
|  | 23 | +* the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with | 
|  | 24 | +  LLVM using the C++ API; | 
|  | 25 | +* the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM | 
|  | 26 | +  IR; | 
|  | 27 | +* the `base.rs` (1,300 LOC) file contains some helper functions but also the | 
|  | 28 | +  high-level code that launches the code generation and distributes the work; | 
|  | 29 | +* the `builder.rs` (1,200 LOC) file contains all the functions generating | 
|  | 30 | +  individual LLVM IR instructions inside a basic block; | 
|  | 31 | +* the `common.rs` (450 LOC) contains various helper functions and all the | 
|  | 32 | +  functions generating LLVM static values; | 
|  | 33 | +* the `type_.rs` (300 LOC) defines most of the type translations to LLVM IR. | 
|  | 34 | + | 
|  | 35 | +The goal of this refactoring is to separate, inside this crate, code that is | 
|  | 36 | +specific to LLVM from code that can be reused for other rustc backends. For | 
|  | 37 | +instance, the `mir` folder is almost entirely backend-agnostic but it relies | 
|  | 38 | +heavily on other parts of the crate. The separation of the code must affect | 
|  | 39 | +neither the logic of the code nor its performance. | 
|  | 40 | + | 
|  | 41 | +For these reasons, the separation process involves two transformations that | 
|  | 42 | +have to be done at the same time for the resulting code to compile: | 
|  | 43 | + | 
|  | 44 | +1. replace all the LLVM-specific types by generics inside function signatures | 
|  | 45 | +   and structure definitions; | 
|  | 46 | +2. encapsulate all functions calling the LLVM FFI inside a set of traits that | 
|  | 47 | +   will define the interface between backend-agnostic code and the backend. | 
|  | 48 | + | 
|  | 49 | +While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new | 
|  | 50 | +traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name | 
|  | 51 | +suggestion by @eddyb). | 
|  | 52 | + | 
|  | 53 | +## Generic types and structures | 
|  | 54 | + | 
|  | 55 | +@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a | 
|  | 56 | +generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This | 
|  | 57 | +work has been extended to all structures inside the `mir` folder and elsewhere, | 
|  | 58 | +as well as for LLVM's `BasicBlock` and `Type` types. | 
|  | 59 | + | 
|  | 60 | +The two most important structures for the LLVM codegen are `CodegenCx` and | 
|  | 61 | +`Builder`. They are parametrized by multiple lifetime parameters and the type | 
|  | 62 | +for `Value`. | 
|  | 63 | + | 
|  | 64 | +```rust,ignore | 
|  | 65 | +struct CodegenCx<'ll, 'tcx> { | 
|  | 66 | +  /* ... */ | 
|  | 67 | +} | 
|  | 68 | + | 
|  | 69 | +struct Builder<'a, 'll, 'tcx> { | 
|  | 70 | +  cx: &'a CodegenCx<'ll, 'tcx>, | 
|  | 71 | +  /* ... */ | 
|  | 72 | +} | 
|  | 73 | +``` | 
|  | 74 | + | 
|  | 75 | +`CodegenCx` is used to compile one codegen-unit that can contain multiple | 
|  | 76 | +functions, whereas `Builder` is created to compile one basic block. | 
|  | 77 | + | 
|  | 78 | +The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime | 
|  | 79 | +parameters, which correspond to the following: | 
|  | 80 | +* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt` | 
|  | 81 | +  containing the program's information; | 
|  | 82 | +* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another | 
|  | 83 | +  object inside a struct; | 
|  | 84 | +* `'ll` is the lifetime of references to LLVM objects such as `Value` or | 
|  | 85 | +  `Type`. | 
|  | 86 | + | 
|  | 87 | +Although there are already many lifetime parameters in the code, making it | 
|  | 88 | +generic uncovered situations where the borrow-checker was passing only due to | 
|  | 89 | +the special nature of the LLVM objects manipulated (they are extern pointers). | 
|  | 90 | +For instance, an additional lifetime parameter had to be added to | 
|  | 91 | +`LocalAnalyzer` in `analyze.rs`, leading to the definition: | 
|  | 92 | + | 
|  | 93 | +```rust,ignore | 
|  | 94 | +struct LocalAnalyzer<'mir, 'a, 'tcx> { | 
|  | 95 | +  /* ... */ | 
|  | 96 | +} | 
|  | 97 | +``` | 
|  | 98 | + | 
|  | 99 | +However, the two most important structures, `CodegenCx` and `Builder`, are not | 
|  | 100 | +defined in the backend-agnostic code. Indeed, their content is highly specific | 
|  | 101 | +to the backend, and it makes more sense to leave their definition to the backend | 
|  | 102 | +implementor than to allow just a narrow spot via a generic field for the | 
|  | 103 | +backend's context. | 
|  | 104 | + | 
|  | 105 | +## Traits and interface | 
|  | 106 | + | 
|  | 107 | +Because they have to be defined by the backend, `CodegenCx` and `Builder` will | 
|  | 108 | +be the structures implementing all the traits defining the backend's interface. | 
|  | 109 | +These traits are defined in the folder `rustc_codegen_ssa/traits` and all the | 
|  | 110 | +backend-agnostic code is parametrized by them. For instance, let us explain how | 
|  | 111 | +a function in `base.rs` is parametrized: | 
|  | 112 | + | 
|  | 113 | +```rust,ignore | 
|  | 114 | +pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>( | 
|  | 115 | +    cx: &'a Bx::CodegenCx, | 
|  | 116 | +    instance: Instance<'tcx> | 
|  | 117 | +) { | 
|  | 118 | +    /* ... */ | 
|  | 119 | +} | 
|  | 120 | +``` | 
|  | 121 | + | 
|  | 122 | +In this signature, we have the two lifetime parameters explained earlier and | 
|  | 123 | +the master type `Bx`, which satisfies the trait `BuilderMethods` corresponding | 
|  | 124 | +to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait | 
|  | 125 | +defines an associated type `Bx::CodegenCx` that itself satisfies the | 
|  | 126 | +`CodegenMethods` trait implemented by the struct `CodegenCx`. | 
|  | 127 | + | 
|  | 128 | +On the trait side, here is an example with part of the definition of | 
|  | 129 | +`BuilderMethods` in `traits/builder.rs`: | 
|  | 130 | + | 
|  | 131 | +```rust,ignore | 
|  | 132 | +pub trait BuilderMethods<'a, 'tcx>: | 
|  | 133 | +    HasCodegen<'tcx> | 
|  | 134 | +    + DebugInfoBuilderMethods<'tcx> | 
|  | 135 | +    + ArgTypeMethods<'tcx> | 
|  | 136 | +    + AbiBuilderMethods<'tcx> | 
|  | 137 | +    + IntrinsicCallMethods<'tcx> | 
|  | 138 | +    + AsmBuilderMethods<'tcx> | 
|  | 139 | +{ | 
|  | 140 | +    fn new_block<'b>( | 
|  | 141 | +        cx: &'a Self::CodegenCx, | 
|  | 142 | +        llfn: Self::Function, | 
|  | 143 | +        name: &'b str | 
|  | 144 | +    ) -> Self; | 
|  | 145 | +    /* ... */ | 
|  | 146 | +    fn cond_br( | 
|  | 147 | +        &mut self, | 
|  | 148 | +        cond: Self::Value, | 
|  | 149 | +        then_llbb: Self::BasicBlock, | 
|  | 150 | +        else_llbb: Self::BasicBlock, | 
|  | 151 | +    ); | 
|  | 152 | +    /* ... */ | 
|  | 153 | +} | 
|  | 154 | +``` | 
|  | 155 | + | 
|  | 156 | +Finally, a master structure implementing the `ExtraBackendMethods` trait is | 
|  | 157 | +used for high-level codegen-driving functions like `codegen_crate` in | 
|  | 158 | +`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`. | 
|  | 159 | +`ExtraBackendMethods` should be implemented by the same structure that | 
|  | 160 | +implements the `CodegenBackend` defined in | 
|  | 161 | +`rustc_codegen_utils/codegen_backend.rs`. | 
|  | 162 | + | 
|  | 163 | +During the traitification process, certain functions have been converted from | 
|  | 164 | +methods of a local structure to methods of `CodegenCx` or `Builder` and a | 
|  | 165 | +corresponding `self` parameter has been added. Indeed, LLVM stores information | 
|  | 166 | +internally that it can access when called through its API. This information | 
|  | 167 | +does not show up in a Rust data structure carried around when these methods are | 
|  | 168 | +called. However, when implementing a Rust backend for `rustc`, these methods | 
|  | 169 | +will need information from `CodegenCx`, hence the additional parameter (unused | 
|  | 170 | +in the LLVM implementation of the trait). | 
|  | 171 | + | 
|  | 172 | +## State of the code after the refactoring | 
|  | 173 | + | 
|  | 174 | +The traits offer an API which is very similar to the API of LLVM. This is not | 
|  | 175 | +the best solution since LLVM has a very special way of doing things: when | 
|  | 176 | +adding another backend, the traits definition might be changed in order to | 
|  | 177 | +offer more flexibility. | 
|  | 178 | + | 
|  | 179 | +However, the current separation between backend-agnostic and LLVM-specific code | 
|  | 180 | +allows the reuse of a significant part of the old `rustc_codegen_llvm`. | 
|  | 181 | +Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the | 
|  | 182 | +most important elements: | 
|  | 183 | + | 
|  | 184 | +* `back` folder: 3,800 (BA) vs 4,100 (LLVM); | 
|  | 185 | +* `mir` folder: 4,400 (BA) vs 0 (LLVM); | 
|  | 186 | +* `base.rs`: 1,100 (BA) vs 250 (LLVM); | 
|  | 187 | +* `builder.rs`: 1,400 (BA) vs 0 (LLVM); | 
|  | 188 | +* `common.rs`: 350 (BA) vs 350 (LLVM). | 
|  | 189 | + | 
|  | 190 | +The `debuginfo` folder has been left almost untouched by the splitting and is | 
|  | 191 | +specific to LLVM. Only its high-level features have been traitified. | 
|  | 192 | + | 
|  | 193 | +The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the | 
|  | 194 | +27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the | 
|  | 195 | +18,500 LOC-sized new `rustc_codegen_llvm` and the 12,000 LOC-sized | 
|  | 196 | +`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of | 
|  | 197 | +approximately 10,000 LOC that would otherwise have had to be duplicated between | 
|  | 198 | +the multiple backends of `rustc`. | 
|  | 199 | + | 
|  | 200 | +The refactored version of `rustc`'s backend introduced no regressions in the | 
|  | 201 | +test suite or in performance benchmarks, which is consistent with the nature | 
|  | 202 | +of the refactoring, which used only compile-time parametricity (no trait | 
|  | 203 | +objects). | 
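
The compile-time parametricity described above can be illustrated with a small, self-contained sketch. Every name in it (`MiniBuilder`, `lower_sum`, `TextBackend`, `FoldBackend`) is hypothetical and chosen for illustration only; none of them is part of `rustc_codegen_ssa`. The sketch only assumes the general shape of the design: backend-agnostic code is generic over a builder trait with an associated `Value` type, and each backend supplies its own implementation.

```rust
// Hypothetical miniature of the trait-based design. None of these names
// are the real `rustc_codegen_ssa` traits.

/// Interface that the backend-agnostic code is written against.
trait MiniBuilder {
    /// Backend-specific handle for an SSA value.
    type Value: Copy;

    fn new() -> Self;
    fn const_int(&mut self, n: i64) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
    fn finish(self, result: Self::Value) -> String;
}

/// Backend-agnostic lowering of a sum of integer constants.
/// `Bx` is resolved at compile time, so no trait objects are involved.
fn lower_sum<Bx: MiniBuilder>(terms: &[i64]) -> String {
    let mut bx = Bx::new();
    let mut acc = bx.const_int(0);
    for &t in terms {
        let v = bx.const_int(t);
        acc = bx.add(acc, v);
    }
    bx.finish(acc)
}

/// Toy backend that emits textual three-address code.
struct TextBackend {
    lines: Vec<String>,
    next: usize,
}

impl MiniBuilder for TextBackend {
    type Value = usize; // index of a virtual register

    fn new() -> Self {
        TextBackend { lines: Vec::new(), next: 0 }
    }
    fn const_int(&mut self, n: i64) -> usize {
        let r = self.next;
        self.next += 1;
        self.lines.push(format!("%{} = const {}", r, n));
        r
    }
    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        let r = self.next;
        self.next += 1;
        self.lines.push(format!("%{} = add %{}, %{}", r, lhs, rhs));
        r
    }
    fn finish(mut self, result: usize) -> String {
        self.lines.push(format!("ret %{}", result));
        self.lines.join("\n")
    }
}

/// Toy backend that evaluates the constants directly instead of emitting code.
struct FoldBackend;

impl MiniBuilder for FoldBackend {
    type Value = i64;

    fn new() -> Self {
        FoldBackend
    }
    fn const_int(&mut self, n: i64) -> i64 {
        n
    }
    fn add(&mut self, lhs: i64, rhs: i64) -> i64 {
        lhs + rhs
    }
    fn finish(self, result: i64) -> String {
        result.to_string()
    }
}

fn main() {
    // The same generic function is instantiated once per backend.
    println!("{}", lower_sum::<TextBackend>(&[1, 2]));
    println!("{}", lower_sum::<FoldBackend>(&[1, 2]));
}
```

Because `lower_sum` is generic rather than taking a `dyn MiniBuilder`, the compiler generates a separate copy for each backend and can inline the backend's methods, which is in line with the absence of a performance regression reported above.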