+++ date = "2018-10-11T14:00:00Z" title = "RISC-V LLVM Coding Lab at the LLVM Developers' Meeting 2018"
+++
Alex Bradbury is running a Coding Lab at the 2018 LLVM Developers' Meeting to complement his LLVM backend tutorial.
This coding lab will build on the material presented in the backend and guide you through some sample modifications to the RISC-V backend, including both codegen and MC layer (assembler/disassembler) modifications. Anyone familiar with C++ and a passing familiarity with LLVM IR should be able to get something out of this session, and you're able to go at your own pace.
This page will be updated with full instructions prior to the coding lab.
You will need:
- A laptop
- A debug build of a recent HEAD LLVM with the RISC-V backend enabled (see below). Incremental builds should be relatively fast, but if your laptop is underpowered you may want to plan to ssh out to a faster machine you own.
You will likely find it useful to refer to the slides from my tutorial presentation.
The canonical source for LLVM build instructions are the getting started guide or instructions on building LLVM with CMake. I outline my recommended options below.
When developing LLVM you really want a build with
debug info and assertions. This leads to huge binaries and a lot of
work for your linker. GNU ld tends to struggle in this case and it's likely
you'll encounter long build times and/or run out of memory if GNU ld is your
system linker (ld --version
reports "GNU ld"). Linkers such as GNU gold or
LLVM's lld do not have this problem. The following instructions will check out
and build LLVM using a system clang
and lld
, installed from your package
manager. If you're on Debian or Ubuntu, you may want to look at the packages
available at apt.llvm.org.
git clone http://llvm.org/git/llvm.git
cd llvm
mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE="Debug" \
-DBUILD_SHARED_LIBS=True -DLLVM_USE_SPLIT_DWARF=True \
-DLLVM_BUILD_TESTS=True \
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_LLD=True \
-DLLVM_TARGETS_TO_BUILD="all" \
-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="RISCV" ../
cmake --build .
The cmake options are chosen in order to allow a relatively fast incremental rebuild time.
All commands in this document assume you are in the build/
directory you
created and ran cmake from. All file name references are relative to the
project root. I recommend having one terminal in the project root and another
in the build directory, used just for building and running tests.
Firstly, check your just-built LLVM:
./bin/llc --version
Now let's run the RISC-V tests:
./bin/llvm-lit -s -i -v test/CodeGen/RISCV test/MC/RISCV
Next, inspect the isel debug output for a simple test (you may want to extract a single function to a separate file)
./bin/llc -mtriple=riscv32 -verify-machineinstrs \
< ../test/CodeGen/RISCV/alu32.ll -debug-only=isel
And inspect the records produced by tablegen (search for RISCV in the output):
./bin/llvm-tblgen -I ../lib/Target/RISCV/ -I ../include/ -I ../lib/Target/ \
../lib/Target/RISCV/RISCV.td | less
The basic approach is to first build any changes:
cmake --build .
Then re-run the tests:
./bin/llvm-lit -s -i test/CodeGen/RISCV test/MC/RISCV
If you used the build options I recommended, have a reasonable linker (GNU gold or lld) and a machine that's not completely underpowered most changes should have a fairly fast iteration time. e.g. 30s or less, depending on the file modified.
I strongly encourage you to make full use of git, committing changes when they make sense and reminding yourself what you changed with git diff.
The RISC-V backend implementation lives in lib/Target/RISCV
. You might want
to look at some of the files we mentioned in the tutorial. e.g.
RISCVInstrInfo.td, RISCVRegisterInfo.td.
You may also want to look in build/lib/Target/RISCV to see some of the
tablegenerated files (all named *.inc
).
Problem: For RV32, any 32-bit constant can be materialised in at most two
instructions: lui+addi. However, when compiling with the RVC (compressed)
instruction set, we want to select instructions that may have a 16-bit
compressed form. One improvement would be to select addi $reg, zero, -1
and
srli $reg, $reg, N
for any constant that is comprised of N
0s in the upper
bits and 32-N 1s in the lower bits.
What is materialisation? This just refers to the process of loading a constant into a register.
First of all, write a test for this in test/CodeGen/RISCV/newimm.ll
:
; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck %s -check-prefix=RV32I
define i32 @shiftedmask() nounwind {
ret i32 16777215
}
Unfortunately hex literals are reserved for floating point constants, so we have to use the decimal representation of 0xffffff.
You can directly check out output of this test by running:
./bin/llc -mtriple=riscv32 -verify-machineinstrs < \
../test/CodeGen/RISCV/newimm.ll
Now, lets generate check lines to capture the current lowering:
../utils/update_llc_test_checks.py --llc-binary=./bin/llc \
../test/CodeGen/RISCV/newimm.ll
If you reload newimm.ll you'll see that the test now tracks the generated instructions.
Next we want to try to improve lowering in this case. Although we want a general solution for immediates of this type, let's start with the simplest possible change and introduce a pattern in RISCVInstrInfo.td just for that immediate:
def mask24 : ImmLeaf<XLenVT, [{return Imm == 0xffffff;}]>;
def : Pat<(mask24), (SRLI (ADDI X0, -1), 8)>, Requires<[IsRV32]>;
Put this pattern just before the existing immediate patterns (search for /// Immediates
in the file). Note that we mark the pattern as valid only for the
32-bit RISC-V target, because a 40-bit shift would be need to produce the same
bit pattern on RV64.
Now run the test and see if it matched. If it didn't, remember you can pass
-debug-only=isel
to llc to look in more detail.
We've just defined the mask24
Immleaf
, which will match any immediate with
the value 0xffffff. The pattern defined on the next line expands that
immediate to the desired addi+srli pair. But now we want to extend this to
match any immediate with a bit pattern that means it can be materialised by
shifting -1, with the proviso that it should be profitable to do so. This
could be done solely within RISCVInstrInfo.td with the addition of:
- A new ImmLeaf to match immediates with that property. You'll need to change the predicate. Check include/llvm/Support/MathExtras.h for useful helper functions
- A new SDNodeXForm that produces a constant representing the number of bits to shift by (see current SDNodeXForm examples)
Or alternatively, a C++ selection code could be introduced in
RISCVDAGToDAGISel::Select in RISCVISelDAGToDAG.cpp. This patch shows a good
example of how to create MachineSDNode
s. https://reviews.llvm.org/D52962
Lets modify the existing ImmLeaf. As the instruction selector will prefer
shorter output patterns, we don't need to explicitly guard against the case
where it would be cheaper to just generate ADDI
(e.g. for immediates like
2047 or 1). As it happens, isMask_64
checks the exact property we want. You
might want to add some new tests for mask-like constants to newimm.ll (e.g.
268435455), then change the predicate and re-run the test file to see if it
matches. The generated code will be incorrect (as a shift of 8 is always
selected), but we'll address that now.
We need to define a new SDNodeXForm
which will take the immediate and
produce the appropriate shift width for use in our output pattern.
countLeadingZeros
from MathExtras.h
will be useful for this. You should now
have:
def LeadingZeros32 : SDNodeXForm<imm, [{
return CurDAG->getTargetConstant(countLeadingZeros<uint32_t>(N->getZExtValue()),
SDLoc(N), N->getValueType(0));
}]>;
def mask : ImmLeaf<XLenVT, [{return isMask_64(Imm);}]>;
Finally, lets replace the previous pattern. I've marked the output pattern as
TODO for you to fill out. See the nearby immediate patterns for examples of
using an SDNodeXForm
.
def : Pat<(mask:$imm), (TODO)>, Requires<[IsRV32]>;
Extension task: match an and
with a mask of this type directly to slli
and
srli
, thus avoiding the need for a register. Although saving a register, the
downside of this codegen choice is that it may result in more instructions in
the case that the same mask is used multiple times.
Aim: add a new instruction to the RISC-V LLVM backend. First introduce it to the MC layer (i.e. add assembler support). Then add codegen support.
To simplify things, we will have this instruction enabled by default rather than requiring a specific extension to be enabled.
First, we'll add a test. Create test/MC/RISCV/bitrev.s:
# RUN: llvm-mc %s -triple=riscv32 -riscv-no-aliases -show-encoding \
# RUN: | FileCheck -check-prefixes=CHECK-ASM,CHECK-ASM-AND-OBJ %s
# RUN: llvm-mc -filetype=obj -triple=riscv32 < %s \
# RUN: | llvm-objdump -riscv-no-aliases -d -r - \
# RUN: | FileCheck -check-prefixes=CHECK-OBJ,CHECK-ASM-AND-OBJ %s
# CHECK-ASM-AND-OBJ: bitrev a0, a1
bitrev a0, a1
You can directly execute this test with:
./bin/llvm-mc /local/scratch/asb58/llvm-repos/llvm/test/MC/RISCV/bitrev.s -triple=riscv32 -show-encoding
Open up lib/Target/RISCV/RISCVInstrInfo.td and study the instruction definitions. You want to define a new unary instruction for bitreverse.
Lets create a new instruction:
let hasSideEffects = 0, mayLoad = 0, mayStore = 0, rs2 = 0 in
def BITREV
: RVInstR<0b1111111, 0b111, OPC_OP, (outs GPR:$rd), (ins GPR:$rs1),
"bitrev", "$rd, $rs1">;
Now check that the bitrev.s passes, e.g. ./bin/llvm-lit -v test/MC/RISCV/bitrev.s
.
Next, we want to support codegen. Start by defining a test in test/CodeGen/RISCV/bitrev.ll
; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck %s -check-prefix=RV32I
declare i32 @llvm.bitreverse.i32(i32)
define i32 @bitrev(i32 %a) {
%1 = call i32 @llvm.bitreverse.i32(i32 %a)
ret i32 %1
}
If you run this through llc, you'll see it generates a huge amount of code right now.
The first step to enabling codegen is to mark the ISD::BITREVERSE
SelectionDAG opcode as legal. Do this in the RISCVTargetLowering constructor
in RISCVISelLowering.cpp:
setOperationAction(ISD::BITREVERSE, XLenVT, Legal);
You can check the definitions of SelectionDAG pattern fragments such as
bitreverse
in include/llvm/Target/TargetSelectionDAG.td
and of
SelectionDAG opcodes in include/llvm/CodeGen/ISDOpcodes.h
.
Now define a pattern mapping the bitreverse
SelectionDAG node to the
BITREV
instruction and re-run the test.
Extension: dream up further instructions and add support for them, perhaps adding new reg-immediate instructions.
Consider the following somewhat contrived example which aligns an input value:
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck %s -check-prefix=RV32I
define i32 @aligner(i16 zeroext %a) nounwind {
; RV32I-LABEL: aligner:
; RV32I: # %bb.0:
; RV32I-NEXT: lui a1, 32
; RV32I-NEXT: addi a1, a1, -16
; RV32I-NEXT: addi a0, a0, 15
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: ret
%1 = zext i16 %a to i32
%2 = add i32 %1, 15
%3 = and i32 %2, -16
ret i32 %3
}
This example is contrived, but similar code sequences can occur for the 64-bit RISC-V target.
Note that the generated code is suboptimal. Why isn't a simple andi a0, a0, -16
selected?
Let's inspect the ISel debug output to investigate:
./bin/llc -mtriple=riscv32 < ../test/CodeGen/RISCV/aligner.ll -debug-only=isel
Look at the DAG just prior to the start of instruction selection:
Optimized legalized selection DAG: %bb.0 'aligner:'
SelectionDAG has 12 nodes:
t0: ch = EntryToken
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t4: i32 = AssertZext t2, ValueType:ch:i16
t8: i32 = add t4, Constant:i32<15>
t15: i32 = and t8, Constant:i32<131056>
t12: ch,glue = CopyToReg t0, Register:i32 $x10, t15
t13: ch = RISCVISD::RET_FLAG t12, Register:i32 $x10, t12:1
We can see that during SelectionDAG lowering, the constant operand to and
was mutated from -16
(which can be used in ANDI
) to 131056 (0x1FFF0 in
hex), which does not look correct. Looking at the SelectionDAG dumped after each stage, we
can see that the constant is mutated during the first DAG combine.
Running llc
with -debug-only=dagcombine
we can see when this happens:
Combining: t10: i32 = and t8, Constant:i32<-16>
Replacing.2 t10: i32 = and t8, Constant:i32<-16>
With: t15: i32 = and t8, Constant:i32<131056>
You can open up lib/CodeGen/SelectionDAG/DAGCombiner.cpp
and inspect
DAGCombiner::visitAND
to see all the combines. In order to narrow down
exactly which one is causing the problem, it may be easiest to get a stack
trace from the point the mutated constant is created. Either use your
favourite debugger to add a breakpoint in SelectionDAG::getConstant
in
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
or insert
sys::PrintStackTrace(llvm::errs())
(being sure to include
include/llvm/Support/Signals.h
). This shows that the culprit is
SimplifyDemandedBits
called from visitAND
at line 4671 (in my checkout).
The problem is that SimplifyDemandedBits
is recognising that not all bits of
the mask are actually needed (the upper 16 bits are known-zero before the
addition and the addition can only affect a small number of bits), so it's creating
a constant without those upper bits set. This isn't a beneficial
transformation for this input because the new mask no longer fits into an
immediate.
Now we've identified the source of the problem, what can we do to fix it? We
could either try to modify the target-independent DAG combiner to recognise
this case in somehow, or undo the combine in the backend. We'll elect to
handle this in our backend. We're going to do this by writing some C++
instruction selection logic. As well as demonstrating the process of
investigating when the input to instruction selection isn't what's expected,
this task also demonstrates another common backend development approach:
learning from other backends. X86 actually handles this case in
X86DAGToDAGISel::shrinkAndImmediate
.
Open up lib/Target/RISCV/RISCVISelDAGToDAG.cpp
. We're going to edit the
switch statement in RISCVDagToDAGISel::Select
. We need to recognise an
ISD::ADD
SelectionDAG node where:
- It has a constant operand
- The constant operand doesn't fit in the 12-bit immediate field of
ANDI
- That constant operand has N leading zero bits
- Setting those N leading bits to 1 would result in a signed 12-bit immediate
- The most significant N bits of the first input operand are known to be zero
This is quite fiddly, so I've provided the majority of the logic for you:
case ISD::AND: {
SDValue Op0 = Node->getOperand(0);
auto *Op1C = dyn_cast<ConstantSDNode>(Node->getOperand(1));
// Check we have a constant operand.
if (!Op1C)
break;
APInt MaskVal = Op1C->getAPIntValue();
// Check the mask currently doesn't fit in ANDI immediate.
if (MaskVal.getMinSignedBits() <= 12)
break;
unsigned MaskLZ = MaskVal.countLeadingZeros();
APInt HighOnes = APInt::getHighBitsSet(MaskVal.getBitWidth(), MaskLZ);
APInt NewMaskVal = MaskVal | HighOnes;
// If we were able to set the N upper zero bits to ones, would the new
// mask fit in the ANDI immediate?
if (NewMaskVal.getMinSignedBits() > 12)
break;
// Are the upper N bits of the first operand known to be zero?
if (!CurDAG->MaskedValueIsZero(Op0, HighOnes))
break;
llvm_unreachable("Replace me with code to select the new ANDI!");
return;
}
Note that we use LLVM's arbitrary precision integer representation. The
"magic" happens in the call to MaskedValueIsZero
. This in turn calls the
computeKnownBits
helper, which is very important for a number of
optimisations.
I encourage you to step through the above logic carefully, then re-run llc
with aligner.ll
as its input to verify that the llvm_unreachable
is hit
(i.e. the preceding logic identified a candidate for this transformation).
You'll now need to create a TargetConstant for the new mask, and call
CurDAG->ReplaceNode
with the selected MachineSDNode
. See my tutorial
slides for an example of this, or look for examples elsewhere in
RISCVISelDAGToDAG.cpp
.
With your change in place, you should see the andi a0, a0, -16
generated
from aligner.ll
.
Warning: this transformation was not heavily tested. It's quite possible you can spot a bug. Please let me know if so.