wazevo(docs): optimizing compiler #2065

Merged: 8 commits merged on Mar 9, 2024.
130 changes: 130 additions & 0 deletions site/content/docs/how_the_optimizing_compiler_works/_index.md
@@ -0,0 +1,130 @@
+++
title = "How the Optimizing Compiler Works"
layout = "single"
+++

wazero supports two modes of execution: interpreter mode and compilation mode.
The interpreter mode is a fallback mode for platforms where compilation is not
supported. Compilation mode is otherwise the default mode of execution: it
translates Wasm modules to native code to get the best run-time performance.

Translating Wasm bytecode into machine code can take multiple forms. wazero
1.0 performs a straightforward translation from a given instruction to a native
instruction. wazero 2.0 introduces an optimizing compiler that is able to
perform nontrivial optimizing transformations, such as constant folding or
dead-code elimination, and it makes better use of the underlying hardware, such
as CPU registers. This document digs deeper into what we mean when we say
"optimizing compiler", and explains how it is implemented in wazero.

This document is intended for maintainers, researchers, developers and in
general anyone interested in understanding the internals of wazero.

What is an Optimizing Compiler?
-------------------------------

wazero supports an _optimizing_ compiler in the style of other optimizing
compilers, such as LLVM's or V8's. Traditionally, an optimizing
compiler performs compilation in a number of steps.

Compare this to the **old compiler**, where compilation happens in one step or
two, depending on how you count:


```goat
   Input         +---------------+     +---------------+
Wasm Binary ---->| DecodeModule  |---->| CompileModule |----> wazero IR
                 +---------------+     +---------------+
```

That is, the module is (1) validated, then (2) translated to an Intermediate
Representation (IR). The wazero IR can then be executed directly (in the case
of the interpreter) or it can be further processed and translated into native
code by the compiler. This compiler performs a straightforward translation from
the IR to native code, without any further passes. The wazero IR is not intended
for further processing beyond immediate execution or straightforward
translation.

```goat
            +---- wazero IR ----+
            |                   |
            v                   v
     +-------------+     +-------------+
     |   Compiler  |     | Interpreter |- - - executable
     +-------------+     +-------------+
            |
     +------+------+
     |             |
     v             v
+---------+   +---------+
|  ARM64  |   |  AMD64  |
| Backend |   | Backend | - - - - - - - - - executable
+---------+   +---------+
```


Validation and translation to an IR in a compiler are usually called the
**front-end** part of a compiler, while code-generation occurs in what we call
the **back-end** of a compiler. The front-end is the part of a compiler that is
closer to the input, and it generally indicates machine-independent processing,
such as parsing and static validation. The back-end is the part of a compiler
that is closer to the output, and it generally includes machine-specific
procedures, such as code-generation.

In the **optimizing** compiler, we still decode and translate Wasm binaries to
an intermediate representation in the front-end, but we use a textbook
representation called **SSA** (Static Single-Assignment form), which is
designed for further transformation.
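The defining property of SSA is that every name is assigned exactly once. The following self-contained Go sketch (a toy, not wazero's `ssa` package) converts straight-line assignments into SSA form by giving each assignment a fresh versioned name and rewriting later uses to the latest version. The `Assign` type and `ToSSA` function are hypothetical, and the sketch deliberately ignores control flow, where real SSA construction needs φ (phi) nodes or an equivalent mechanism to merge values from different predecessors.

```go
package main

import (
	"fmt"
	"strings"
)

// Assign is a hypothetical straight-line statement: Dst = Op(Args...).
type Assign struct {
	Dst  string
	Op   string
	Args []string
}

// ToSSA renames every assigned variable to a fresh version (x -> x1, x2, ...)
// and rewrites each use to the most recent version, so that every name is
// assigned exactly once.
func ToSSA(prog []Assign) []Assign {
	version := map[string]int{} // latest version number per variable
	current := func(v string) string {
		if n, ok := version[v]; ok {
			return fmt.Sprintf("%s%d", v, n)
		}
		return v // never assigned: a function parameter, left as-is
	}
	out := make([]Assign, 0, len(prog))
	for _, a := range prog {
		args := make([]string, len(a.Args))
		for i, arg := range a.Args {
			args[i] = current(arg) // uses see the version live before this statement
		}
		version[a.Dst]++
		out = append(out, Assign{Dst: current(a.Dst), Op: a.Op, Args: args})
	}
	return out
}

func main() {
	prog := []Assign{
		{"x", "add", []string{"a", "b"}},
		{"x", "mul", []string{"x", "x"}},
		{"y", "sub", []string{"x", "a"}},
	}
	for _, a := range ToSSA(prog) {
		fmt.Printf("%s = %s %s\n", a.Dst, a.Op, strings.Join(a.Args, ", "))
	}
	// x1 = add a, b
	// x2 = mul x1, x1
	// y1 = sub x2, a
}
```

Once every value has a single definition, passes can reason about "the" value of `x2` without tracking which assignment is live at a given point.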

The benefit of choosing an IR that is meant for transformation is that many
optimization passes can apply directly to the IR, and are thus
machine-independent. The back-end can then be simpler, because it only has to
deal with machine-specific concerns.
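As an illustration of such a machine-independent pass, the sketch below performs constant folding over a toy SSA representation. Because each value is defined exactly once, the pass can record known constants in a map and replace any `add`/`mul` whose operands are both known constants. The `Instr` type and `FoldConstants` function are hypothetical and far simpler than wazero's actual `ssa` package.

```go
package main

import "fmt"

// Instr is a hypothetical SSA instruction defining the value Dst.
// Op is "const", "add", "mul" or "param"; Imm holds the value for "const".
type Instr struct {
	Dst  string
	Op   string
	Args []string
	Imm  int64
}

// FoldConstants replaces arithmetic on known constants with a "const"
// instruction. It relies on SSA: each Dst is defined once, so a value
// recorded as constant stays constant.
func FoldConstants(fn []Instr) []Instr {
	known := map[string]int64{}
	out := make([]Instr, 0, len(fn))
	for _, in := range fn {
		if in.Op == "add" || in.Op == "mul" {
			x, okX := known[in.Args[0]]
			y, okY := known[in.Args[1]]
			if okX && okY {
				v := x + y
				if in.Op == "mul" {
					v = x * y
				}
				in = Instr{Dst: in.Dst, Op: "const", Imm: v}
			}
		}
		if in.Op == "const" {
			known[in.Dst] = in.Imm
		}
		out = append(out, in)
	}
	return out
}

func main() {
	fn := []Instr{
		{Dst: "v0", Op: "const", Imm: 2},
		{Dst: "v1", Op: "const", Imm: 3},
		{Dst: "v2", Op: "add", Args: []string{"v0", "v1"}}, // folds to const 5
		{Dst: "v3", Op: "param"},
		{Dst: "v4", Op: "mul", Args: []string{"v2", "v3"}}, // v3 unknown: kept
	}
	for _, in := range FoldConstants(fn) {
		fmt.Printf("%s = %s %v %d\n", in.Dst, in.Op, in.Args, in.Imm)
	}
}
```

After folding, `v0` and `v1` may have no remaining uses, which is exactly the situation a subsequent dead-code elimination pass would clean up. Nothing in either pass depends on the target CPU.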

The wazero optimizing compiler implements the following compilation passes:

* Front-End:
  - Translation to SSA
  - Optimization
  - Block Layout
* Back-End:
  - Instruction Selection
  - Register Allocation
  - Finalization and Encoding

```goat
   Input        +-------------------+      +-------------------+
Wasm Binary --->|   DecodeModule    |----->|   CompileModule   |--+
                +-------------------+      +-------------------+  |
+-----------------------------------------------------------------+
|
|  +---------------+            +---------------+
+->|   Front-End   |----------->|   Back-End    |
   +---------------+            +---------------+
           |                            |
           v                            v
          SSA                 Instruction Selection
           |                            |
           v                            v
     Optimization              Register Allocation
           |                            |
           v                            v
     Block Layout             Finalization/Encoding
```
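The split between one shared front-end and several per-architecture back-ends can be illustrated with a small Go interface. In this sketch, the same machine-independent instruction is lowered differently by two toy back-ends: arm64's general-purpose `add` takes three operands, while amd64's takes two (`dst += src`), so the amd64 back-end needs an extra move. The `Instr` and `Machine` types and the pseudo-assembly on virtual registers (`v0`, `v1`, ...) are hypothetical; wazero's real `backend.Machine` interface is considerably richer.

```go
package main

import "fmt"

// Instr is a hypothetical machine-independent SSA instruction: Dst = Op(X, Y).
type Instr struct{ Op, Dst, X, Y string }

// Machine is a hypothetical, much-reduced stand-in for an ISA-specific back-end.
type Machine interface {
	// Lower performs instruction selection for a single SSA instruction.
	Lower(in Instr) []string
}

type arm64 struct{}

// Lower: arm64 has three-operand arithmetic, so one instruction suffices.
func (arm64) Lower(in Instr) []string {
	if in.Op != "iadd" {
		return []string{"; unsupported: " + in.Op}
	}
	return []string{fmt.Sprintf("add %s, %s, %s", in.Dst, in.X, in.Y)}
}

type amd64 struct{}

// Lower: amd64 integer add is two-operand (dst += src), so X is copied first.
func (amd64) Lower(in Instr) []string {
	if in.Op != "iadd" {
		return []string{"; unsupported: " + in.Op}
	}
	return []string{
		fmt.Sprintf("mov %s, %s", in.Dst, in.X),
		fmt.Sprintf("add %s, %s", in.Dst, in.Y),
	}
}

// Compile feeds the same front-end output to any back-end.
func Compile(m Machine, fn []Instr) []string {
	var asm []string
	for _, in := range fn {
		asm = append(asm, m.Lower(in)...)
	}
	return asm
}

func main() {
	fn := []Instr{{Op: "iadd", Dst: "v2", X: "v0", Y: "v1"}}
	fmt.Println(Compile(arm64{}, fn)) // [add v2, v0, v1]
	fmt.Println(Compile(amd64{}, fn)) // [mov v2, v0 add v2, v1]
}
```

Register allocation would later map the virtual registers to physical ones, again per architecture.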

As with the other engines, the implementation can be found under `internal/engine`,
specifically in the `wazevo` sub-package. The entry point is
`internal/engine/wazevo/engine.go`, which implements the `wasm.Engine` interface.

All the passes can be dumped to the console for debugging by enabling the build-time
flags in `internal/engine/wazevo/wazevoapi/debug_options.go`. The flags are disabled
by default and should only be enabled during debugging. They may also change in the future.

In the following, we assume all paths to be relative to `internal/engine/wazevo`,
so we omit the prefix.

## Index

- [Front-End](frontend/)
- [Back-End](backend/)
- [Appendix](appendix/)
198 changes: 198 additions & 0 deletions site/content/docs/how_the_optimizing_compiler_works/appendix.md
@@ -0,0 +1,198 @@
+++
title = "Appendix: Trampolines"
layout = "single"
+++

Trampolines are used to interface between the Go runtime and the generated
code, in two cases:

- when we need to **enter the generated code** from the Go runtime.
- when we need to **leave the generated code** to invoke a host function
(written in Go).

In this section we complete the picture of how a Wasm function is translated
into executable code in the optimizing compiler, by describing how execution
enters and leaves the generated code at run-time.

## Entering the Generated Code

At run-time, user code invokes a Wasm function through the public
`api.Function` interface, using the methods `Call()` or `CallWithStack()`. The
implementation of these methods, in turn, eventually invokes an ASM
**trampoline**. The signature of this trampoline in Go code is:

```go
// Declared without a body: implemented in assembly
// (backend/isa/<arch>/abi_entry_<arch>.s).
func entrypoint(
	preambleExecutable, functionExecutable *byte,
	executionContextPtr uintptr, moduleContextPtr *byte,
	paramResultStackPtr *uint64,
	goAllocatedStackSlicePtr uintptr,
)
```

- `preambleExecutable` is a pointer to the generated code for the preamble (see
below)
- `functionExecutable` is a pointer to the generated code for the function (as
described in the previous sections).
- `executionContextPtr` is a raw pointer to the `wazevo.executionContext`
struct. This struct is used to save the state of the Go runtime before
entering or leaving the generated code. It also holds shared state between the
Go runtime and the generated code, such as the exit code that is used to
terminate execution on failure, or suspend it to invoke host functions.
- `moduleContextPtr` is a pointer to the `wazevo.moduleContextOpaque` struct.
Its contents are essentially pointers to module-instance-specific objects, as
well as to functions. This is sometimes called "VMContext" in other Wasm
runtimes.
- `paramResultStackPtr` is a pointer to the slice where the arguments and
results of the function are passed.
- `goAllocatedStackSlicePtr` is an aligned pointer to the Go-allocated stack
for holding values and call frames. For further details, refer to
[/internal/engine/compiler/engine.go][wazero-engine-stack].

The ASM trampoline is guaranteed to follow the stable calling convention
described in [Go's ASM documentation][abi-asm] (sometimes referred to as
[ABI0][proposal-register-cc]). The trampoline can be found in
`backend/isa/<arch>/abi_entry_<arch>.s`.

For each architecture, the trampoline:

- moves the arguments to conventional registers that are documented to be
free at the time of the call;
- then jumps into the execution of the generated code for the preamble.
The **preamble** is generated separately from the rest of the function, and it
is executed before the function body.

This is implemented in `machine.CompileEntryPreamble(*ssa.Signature)`. The
procedure first instantiates a `backend.FunctionABI` struct with metadata about
the expected ABI for a function with a given signature.

The preamble sets the fields in the `wazevo.executionContext`.

At the beginning of the preamble:

- we set a register to point to the `*wazevo.executionContext` struct;
- we save the stack pointers, frame pointers, return addresses, etc. to that
struct;
- we update the stack pointer to point to `paramResultStackPtr`.

The generated preamble assumes that it has been entered through the
aforementioned trampoline. Thus, it expects the arguments to be found in
specific registers.

The preamble then assigns the arguments pointed at by `paramResultStackPtr` to
the registers that the generated code expects.
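The register assignment the preamble performs can be modeled in plain Go. In the sketch below, a hypothetical ABI passes the first eight integer and the first eight floating-point parameters in registers and spills the rest to the stack in 8-byte slots; a `[]uint64` stands in for the memory at `paramResultStackPtr`. The register names (`x0` to `x7`, `v0` to `v7`) and counts are assumptions loosely modeled on arm64, not values taken from `backend.FunctionABI`.

```go
package main

import "fmt"

// ValueKind distinguishes integer from floating-point parameters.
type ValueKind int

const (
	Int ValueKind = iota
	Float
)

// Location is where a parameter lives on entry to the generated code:
// either a named register or a byte offset in the stack argument area.
type Location struct {
	Reg    string // empty if the value is passed on the stack
	Offset int    // stack offset in bytes, valid when Reg == ""
}

// assignParams mimics a hypothetical ABI with 8 integer and 8 float argument
// registers; remaining parameters go to the stack, in order, 8 bytes each.
func assignParams(kinds []ValueKind) []Location {
	const maxRegs = 8
	var nextInt, nextFloat, stackOff int
	locs := make([]Location, len(kinds))
	for i, k := range kinds {
		switch {
		case k == Int && nextInt < maxRegs:
			locs[i] = Location{Reg: fmt.Sprintf("x%d", nextInt)}
			nextInt++
		case k == Float && nextFloat < maxRegs:
			locs[i] = Location{Reg: fmt.Sprintf("v%d", nextFloat)}
			nextFloat++
		default:
			locs[i] = Location{Offset: stackOff}
			stackOff += 8
		}
	}
	return locs
}

func main() {
	// One uint64 per parameter; float bits are stored raw (0x3ff0... is 1.0).
	paramResultStack := []uint64{1, 0x3ff0000000000000, 2}
	kinds := []ValueKind{Int, Float, Int}
	for i, loc := range assignParams(kinds) {
		if loc.Reg != "" {
			fmt.Printf("load param %d (%#x) into %s\n", i, paramResultStack[i], loc.Reg)
		} else {
			fmt.Printf("store param %d (%#x) at [sp+%d]\n", i, paramResultStack[i], loc.Offset)
		}
	}
}
```

Integer and float registers are counted independently, which is why the third parameter lands in `x1` rather than `x2`.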

Finally, it invokes the generated code for the function.

The epilogue reverses part of the process, finally returning control to the
caller of the `entrypoint()` function and to the Go runtime. The caller of
`entrypoint()` is also responsible for completing the clean-up procedure by
invoking `afterGoFunctionCallEntrypoint()` (again, implemented in
backend-specific ASM), which restores the stack pointers and returns
control to the caller of the function.

The arch-specific code can be found in
`backend/isa/<arch>/abi_entry_preamble.go`.

[wazero-engine-stack]: https://github.com/tetratelabs/wazero/blob/095b49f74a5e36ce401b899a0c16de4eeb46c054/internal/engine/compiler/engine.go#L77-L132
[abi-arm64]: https://tip.golang.org/src/cmd/compile/abi-internal#arm64-architecture
[abi-amd64]: https://tip.golang.org/src/cmd/compile/abi-internal#amd64-architecture
[abi-cc]: https://tip.golang.org/src/cmd/compile/abi-internal#function-call-argument-and-result-passing


## Leaving the Generated Code

In "[How do compiler functions work?][how-do-compiler-functions-work]", we
already outlined how _leaving_ the generated code works with the help of a
function. Here we complete the picture by briefly describing the code that
is generated.

Wherever the generated code needs to return control to the Go runtime, the
compiler emits a meta-instruction called `exitSequence` in both the `amd64` and
`arm64` backends. This meta-instruction sets the `exitCode` in the
`wazevo.executionContext` struct, restores the stack pointers, and then returns
control to the caller of the `entrypoint()` function described above.

As described in "[How do compiler functions
work?][how-do-compiler-functions-work]", the mechanism is essentially the same
when invoking a host function or raising an error. However, when a host
function is invoked, the `exitCode` also indicates the identifier of the host
function to be invoked.

The magic really happens in the `backend.Machine.CompileGoFunctionTrampoline()`
method. This method is actually invoked when host modules are being
instantiated. It generates a trampoline that is used to invoke such functions
from the generated code.

This trampoline implements essentially the same prologue as the `entrypoint()`,
but it also reserves space for the arguments and results of the function to be
invoked.

A host function has the signature:

```go
func(ctx context.Context, stack []uint64)
```

The function arguments in the `stack` parameter are copied over to the reserved
slots of the real stack. For instance, on `arm64` the stack layout would look
as follows (on `amd64` it would be similar):

```goat
                (high address)
SP ---------> +-----------------+  <----+
              |     .......     |       |
              |      ret Y      |       |
              |     .......     |       |
              |      ret 0      |       |
              |      arg X      |       |  size_of_arg_ret
              |     .......     |       |
              |      arg 1      |       |
              |      arg 0      |  <----+ <-------- originalArg0Reg
              | size_of_arg_ret |
              |  ReturnAddress  |
              +-----------------+  <----+
              |      xxxx       |       |  ;; might be padded to make it 16-byte aligned.
          +-->|  arg[N]/ret[M]  |       |
 sliceSize|   |  ............   |       |  goCallStackSize
          |   |  arg[1]/ret[1]  |       |
          +-->|  arg[0]/ret[0]  |  <----+ <-------- arg0ret0AddrReg
              |    sliceSize    |
              |   frame_size    |
              +-----------------+
                 (low address)
```
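The size of the reserved `arg[...]/ret[...]` area can be derived from the signature. The sketch below assumes, as the diagram suggests, that parameters and results share the same slots, so only the larger of the two matters; it further assumes 8 bytes per scalar value and 16 bytes per `v128`, and rounds the total up to a multiple of 16, which appears to correspond to the optional `xxxx` padding. The function name and these sizes are assumptions for illustration, not wazero's code.

```go
package main

import "fmt"

// ValueType is a hypothetical subset of Wasm value types.
type ValueType int

const (
	I32 ValueType = iota
	I64
	F32
	F64
	V128
)

// slotSize assumes every scalar occupies one 8-byte uint64 slot and a
// v128 occupies two.
func slotSize(t ValueType) int64 {
	if t == V128 {
		return 16
	}
	return 8
}

// goCallStackSize returns the 16-byte-aligned area to reserve and the
// unaligned slice size. Params and results share the same slots, so only
// the larger of the two matters.
func goCallStackSize(params, results []ValueType) (aligned, sliceSize int64) {
	var p, r int64
	for _, t := range params {
		p += slotSize(t)
	}
	for _, t := range results {
		r += slotSize(t)
	}
	sliceSize = p
	if r > sliceSize {
		sliceSize = r
	}
	aligned = (sliceSize + 15) &^ 15 // round up to a multiple of 16
	return aligned, sliceSize
}

func main() {
	aligned, slice := goCallStackSize([]ValueType{I32, I64, F32}, []ValueType{I64})
	fmt.Println(aligned, slice) // 32 24
}
```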

Finally, the trampoline jumps into the execution of the host function using the
`exitSequence` meta-instruction.

Upon return, the process is reversed.

## Code

- The trampoline to enter the generated function is implemented by the
`backend.Machine.CompileEntryPreamble()` method.
- The trampoline to return traps and invoke host functions is generated by
`backend.Machine.CompileGoFunctionTrampoline()` method.

You can find arch-specific implementations in
`backend/isa/<arch>/abi_go_call.go`,
`backend/isa/<arch>/abi_entry_preamble.go`, etc. The trampolines are found
under `backend/isa/<arch>/abi_entry_<arch>.s`.

## Further References

- Go's [internal ABI documentation][abi-internal] complements Go's ASM
documentation with details on the internal, unstable ABI, known as
*ABIInternal*. Note, however, that the calling convention for ASM is
different, and is described in the ASM documentation.
- Go's [internal ASM documentation][abi-asm] describes the stable, stack-based
calling convention for ASM (_ABI0_).
- Raphael Poss's [The Go low-level calling convention on
x86-64][go-call-conv-x86] is also an excellent reference for `amd64`.

[abi-asm]: https://go.dev/doc/asm
[abi-internal]: https://tip.golang.org/src/cmd/compile/abi-internal
[go-call-conv-x86]: https://dr-knz.net/go-calling-convention-x86-64.html
[proposal-register-cc]: https://go.googlesource.com/proposal/+/master/design/40724-register-calling.md#background
[how-do-compiler-functions-work]: ../../how_do_compiler_functions_work/
