wazevo(docs): optimizing compiler (#2065)
Signed-off-by: Edoardo Vacchi <evacchi@users.noreply.github.com>
evacchi authored Mar 9, 2024
1 parent 15cc0c5 commit b7b54d5
Showing 5 changed files with 1,196 additions and 1 deletion.
3 changes: 2 additions & 1 deletion site/content/docs/_index.md
@@ -143,7 +143,8 @@ Notably, the interpreter and compiler in wazero's [Runtime configuration][Runtim
In wazero, a compiler is a runtime configured to compile modules to platform-specific machine code ahead of time (AOT)
during the creation of [CompiledModule][CompiledModule]. This means your WebAssembly functions execute
natively at runtime of the embedding Go program. Compiler is faster than Interpreter, often by an order of
magnitude (10x) or more, and therefore enabled by default whenever available. You can read more about wazero's
[optimizing compiler in the detailed documentation]({{< relref "/how_the_optimizing_compiler_works" >}}).

#### Interpreter

131 changes: 131 additions & 0 deletions site/content/docs/how_the_optimizing_compiler_works/_index.md
@@ -0,0 +1,131 @@
+++
title = "How the Optimizing Compiler Works"
layout = "single"
+++

wazero supports two modes of execution: interpreter mode and compilation mode.
The interpreter mode is a fallback mode for platforms where compilation is not
supported. Compilation mode is otherwise the default mode of execution: it
translates Wasm modules to native code to get the best run-time performance.

Translating Wasm bytecode into machine code can take multiple forms. wazero
1.0 performs a straightforward translation from a given instruction to a native
instruction. wazero 2.0 introduces an optimizing compiler that is able to
perform nontrivial optimizing transformations, such as constant folding or
dead-code elimination, and it makes better use of the underlying hardware, such
as CPU registers. This document digs deeper into what we mean when we say
"optimizing compiler", and explains how it is implemented in wazero.

This document is intended for maintainers, researchers, developers and in
general anyone interested in understanding the internals of wazero.

What is an Optimizing Compiler?
-------------------------------

wazero supports an _optimizing_ compiler in the style of other optimizing
compilers such as LLVM's or V8's. Traditionally, an optimizing
compiler performs compilation in a number of steps.

Compare this to the **old compiler**, where compilation happens in one step or
two, depending on how you count:


```goat
    Input         +---------------+     +---------------+
 Wasm Binary ---->| DecodeModule  |---->| CompileModule |----> wazero IR
                  +---------------+     +---------------+
```

That is, the module is (1) validated then (2) translated to an Intermediate
Representation (IR). The wazero IR can then be executed directly (in the case
of the interpreter) or it can be further processed and translated into native
code by the compiler. This compiler performs a straightforward translation from
the IR to native code, without any further passes. The wazero IR is not intended
for further processing beyond immediate execution or straightforward
translation.

```goat
                  +---- wazero IR ----+
                  |                   |
                  v                   v
          +--------------+     +--------------+
          |   Compiler   |     | Interpreter  |- - - executable
          +--------------+     +--------------+
                  |
       +----------+---------+
       |                    |
       v                    v
  +---------+          +---------+
  |  ARM64  |          |  AMD64  |
  | Backend |          | Backend | - - - - - - - - - executable
  +---------+          +---------+
```


Validation and translation to an IR in a compiler are usually called the
**front-end** part of a compiler, while code-generation occurs in what we call
the **back-end** of a compiler. The front-end is the part of a compiler that is
closer to the input, and it generally indicates machine-independent processing,
such as parsing and static validation. The back-end is the part of a compiler
that is closer to the output, and it generally includes machine-specific
procedures, such as code-generation.

In the **optimizing** compiler, we still decode and translate Wasm binaries to
an intermediate representation in the front-end, but we use a textbook
representation called **SSA**, or "Static Single-Assignment" form, which is
intended for further transformation.

The benefit of choosing an IR that is meant for transformation is that a lot of
optimization passes can apply directly to the IR, and thus be
machine-independent. The back-end can then be relatively simple, in that it
only has to deal with machine-specific concerns.
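
To make the idea of machine-independent passes concrete, here is a minimal sketch of constant folding and dead-code detection on a toy SSA program. The `instr` type, the opcodes and both functions are hypothetical and written for illustration only; they are not wazero's `ssa` package, whose data structures are considerably richer.

```go
package main

import "fmt"

// op is the opcode of a toy SSA instruction. These names are hypothetical and
// are not taken from wazero's ssa package.
type op int

const (
	opParam op = iota // value supplied by the caller, unknown at compile time
	opConst           // value = imm
	opAdd             // value = args[0] + args[1]
	opMul             // value = args[0] * args[1]
)

// instr defines exactly one value, named by its index in the program.
// Operands always refer to earlier indices (definitions precede uses).
type instr struct {
	op   op
	imm  int64
	args [2]int
}

// foldConstants replaces every add/mul of two constants with a constant.
// One forward pass is enough because operands are defined before their users.
func foldConstants(prog []instr) {
	for i := range prog {
		in := &prog[i]
		if in.op != opAdd && in.op != opMul {
			continue
		}
		a, b := prog[in.args[0]], prog[in.args[1]]
		if a.op != opConst || b.op != opConst {
			continue
		}
		if in.op == opAdd {
			*in = instr{op: opConst, imm: a.imm + b.imm}
		} else {
			*in = instr{op: opConst, imm: a.imm * b.imm}
		}
	}
}

// liveValues reports which values are needed to compute root.
// Everything that is not marked live is dead code and could be removed.
func liveValues(prog []instr, root int) []bool {
	live := make([]bool, len(prog))
	live[root] = true
	for i := len(prog) - 1; i >= 0; i-- {
		if !live[i] || (prog[i].op != opAdd && prog[i].op != opMul) {
			continue
		}
		live[prog[i].args[0]] = true
		live[prog[i].args[1]] = true
	}
	return live
}

func main() {
	prog := []instr{
		{op: opParam},                   // v0 = param
		{op: opConst, imm: 2},           // v1 = 2
		{op: opConst, imm: 3},           // v2 = 3
		{op: opAdd, args: [2]int{1, 2}}, // v3 = v1 + v2
		{op: opMul, args: [2]int{0, 3}}, // v4 = v0 * v3
		{op: opAdd, args: [2]int{1, 1}}, // v5 = v1 + v1 (never used)
	}
	foldConstants(prog)
	fmt.Println(prog[3].imm, liveValues(prog, 4))
}
```

Neither function knows anything about registers or instruction encodings, which is the point: both passes would work unchanged for any target architecture.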

The wazero optimizing compiler implements the following compilation passes:

* Front-End:
  - Translation to SSA
  - Optimization
  - Block Layout
  - Control Flow Analysis

* Back-End:
  - Instruction Selection
  - Register Allocation
  - Finalization and Encoding

```goat
    Input        +-------------------+      +-------------------+
 Wasm Binary --->|   DecodeModule    |----->|   CompileModule   |--+
                 +-------------------+      +-------------------+  |
        +----------------------------------------------------------+
        |
        |  +---------------+            +---------------+
        +->|   Front-End   |----------->|   Back-End    |
           +---------------+            +---------------+
                   |                            |
                   v                            v
                  SSA                 Instruction Selection
                   |                            |
                   v                            v
             Optimization              Register Allocation
                   |                            |
                   v                            v
             Block Layout             Finalization/Encoding
```

Like the other engines, the implementation can be found under `engine`, specifically
in the `wazevo` sub-package. The entry-point is found under `internal/engine/wazevo/engine.go`,
where the implementation of the interface `wasm.Engine` is found.

The output of every pass can be dumped to the console for debugging by enabling the build-time
flags in `internal/engine/wazevo/wazevoapi/debug_options.go`. The flags are disabled
by default and should only be enabled during debugging. These may also change in the future.
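
As a rough sketch of how build-time flags of this kind typically work in Go, consider the example below. The flag names `printSSA` and `printOptimizedSSA` and the `compile` function are hypothetical and chosen only for illustration; consult `debug_options.go` for the flags that actually exist.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical build-time flags, for illustration only. The real flags are
// defined in wazevoapi/debug_options.go and may be named differently.
const (
	printSSA          = false
	printOptimizedSSA = false
)

// compile pretends to run two passes and returns the result together with
// whatever debug output the enabled flags produced.
func compile(src string) (out, debugLog string) {
	var log strings.Builder

	ssa := "ssa(" + src + ")"
	if printSSA { // constant condition: the Go compiler can drop this branch when false
		log.WriteString("SSA: " + ssa + "\n")
	}

	out = "opt(" + ssa + ")"
	if printOptimizedSSA {
		log.WriteString("optimized SSA: " + out + "\n")
	}
	return out, log.String()
}

func main() {
	out, debugLog := compile("add")
	fmt.Printf("%s %q\n", out, debugLog)
}
```

Because the flags are constants rather than variables, flipping one requires rebuilding, but leaving them disabled adds no run-time checks to the compilation pipeline.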

In the following, we assume all paths to be relative to `internal/engine/wazevo`,
so we omit the prefix.

## Index

- [Front-End](frontend/)
- [Back-End](backend/)
- [Appendix](appendix/)
185 changes: 185 additions & 0 deletions site/content/docs/how_the_optimizing_compiler_works/appendix.md
@@ -0,0 +1,185 @@
+++
title = "Appendix: Trampolines"
layout = "single"
+++

Trampolines are used to interface between the Go runtime and the generated
code, in two cases:

- when we need to **enter the generated code** from the Go runtime.
- when we need to **leave the generated code** to invoke a host function
(written in Go).

In this section we want to complete the picture of how a Wasm function gets
translated from Wasm to executable code in the optimizing compiler, by
describing how to jump into the execution of the generated code at run-time.

## Entering the Generated Code

At run-time, user space invokes a Wasm function through the public
`api.Function` interface, using the methods `Call()` or `CallWithStack()`. The
implementation of these methods, in turn, eventually invokes an assembly
**trampoline**. The signature of this trampoline in Go code is:

```go
func entrypoint(
	preambleExecutable, functionExecutable *byte,
	executionContextPtr uintptr, moduleContextPtr *byte,
	paramResultStackPtr *uint64,
	goAllocatedStackSlicePtr uintptr)
```

- `preambleExecutable` is a pointer to the generated code for the preamble (see
  below).
- `functionExecutable` is a pointer to the generated code for the function (as
  described in the previous sections).
- `executionContextPtr` is a raw pointer to the `wazevo.executionContext`
  struct. This struct is used to save the state of the Go runtime before
  entering or leaving the generated code. It also holds shared state between the
  Go runtime and the generated code, such as the exit code that is used to
  terminate execution on failure, or suspend it to invoke host functions.
- `moduleContextPtr` is a pointer to the `wazevo.moduleContextOpaque` struct.
  Its contents are basically pointers to module-instance-specific objects as
  well as functions. This is sometimes called "VMContext" in other Wasm runtimes.
- `paramResultStackPtr` is a pointer to the slice where the arguments and
  results of the function are passed.
- `goAllocatedStackSlicePtr` is an aligned pointer to the Go-allocated stack
  for holding values and call frames. For further details refer to
  [Backend § Prologue and Epilogue](../backend/#prologue-and-epilogue).

The trampoline can be found in `backend/isa/<arch>/abi_entry_<arch>.s`.

For each given architecture, the trampoline:
- moves the arguments to specific registers to match the behavior of the entry preamble or trampoline function, and
- finally jumps into the execution of the generated code for the preamble.

The **preamble** that the `entrypoint` function jumps to is generated per function signature.

This is implemented in `machine.CompileEntryPreamble(*ssa.Signature)`.

The preamble sets the fields in the `wazevo.executionContext`.

At the beginning of the preamble:

- Set a register to point to the `*wazevo.executionContext` struct.
- Save the stack pointers, frame pointers, return addresses, etc. to that
struct.
- Update the stack pointer to point to `paramResultStackPtr`.

The generated code works in concert with the assumption that the preamble has
been entered through the aforementioned trampoline. Thus, it assumes that the
arguments can be found in some specific registers.

The preamble then assigns the arguments pointed at by `paramResultStackPtr` to
the registers and stack location that the generated code expects.

Finally, it invokes the generated code for the function.

The epilogue reverses part of the process, finally returning control to the
caller of the `entrypoint()` function, and the Go runtime. The caller of
`entrypoint()` is also responsible for completing the clean-up procedure by
invoking `afterGoFunctionCallEntrypoint()` (again, implemented in
backend-specific assembly), which restores the stack pointers and returns
control to the caller of the function.
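
The flow described above can be modeled in plain Go. This is only a simulation under simplifying assumptions: the `machine` type, the field names, the register count and `compileEntryPreamble` are hypothetical stand-ins, every value is treated as a 64-bit integer, and the real preamble is emitted as native code rather than written as a closure.

```go
package main

import "fmt"

// executionContext is a toy stand-in for wazevo.executionContext: it keeps the
// caller's state so that it can be restored once the generated code returns.
type executionContext struct {
	savedStackPointer uintptr // hypothetical field name
}

// machine models the CPU state that the generated code reads and writes.
type machine struct {
	stackPointer uintptr
	argRegs      [8]uint64 // assumption: at most 8 parameters or results
}

// wasmFunc models the generated code of a function body: it takes its
// arguments from registers and leaves its results in registers.
type wasmFunc func(m *machine)

// compileEntryPreamble returns a preamble specialized for one signature, here
// reduced to the number of parameters and results.
func compileEntryPreamble(nParams, nResults int) func(*machine, *executionContext, []uint64, wasmFunc) {
	return func(m *machine, ec *executionContext, paramResultStack []uint64, fn wasmFunc) {
		// Save the caller's state into the execution context.
		ec.savedStackPointer = m.stackPointer
		// Move the arguments to where the generated code expects them.
		copy(m.argRegs[:nParams], paramResultStack)
		// Invoke the generated code for the function.
		fn(m)
		// Epilogue: write the results back and restore the caller's state.
		copy(paramResultStack[:nResults], m.argRegs[:nResults])
		m.stackPointer = ec.savedStackPointer
	}
}

func main() {
	add := func(m *machine) {
		m.stackPointer = 0x2000 // the callee is free to move the stack pointer
		m.argRegs[0] += m.argRegs[1]
	}
	m := &machine{stackPointer: 0x1000}
	stack := []uint64{40, 2} // parameters in, results out
	preamble := compileEntryPreamble(2, 1)
	preamble(m, &executionContext{}, stack, add)
	fmt.Println(stack[0], m.stackPointer == 0x1000)
}
```

Generating one preamble per signature, rather than per function, means that all functions sharing a signature can reuse the same entry code.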

The arch-specific code can be found in
`backend/isa/<arch>/abi_entry_preamble.go`.

[wazero-engine-stack]: https://github.com/tetratelabs/wazero/blob/095b49f74a5e36ce401b899a0c16de4eeb46c054/internal/engine/compiler/engine.go#L77-L132
[abi-arm64]: https://tip.golang.org/src/cmd/compile/abi-internal#arm64-architecture
[abi-amd64]: https://tip.golang.org/src/cmd/compile/abi-internal#amd64-architecture
[abi-cc]: https://tip.golang.org/src/cmd/compile/abi-internal#function-call-argument-and-result-passing


## Leaving the Generated Code

In "[How do compiler functions work?][how-do-compiler-functions-work]", we
already outlined how _leaving_ the generated code works with the help of a
function. Here we complete the picture by briefly describing the code that
is generated.

When the generated code needs to return control to the Go runtime, the compiler
inserts a meta-instruction that is called `exitSequence` in both the `amd64` and
`arm64` backends. This meta-instruction sets the `exitCode` in the
`wazevo.executionContext` struct, restores the stack pointers and then returns
control to the caller of the `entrypoint()` function described above.

As described in "[How do compiler functions
work?][how-do-compiler-functions-work]", the mechanism is essentially the same
when invoking a host function or raising an error. However, when a host function
is invoked, the `exitCode` also indicates the identifier of the host function to
be invoked.
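
A minimal sketch of how the Go side might react to exit codes follows. The constant names, the bit layout (kind in the low 8 bits, host function index above them) and the `run` loop are assumptions made for this example and should not be read as wazero's actual definitions.

```go
package main

import "fmt"

// exitCode models the value that the generated code stores in the execution
// context before handing control back to Go.
type exitCode uint32

// Hypothetical exit code kinds.
const (
	exitOK exitCode = iota
	exitUnreachable
	exitCallHostFunction
)

// Assumption: the low 8 bits hold the kind, the remaining bits the index.
const exitCodeMask = 0xff

func callHostFunctionWithIndex(index uint32) exitCode {
	return exitCallHostFunction | exitCode(index<<8)
}

type executionContext struct {
	exitCode exitCode
	stack    []uint64
}

// generatedCode models one run of native code up to its next exit.
type generatedCode func(ec *executionContext)

// run enters the generated code repeatedly until it finishes or traps,
// servicing host function calls in between.
func run(ec *executionContext, code generatedCode, hostFuncs []func(stack []uint64)) error {
	for {
		code(ec)
		switch ec.exitCode & exitCodeMask {
		case exitOK:
			return nil
		case exitCallHostFunction:
			hostFuncs[ec.exitCode>>8](ec.stack)
			// Loop around to re-enter the generated code.
		default:
			return fmt.Errorf("trap: exit code %d", ec.exitCode)
		}
	}
}

func main() {
	step := 0
	code := func(ec *executionContext) {
		if step == 0 {
			ec.exitCode = callHostFunctionWithIndex(0) // suspend to call host function 0
		} else {
			ec.exitCode = exitOK
		}
		step++
	}
	double := func(stack []uint64) { stack[0] *= 2 }
	ec := &executionContext{stack: []uint64{21}}
	err := run(ec, code, []func([]uint64){double})
	fmt.Println(ec.stack[0], err)
}
```

Packing the function identifier into the exit code lets a single field carry both the reason for leaving and the target of the call.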

The magic really happens in the `backend.Machine.CompileGoFunctionTrampoline()`
method. This method is actually invoked when host modules are being
instantiated. It generates a trampoline that is used to invoke such functions
from the generated code.

This trampoline implements essentially the same prologue as the `entrypoint()`,
but it also reserves space for the arguments and results of the function to be
invoked.

A host function has the signature:

```go
func(ctx context.Context, stack []uint64)
```
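
For illustration, a host function with this shape might read its parameters from `stack` and overwrite the first slots with its results. The `hostFunc` type and the `add` function below are hypothetical; the sketch assumes two `i32` parameters and one `i32` result, each stored in its own 64-bit slot.

```go
package main

import (
	"context"
	"fmt"
)

// hostFunc mirrors the host function signature shown above.
type hostFunc func(ctx context.Context, stack []uint64)

// add reads two i32 parameters from the stack and stores the i32 result in
// slot 0, reusing the slice for both input and output.
func add(_ context.Context, stack []uint64) {
	x, y := uint32(stack[0]), uint32(stack[1])
	stack[0] = uint64(x + y) // wraps around on overflow, like i32.add
}

func main() {
	var f hostFunc = add
	stack := []uint64{40, 2}
	f(context.Background(), stack)
	fmt.Println(stack[0])
}
```

Sharing one slice for parameters and results avoids allocating on every call, at the cost of requiring the slice to be large enough for whichever of the two is longer.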

The function arguments in the `stack` parameter are copied over to the reserved
slots of the real stack. For instance, on `arm64` the stack layout would look
as follows (on `amd64` it would be similar):

```goat
                  (high address)
    SP ------> +-----------------+  <----+
               |     .......     |       |
               |      ret Y      |       |
               |     .......     |       |
               |      ret 0      |       |
               |      arg X      |       |  size_of_arg_ret
               |     .......     |       |
               |      arg 1      |       |
               |      arg 0      |  <----+ <-------- originalArg0Reg
               | size_of_arg_ret |
               |  ReturnAddress  |
               +-----------------+ <----+
               |      xxxx       |      |  ;; might be padded to make it 16-byte aligned.
          +--->|  arg[N]/ret[M]  |      |
 sliceSize|    |   ............  |      | goCallStackSize
          |    |  arg[1]/ret[1]  |      |
          +--->|  arg[0]/ret[0]  | <----+ <-------- arg0ret0AddrReg
               |    sliceSize    |
               |   frame_size    |
               +-----------------+
                  (low address)
```
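
The padding noted in the diagram can be computed with the usual round-up-to-16 trick. The sketch below is an approximation under stated assumptions: each argument or result occupies one 8-byte slot, arguments and results share the same slots, and the helper names are made up for this example.

```go
package main

import "fmt"

// slotSize assumes that every argument or result takes one 64-bit slot.
const slotSize = 8

// alignUp16 rounds n up to the next multiple of 16.
func alignUp16(n int) int { return (n + 15) &^ 15 }

// goCallStackLayout returns the number of shared arg/ret slots and the number
// of bytes reserved for them once padded to a 16-byte boundary. These roughly
// correspond to sliceSize and goCallStackSize in the diagram above.
func goCallStackLayout(nArgs, nRets int) (slots, paddedBytes int) {
	slots = nArgs
	if nRets > slots {
		slots = nRets
	}
	return slots, alignUp16(slots * slotSize)
}

func main() {
	slots, bytes := goCallStackLayout(3, 1)
	fmt.Println(slots, bytes)
}
```

With three arguments and one result, three slots (24 bytes) are needed, so one extra 8-byte padding slot is reserved to reach 32 bytes.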

Finally, the trampoline jumps into the execution of the host function using the
`exitSequence` meta-instruction.

Upon return, the process is reversed.

## Code

- The trampoline to enter the generated function is implemented by the
  `backend.Machine.CompileEntryPreamble()` method.
- The trampoline to return traps and invoke host functions is generated by
  the `backend.Machine.CompileGoFunctionTrampoline()` method.

You can find arch-specific implementations in
`backend/isa/<arch>/abi_go_call.go`,
`backend/isa/<arch>/abi_entry_preamble.go`, etc. The trampolines are found
under `backend/isa/<arch>/abi_entry_<arch>.s`.

## Further References

- Go's [internal ABI documentation][abi-internal] details a calling convention similar to the one we use in both the arm64 and amd64 backends.
- Raphael Poss's [The Go low-level calling convention on
  x86-64][go-call-conv-x86] is also an excellent reference for `amd64`.

[abi-internal]: https://tip.golang.org/src/cmd/compile/abi-internal
[go-call-conv-x86]: https://dr-knz.net/go-calling-convention-x86-64.html
[proposal-register-cc]: https://go.googlesource.com/proposal/+/master/design/40724-register-calling.md#background
[how-do-compiler-functions-work]: ../../how_do_compiler_functions_work/
