docs: how it works, networking with state images and profiling
Showing 5 changed files with 135 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
Here's an overview of v86's workings. For details, check the | ||
[source](https://github.com/copy/v86/tree/master/src). | ||
|
||
For the purpose of writing emulators with a jit, the major limitations of WebAssembly are: | ||
|
||
- structured control flow (no arbitrary jumps) | ||
- no control over registers (you can't keep hardware registers in wasm locals across functions) | ||
- no mmap (paging needs to be fully emulated) | ||
- no patching (generated code can't be modified in place) | ||
- module generation is fairly slow, but at least it's asynchronous, so other things can keep running | ||
- there is some memory overhead per module, so you can't generate more than a few thousand | ||
|
||
v86 has an interpreted mode, which collects entry points (targets of function | ||
calls and indirect jumps). It also measures the hotness per page, so that | ||
compilation is focused on code that is often executed. Once a page is | ||
considered hot, code is generated for the entire page and up to `MAX_PAGES` | ||
that are directly reachable from it. | ||
|
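As a rough illustration of the bookkeeping described above, here is a minimal sketch of per-page hotness tracking. The class name, the `record_entry` method and the `HOT_THRESHOLD` value are hypothetical and are not taken from the v86 source; only the idea (count per page, remember entry points, trigger compilation once a page is hot) comes from the text.

```python
from collections import defaultdict

PAGE_SHIFT = 12      # 4 KiB pages
HOT_THRESHOLD = 3    # hypothetical value, chosen only for this example


class HotnessTracker:
    def __init__(self):
        self.hotness = defaultdict(int)       # page number -> hit count
        self.entry_points = defaultdict(set)  # page number -> offsets in page

    def record_entry(self, addr):
        """Called by the interpreter at targets of calls and indirect jumps.

        Returns the page number when the page has just become hot, else None.
        """
        page = addr >> PAGE_SHIFT
        self.entry_points[page].add(addr & 0xFFF)
        self.hotness[page] += 1
        if self.hotness[page] == HOT_THRESHOLD:
            return page  # the caller would now schedule compilation
        return None


tracker = HotnessTracker()
results = [tracker.record_entry(a) for a in (0x1000, 0x1010, 0x2000, 0x1000)]
```

In this run the third hit on page 1 crosses the threshold, and the collected entry points for that page are the offsets `0x0` and `0x10`.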
||
v86 generates a single function with a big switch statement (brtable), to | ||
ensure that all functions and targets of indirect jumps are reachable from | ||
other modules. The remaining control flow is handled using the "stackifier" | ||
algorithm (well-explained in | ||
[this blog post](https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2)). | ||
At the moment, there is no linking of wasm modules. The current module is | ||
exited, and the main loop detects if a new module can be entered. | ||
|
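The exit-and-re-enter scheme described above can be sketched as follows. This is a toy model, not v86's code: `make_module`, `code_cache` and the block functions are hypothetical names, and a Python `while` loop over a dictionary stands in for the `br_table` dispatch at the top of the generated function.

```python
def make_module(blocks):
    """blocks maps an entry index to a function(state) that returns the next
    entry index inside this module, or None to leave the module."""
    def module(state, entry):
        # Stands in for the br_table dispatch of the generated function.
        while entry is not None:
            entry = blocks[entry](state)
    return module


def block_a(state):
    state["eax"] += 1
    state["eip"] = 0x2000   # the target lives in another module
    return None             # exit; the main loop takes over


def block_b(state):
    state["eax"] *= 2
    state["eip"] = 0x3000   # no compiled code at this address
    return None


# address -> (module, entry index); a hypothetical lookup structure
code_cache = {
    0x1000: (make_module({0: block_a}), 0),
    0x2000: (make_module({0: block_b}), 0),
}


def main_loop(state):
    # No linking between modules: each exit returns here, and the loop
    # checks whether a compiled module exists for the new address.
    while state["eip"] in code_cache:
        module, entry = code_cache[state["eip"]]
        module(state, entry)
    # A real emulator would fall back to the interpreter at this point.


state = {"eip": 0x1000, "eax": 20}
main_loop(state)
```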
||
In practice, I found that browsers don't handle this structure (deep brtables, | ||
with locals being used across the entire function) very well, and `MAX_PAGES` | ||
has to be set fairly low; otherwise memory usage blows up. It's likely that | ||
improvements are possible (generating fewer entry points, splitting code across | ||
multiple functions). | ||
|
||
Code generation happens in two passes: the first finds all basic block | ||
boundaries, and the second generates code for each basic block. Instruction | ||
decoding is generated by a [set of | ||
scripts](https://github.com/copy/v86/tree/master/gen) from a [table of | ||
instructions](https://github.com/copy/v86/blob/master/gen/x86_table.js). The | ||
same table is also used to [generate | ||
tests](https://github.com/copy/v86/blob/master/tests/nasm/create_tests.js). | ||
|
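The two passes can be illustrated with a toy instruction list. This sketch assumes a simplified format of `(address, mnemonic, jump_target)` tuples and hypothetical helper names; real x86 decoding and v86's actual data structures are considerably more involved.

```python
def find_leaders(instructions, entry):
    """First pass: collect the addresses at which a basic block starts."""
    leaders = {entry}
    for i, (addr, op, target) in enumerate(instructions):
        if op in ("jmp", "jcc"):
            leaders.add(target)                      # a jump target starts a block
            if i + 1 < len(instructions):
                leaders.add(instructions[i + 1][0])  # so does the fall-through
    return leaders


def split_blocks(instructions, leaders):
    """Second pass: walk the instructions and emit one block per leader."""
    blocks = {}
    current = None
    for addr, op, target in instructions:
        if addr in leaders:
            current = blocks.setdefault(addr, [])
        current.append(op)  # a real generator would emit wasm here
    return blocks


program = [
    (0, "cmp", None),
    (1, "jcc", 4),
    (2, "add", None),
    (3, "jmp", 5),
    (4, "sub", None),
    (5, "ret", None),
]
leaders = find_leaders(program, entry=0)
blocks = split_blocks(program, leaders)
```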
||
To handle paging, v86 generates code similar to this (see `gen_safe_read`): | ||
|
||
``` | ||
entry <- tlb[addr >> 12 << 2] | ||
if entry & MASK == TLB_VALID && (addr & 0xFFF) <= 0xFFC - bytes: goto fast | ||
entry <- safe_read_jit_slow(addr, instruction_pointer) | ||
if page_fault: goto exit-with-pagefault | ||
fast: mem[(entry & ~0xFFF) ^ addr] | ||
``` | ||
|
||
There is a 4 MB cache that acts like a tlb. It contains the physical address, | ||
read-only bit, whether the page contains code (in order to invalidate it on | ||
write) and whether the page points to mmio. All of those cases are handled in | ||
the slow path (`safe_read_jit_slow`), as well as walking the page tables and | ||
triggering page faults. The fast path is taken in the vast majority of cases. | ||
|
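To make the fast path concrete, here is a small model of the lookup. It is a sketch under assumptions: the flag bit layout, the `page_table` dictionary and the function names are hypothetical, and the idea that an entry stores the xor of the virtual and physical page bases is inferred from the `(entry & ~0xFFF) ^ addr` expression in the pseudocode rather than checked against the source.

```python
PAGE_MASK = 0xFFF
# Hypothetical bit layout for the flags kept in the low bits of an entry.
TLB_VALID, TLB_READONLY, TLB_HAS_CODE, TLB_MMIO = 1, 2, 4, 8
READ_MASK = TLB_VALID | TLB_MMIO   # flags that matter for a read

mem = bytearray(0x4000)            # toy physical memory
tlb = [0] * 16                     # one entry per virtual page (toy size)
page_table = {0x5: 0x2}            # virtual page -> physical page (toy)
slow_path_calls = 0


def safe_read_slow(addr):
    """Slow path: walk the toy page table and fill the tlb entry.

    A real slow path would also handle mmio and reads that straddle pages."""
    global slow_path_calls
    slow_path_calls += 1
    vpage = addr >> 12
    if vpage not in page_table:
        raise LookupError("page fault")
    # Store virtual ^ physical page base so a single xor translates an address.
    entry = ((vpage ^ page_table[vpage]) << 12) | TLB_VALID
    tlb[vpage] = entry
    return entry


def safe_read(addr, size):
    entry = tlb[addr >> 12]
    # Same conservative in-page check as in the pseudocode above.
    fast = (entry & READ_MASK) == TLB_VALID and (addr & PAGE_MASK) <= 0xFFC - size
    if not fast:
        entry = safe_read_slow(addr)
    phys = (entry & ~PAGE_MASK) ^ addr
    return int.from_bytes(mem[phys:phys + size], "little")


mem[0x2010:0x2014] = (0xDEADBEEF).to_bytes(4, "little")
first = safe_read(0x5010, 4)    # misses, takes the slow path once
second = safe_read(0x5010, 4)   # hits the filled entry
```

Storing the xor of the two page bases means the fast path needs one mask and one xor, with no separate step to add the page offset.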
||
The remaining code generation is mostly a straightforward, 1-to-1 translation | ||
of x86 to wasm. The only analysis done is to optimise generation of conditional | ||
jumps immediately after arithmetic instructions, e.g.: | ||
|
||
``` | ||
cmp eax, 52 | ||
setb al | ||
``` | ||
|
||
becomes: | ||
|
||
``` | ||
... // code for cmp | ||
al <- eax < 52 (unsigned) | ||
``` | ||
|
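One way to picture this optimisation is a translator that remembers the operands of the most recent `cmp` and uses them when the next instruction reads the carry condition. The sketch below is illustrative only: `translate`, `lazy_sub` and `get_carry_flag` are hypothetical names, and the output is pseudo-wasm text rather than real wasm.

```python
def translate(instructions):
    """Return pseudo-wasm lines for a list of (mnemonic, operands) tuples."""
    out = []
    last_cmp = None  # operands of the most recent cmp, if still usable
    for op, args in instructions:
        if op == "cmp":
            out.append(f"flags <- lazy_sub({args[0]}, {args[1]})")
            last_cmp = args
        elif op == "setb" and last_cmp is not None:
            # Fused: unsigned less-than on the cmp operands, no flag read.
            out.append(f"{args[0]} <- {last_cmp[0]} <u {last_cmp[1]}")
            last_cmp = None  # the destination may alias a cmp operand
        elif op == "setb":
            # No preceding cmp: fall back to reading the carry flag.
            out.append(f"{args[0]} <- get_carry_flag()")
        else:
            out.append(f"{op} {', '.join(args)}")
            last_cmp = None  # conservatively forget the cmp
    return out


fused = translate([("cmp", ("eax", "52")), ("setb", ("al",))])
unfused = translate([("setb", ("al",))])
```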
||
A lazy flag mechanism is used to speed up arithmetic (applies to both jit and | ||
interpreted mode, see | ||
[`arith.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/arith.rs) and | ||
[`misc_instr.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/misc_instr.rs)). | ||
There's a work-in-progress pull request that tries to elide most lazy-flag updates: | ||
https://github.com/copy/v86/pull/466 | ||
|
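The general idea behind lazy flags can be sketched as follows: the arithmetic instruction only records an operand and the result, and individual flags are derived when something asks for them. This is a minimal model for 32-bit addition only; the class and method names are hypothetical and do not mirror the layout used in `arith.rs`.

```python
class LazyFlags:
    """Remember the last arithmetic result; derive flags only on demand."""

    def __init__(self):
        self.op1 = 0
        self.result = 0

    def add32(self, a, b):
        # Hot path: two stores, no flag computation at all.
        res = (a + b) & 0xFFFFFFFF
        self.op1, self.result = a, res
        return res

    def carry(self):
        # An unsigned add wrapped iff the result is below an operand.
        return self.result < self.op1

    def zero(self):
        return self.result == 0

    def sign(self):
        return ((self.result >> 31) & 1) == 1


flags = LazyFlags()
r = flags.add32(0xFFFFFFFF, 1)
```

Since most arithmetic results are overwritten before any flag is read, deferring the work this way avoids computing flags that are never used.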
||
FPU instructions are emulated using softfloat (very slow, but unfortunately | ||
some code relies on 80-bit floats). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
v86 has a built-in profiler, which instruments generated code to count certain | ||
events and types of instructions. To use it, build with | ||
`make debug-with-profiler` and open debug.html. | ||
|
||
For debugging networking, packet logging is available in the UI in both debug | ||
and release builds. The resulting `traffic.hex` file can be loaded in Wireshark | ||
using File -> Import from Hex Dump, with direction indication ticked and the timestamp format set to `%s.%f`.