# Cannon Overview
This page now lives on the Cannon wiki.

Note: this describes Cannon v0.2. Cannon v0.3 is in active development and introduces breaking changes.

For a shorter & sweeter overview, see the Cannon High-Level Overview.
## minigeth

Minigeth is a stripped-down version of geth that removes the database and adds a preimage oracle instead.
The preimage oracle takes a hash and returns its preimage. This is most notably used to read account values, storage values, and contract code.
Obviously, preimages are not computable, so minigeth either needs to query another node for the preimage, or use pre-supplied preimages (more on this below).
Running minigeth takes us from an input hash to an output hash.
- The input hash is the hash, for a given block, of the concatenation of:
  - the transaction root
  - the coinbase
  - the uncle root
  - the gas limit
  - the timestamp
- The output hash is the hash, for the same block, of the concatenation of:
  - the state root
  - the receipts root
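For intuition, here's a minimal sketch of what such a "hash of the concatenation" computation looks like. The field ordering follows the list above, but the exact encoding (32-byte padding, keccak256) is an assumption made for illustration; minigeth defines the real format.

```go
package main

import (
	"fmt"
	"math/big"

	"golang.org/x/crypto/sha3"
)

// inputHash sketches the "hash of the concatenation" idea described above.
// The field encoding (32-byte big-endian padding) is an assumption for
// illustration only; minigeth is the authoritative source.
func inputHash(txRoot, uncleRoot [32]byte, coinbase [20]byte, gasLimit, timestamp uint64) [32]byte {
	h := sha3.NewLegacyKeccak256()
	h.Write(txRoot[:])
	h.Write(coinbase[:])
	h.Write(uncleRoot[:])
	h.Write(new(big.Int).SetUint64(gasLimit).FillBytes(make([]byte, 32)))
	h.Write(new(big.Int).SetUint64(timestamp).FillBytes(make([]byte, 32)))
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	fmt.Printf("%x\n", inputHash([32]byte{}, [32]byte{}, [20]byte{}, 30_000_000, 1_650_000_000))
}
```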
Minigeth can be run in two modes: MIPS or prefetch (aka x64 or PC).
In MIPS mode:
- The input hash is read from the `0x30000000` MIPS memory address, where it needs to have been put beforehand (e.g. by the `Challenge.sol` contract, or by one of the various `RunXXX` methods in `mipsevm` (see later)).
- Account values, storage values, and contract code are read using the preimage oracle.
- The preimage oracle uses pre-supplied preimages. See details in the `mipsevm` section below.
- The output hash is written at the `0x30000800` MIPS memory address.
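To make the fixed addresses concrete, here's a small illustrative sketch. The constants are the ones listed above; the sparse byte map standing in for MIPS memory is a simplification, not how mipsevm actually models memory.

```go
package main

import "fmt"

// Fixed MIPS memory locations used in MIPS mode, as described above.
const (
	inputHashAddr  = 0x30000000 // the caller writes the 32-byte input hash here before running
	outputHashAddr = 0x30000800 // minigeth writes the 32-byte output hash here when done
)

// sparseMem stands in for the emulated MIPS address space; a flat map is a
// simplification for illustration (real implementations use pages or a trie).
type sparseMem map[uint32]byte

func (m sparseMem) write(addr uint32, data []byte) {
	for i, b := range data {
		m[addr+uint32(i)] = b
	}
}

func (m sparseMem) read(addr uint32, n int) []byte {
	out := make([]byte, n)
	for i := range out {
		out[i] = m[addr+uint32(i)]
	}
	return out
}

func main() {
	mem := sparseMem{}
	mem.write(inputHashAddr, make([]byte, 32)) // e.g. the hash supplied by Challenge.sol or a RunXXX method
	fmt.Printf("output hash: %x\n", mem.read(outputHashAddr, 32))
}
```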
In prefetch mode:
- Minigeth connects to another geth node (URL in env variable `NODE`, defaulting to Infura).
- Minigeth will store the "prefetched" data (details below) in a root directory.
  - Typically, this is going to be `/tmp/cannon/<chainID>_<blockHash>`.
  - But it is configurable, and doesn't need to be per-chain or per-block.
  - Going forward, we'll refer to this directory as `<root>`.
- A block number is supplied on the command line.
- The input hash is computed from data fetched from the node, for the supplied block number, and stored in `<root>/input`.
- The preimage oracle queries the node for the preimages, and saves each preimage in the `<root>/<hash>` file.
- The output hash is stored in `<root>/output`.
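A concrete sketch of that file layout follows. The paths come from the description above; the helper names are invented for illustration and are not the actual minigeth API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// prefetchRoot builds the default <root> directory described above.
func prefetchRoot(chainID uint64, blockHash string) string {
	return filepath.Join("/tmp/cannon", fmt.Sprintf("%d_%s", chainID, blockHash))
}

// Readers for the files minigeth writes in prefetch mode. Only the file layout
// comes from the description above; these helper names are made up for the sketch.
func readInput(root string) ([]byte, error)          { return os.ReadFile(filepath.Join(root, "input")) }
func readOutput(root string) ([]byte, error)         { return os.ReadFile(filepath.Join(root, "output")) }
func readPreimage(root, hash string) ([]byte, error) { return os.ReadFile(filepath.Join(root, hash)) }

func main() {
	root := prefetchRoot(1, "0xabc") // hypothetical block hash
	fmt.Println("expecting prefetched data under", root)
}
```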
You can consult this issue to get a better sense of the diff between upstream geth and minigeth.
- The preimage oracle queries are split into one of the various `PrefetchXXX` calls and a `Preimage` call.
  - In MIPS mode, prefetching is a no-op, and `Preimage` reads from a special location in the MIPS memory (see details in the `mipsevm` section) but caches the value in a Go map.
  - In prefetch mode, the `PrefetchXXX` calls query the node at most once and cache the result in a Go map, and `Preimage` just reads from the map.
  - The reason for the split between `PrefetchXXX` and `Preimage` is that it's necessary to know the "type" of preimage (storage slot, account value, ...) in order to query the node.
- Within `minigeth/oracle`:
  - `embedded_mips.go` implements the oracle for MIPS mode.
  - All other files implement the oracle for prefetch mode.
- Prefetch caching: besides caching the preimages in a Go map during a single minigeth prefetch run, the results of node queries are also cached on disk in `/tmp/cannon/json_<json_hash>`, where `json_hash` is the hash of the JSON-RPC query. This speeds up subsequent runs for the same block.
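The sketch below shows the shape of this split and of the two caching layers (the in-memory Go map and the on-disk JSON-RPC cache). All names, signatures, the hash function, and the query format are illustrative, not the actual minigeth/oracle API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// In-memory preimage cache, filled by the Prefetch* calls and read by Preimage.
// Keyed with sha256 here purely for illustration.
var preimages = map[[32]byte][]byte{}

// prefetchStorage is an illustrative stand-in for one of the PrefetchXXX calls:
// it knows the *type* of data being requested (here, a storage slot), so it can
// build the right node query, fetch the value at most once, and cache it.
func prefetchStorage(blockNumber uint64, addr, slot [32]byte) {
	query := fmt.Sprintf(`{"block":%d,"addr":"%x","slot":"%x"}`, blockNumber, addr, slot) // made-up query shape
	value := cachedRPC(query)
	preimages[sha256.Sum256(value)] = value
}

// preimage only needs the hash: it just reads what prefetching cached.
func preimage(hash [32]byte) []byte { return preimages[hash] }

// cachedRPC sketches the on-disk cache: the result of each node query is stored
// under /tmp/cannon/json_<hash of the JSON-RPC query>, so re-running the same
// block hits the disk cache instead of the node.
func cachedRPC(query string) []byte {
	path := filepath.Join("/tmp/cannon", fmt.Sprintf("json_%x", sha256.Sum256([]byte(query))))
	if data, err := os.ReadFile(path); err == nil {
		return data
	}
	data := queryNode(query) // placeholder for the real JSON-RPC round-trip
	_ = os.WriteFile(path, data, 0644)
	return data
}

func queryNode(query string) []byte { return []byte(query) } // stub; a real node call goes here

func main() {
	prefetchStorage(1, [32]byte{}, [32]byte{})
	fmt.Println("cached preimages:", len(preimages))
}
```

The key point is that only the `Prefetch*` side needs to know how to talk to a node; `Preimage` stays a pure hash lookup, which the MIPS-mode implementation can also satisfy.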
## mipigo

Go supports compilation to MIPS. However, the generated executable is in ELF format. We'd like to get a pure sequence of MIPS instructions instead. For this purpose, there is a simple startup routine located in `mipigo/startup/startup.s` (the binary version is `mipigo/startup/startup.bin`).
The file `mipigo/startup/compile.py` creates the new binary by doing the following:
- (??) Generally jiggle things around?
- Add the startup routine at the start of the binary.
- Replace Go garbage collector activation by no-op instructions.

TODO: confirm all of the above with George
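As a very rough illustration of just the "prepend the startup routine" step (the patching steps are omitted, and the input/output file names are invented; `compile.py` is the authoritative source):

```go
package main

import "os"

// Sketch of only the "startup routine first, program image second" layout
// described above. File names other than startup.bin are hypothetical.
func main() {
	startup, err := os.ReadFile("mipigo/startup/startup.bin") // raw MIPS startup routine
	if err != nil {
		panic(err)
	}
	program, err := os.ReadFile("minigeth_flattened.bin") // hypothetical flattened ELF contents
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("minigeth.bin", append(startup, program...), 0644); err != nil {
		panic(err)
	}
}
```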
## mipsevm

Since we don't have a MIPS machine, we can't run minigeth in MIPS mode directly. We can either run it using the Unicorn library (which uses the QEMU CPU emulator), or on the Solidity MIPS interpreter (`MIPS.sol`), which itself runs on the EVM.
The `mipsevm` program's responsibility is precisely to load and run the MIPS binary, either via Unicorn or on the EVM.
Besides this, `mipsevm` also needs to intercept the `Preimage` calls in order to return the pre-supplied preimages.

This interception is based on the `Preimage` implementation in `minigeth/oracle/embedded_mips.go`, which copies the hash at address `0x30001000`, then calls `os.Getpid()` (syscall `4020`), then reads the preimage from address `0x31000000` (the first 4 bytes are the size, followed by the preimage proper).
For this to work, we have to intercept the preimage request (either the system call or the read at `0x31000000`) and write the previously supplied preimage at this address.
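Here's a minimal sketch of that interception from the emulator side. The addresses and the 4-byte size prefix follow the description above; the flat byte-slice view of memory and the hook shape are simplifications (the real code hooks Unicorn or the EVM state DB), and big-endian byte order is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	hashReqAddr  = 0x30001000 // minigeth writes the requested hash here
	preimageAddr = 0x31000000 // the interceptor writes size||preimage here
)

// onGetpid sketches what the interceptor does when it sees the os.Getpid()
// syscall (4020): read the requested hash, look up the pre-supplied preimage,
// and write it back with a 4-byte size prefix (big-endian assumed here).
// mem is a simplified flat view of MIPS memory used only for this sketch.
func onGetpid(mem []byte, preimages map[[32]byte][]byte) error {
	var hash [32]byte
	copy(hash[:], mem[hashReqAddr:hashReqAddr+32])

	preimage, ok := preimages[hash]
	if !ok {
		return fmt.Errorf("missing preimage for %x", hash)
	}
	binary.BigEndian.PutUint32(mem[preimageAddr:preimageAddr+4], uint32(len(preimage)))
	copy(mem[preimageAddr+4:], preimage)
	return nil
}

func main() {}
```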
The `mipsevm` program runs in one of three configurations:
- `unicorn`
  - This configuration runs the MIPS binary directly on Unicorn, not on the EVM.
  - It hooks Unicorn so that upon the `os.Getpid()` syscall, we read the preimage from disk (`<root>/<hash>`, generated when running minigeth in prefetch mode).
  - This configuration can be run via `mipsevm/main.go` and takes two optional arguments:
    - `blockNumber`: used to derive the `<root>` as `/tmp/cannon/0_<blockNumber>`
    - `target`: a step number in the execution (each step is the execution of a MIPS instruction)
  - It will write a checkpoint based on the command line parameters.
    - A checkpoint is a Merkle trie containing the MIPS memory (as an `address → 32-bit value` mapping).
    - This gets serialized in a JSON file containing the Merkle root, the step number, and a `hash → preimage` mapping (since every node in the trie is a hash of its children); see the sketch after this list.
    - If the block number isn't provided, it will write the checkpoint corresponding to the initial memory state in `/tmp/cannon/golden.json` (notice the `<root>` isn't used here as the initial state is block-independent).
    - If the block number and a step number are provided, it will write the checkpoint corresponding to the memory state before the specified execution step in `<root>/checkpoint_<step>.json`.
    - If the step number isn't provided, it will write the checkpoint corresponding to the memory state after the last execution step.
- `chain`
  - This configuration runs the Solidity MIPS interpreter (`MIPS.sol`) on the EVM.
  - `MIPS.sol` reads the preimage from the `MIPSMemory.sol` contract.
  - The preimages must have been supplied to the contract beforehand.
  - There is an example of this in `mipsevm/compare_evm_to_chain_test.go`.
- `evm`
  - This configuration also runs the Solidity MIPS interpreter on the EVM.
  - The memory address in `MIPS.sol` is set to 0 (instead of a deployment of `MIPSMemory.sol`), and the (MIPS) memory is read via `SLOAD`.
  - The state DB implementation is modified (in `mipsevm/minievm.go`) so that reading the contract state is able to detect the `os.Getpid()` syscall and read the preimage from disk (`/tmp/cannon/<hash>`, generated by minigeth in prefetch mode).
  - TODO: "doesn't compute the merkle tree"
The `chain` configuration is the one we care about (it's how the single-step execution will be run). However, it's also pretty slow, so it's useful to have the two other configurations as alternative trade-offs between speed and closeness to the production environment.
## The single-step execution

The last step of the challenge game is a single-step (i.e. single MIPS instruction) execution, performed in the style of the `chain` configuration of `mipsevm`.
To run, this single step needs to access the MIPS memory: it will need to access the MIPS code to read the instruction, then some registers and/or memory locations to execute it. If reading from the `0x31000000-0x32000000` memory range, we'll also need to access chain preimages.
In order to enter the single-step execution phase, both players need to have reached agreement on the root of a Merkle trie mapping the whole MIPS memory. The MIPS memory is read out from this trie using the `MIPSMemory.sol` contract (which has a single on-chain deployment).
This Merkle trie read works in the regular way, assuming a backing `hash -> preimage` database: the trie is traversed node by node along a path determined by the bytes in the key (in this case, the MIPS memory address). Each node in the trie is a hash whose children can be obtained by retrieving the preimage of the node hash.
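To make this concrete, here's a deliberately simplified traversal sketch. It is not the actual Ethereum trie node encoding used by `MIPSMemory.sol`; it only shows why every node on the lookup path must be available as a `hash -> preimage` entry.

```go
package main

import "fmt"

// lookup walks a toy trie in which every node's preimage is an array of 256
// consecutive 32-byte child hashes (one per possible key byte), and the final
// preimage is the value. This is NOT the real Merkle Patricia trie encoding;
// it only illustrates that each step is "resolve a hash through the preimage database".
func lookup(root [32]byte, key []byte, preimages map[[32]byte][]byte) ([]byte, error) {
	node := root
	for _, b := range key {
		pre, ok := preimages[node]
		if !ok || len(pre) < (int(b)+1)*32 {
			return nil, fmt.Errorf("missing or malformed trie node preimage for %x", node)
		}
		copy(node[:], pre[int(b)*32:int(b)*32+32])
	}
	value, ok := preimages[node]
	if !ok {
		return nil, fmt.Errorf("missing leaf preimage for %x", node)
	}
	return value, nil
}

func main() {}
```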
For this to work during the single-step execution, the `hash -> preimage` mappings necessary for the trie traversal need to have been supplied in advance. This is done via the `MIPSMemory.AddTrieNode(data)` function, where the data supplied is the preimage (from which the hash is recomputed). These preimages can be acquired from a checkpoint generated by running `mipsevm` in unicorn mode.
Similarly, chain preimages must have been supplied if required, via the `MIPSMemory.AddPreimage(data, offset)` function. (Remember these preimages are acquired by running `minigeth` in prefetch mode.) The reason why these are handled by different functions (despite both being simple `hash -> preimage` mappings) is that for the chain preimages, it's only necessary to store the 4 bytes starting at the specified `offset` (as a MIPS instruction can only ever touch this much data at a time). On the other hand, the MIPS trie node preimages are needed in full to perform the trie traversal. Separating both cases enables storage savings.
NOTE: This could be changed if we replace the Merkle trie with a simpler tree-hash structure. We strongly feel we should do this, as the Merkle trie logic is complex and brittle. It is easily feasible because every leaf in the MIPS memory is a fixed-size (4-byte) memory slot.
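Purely as an illustration of that suggestion (not anything Cannon currently implements), a fixed-depth binary Merkle tree over word-aligned slots could be read like this, with each internal node's preimage being just its two child hashes:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readWord descends a hypothetical fixed-depth binary Merkle tree over the
// 32-bit MIPS address space (4-byte leaves). Each internal node's preimage is
// the 64-byte concatenation left-child-hash || right-child-hash; a leaf's
// preimage is the 4-byte memory slot itself.
func readWord(root [32]byte, addr uint32, preimages map[[32]byte][]byte) (uint32, error) {
	node := root
	for bit := 31; bit >= 2; bit-- { // one level per address bit, stopping at word granularity
		pre, ok := preimages[node]
		if !ok || len(pre) != 64 {
			return 0, fmt.Errorf("missing or malformed node preimage for %x", node)
		}
		if (addr>>uint(bit))&1 == 0 {
			copy(node[:], pre[:32]) // go left
		} else {
			copy(node[:], pre[32:]) // go right
		}
	}
	leaf, ok := preimages[node]
	if !ok || len(leaf) != 4 {
		return 0, fmt.Errorf("missing or malformed leaf preimage for %x", node)
	}
	return binary.BigEndian.Uint32(leaf), nil
}

func main() {}
```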