ava-labs · danlaine · Jul 7, 2023 · Jun 29, 2023
diff --git a/x/merkledb/README.md b/x/merkledb/README.md
@@ -33,6 +33,181 @@ To reduce the depth of nodes in the trie, a `Merkle Node` utilizes path compress
 +-----------------------------------+
 ```
 
+## Serialization
+
+### Node
+
+Nodes are persisted in an underlying database. To persist a node, we must first serialize it.
+Serialization is done by the `encoder` interface defined in `codec.go`.
+
+The node serialization format is as follows:
+
+```
++----------------------------------------------------+
+| Value existence flag (1 byte)                      |
++----------------------------------------------------+
+| Value length (varint) (optional)                   |
++----------------------------------------------------+
+| Value (variable length bytes) (optional)           |
++----------------------------------------------------+
+| Number of children (varint)                        |
++----------------------------------------------------+
+| Child index (varint)                               |
++----------------------------------------------------+
+| Child compressed path length (varint)              |
++----------------------------------------------------+
+| Child compressed path (variable length bytes)      |
++----------------------------------------------------+
+| Child ID (32 bytes)                                |
++----------------------------------------------------+
+| Child index (varint)                               |
++----------------------------------------------------+
+| Child compressed path length (varint)              |
++----------------------------------------------------+
+| Child compressed path (variable length bytes)      |
++----------------------------------------------------+
+| Child ID (32 bytes)                                |
++----------------------------------------------------+
+|...                                                 |
++----------------------------------------------------+
+```
+
+Where:
+* `Value existence flag` is `1` if this node has a value, otherwise `0`.
+* `Value length` is the length of the value, if it exists (i.e. if `Value existence flag` is `1`). Otherwise, it is not serialized.
+* `Value` is the value, if it exists (i.e. if `Value existence flag` is `1`). Otherwise, it is not serialized.
+* `Number of children` is the number of children this node has.
+* `Child index` is the index of a child node within the list of the node's children.
+* `Child compressed path length` is the length of the child node's compressed path, in nibbles.
+* `Child compressed path` is the child node's compressed path.
+* `Child ID` is the child node's ID.
+
+For each child of the node, we have an additional:
+
+```
++----------------------------------------------------+
+| Child index (varint)                               |
++----------------------------------------------------+
+| Child compressed path length (varint)              |
++----------------------------------------------------+
+| Child compressed path (variable length bytes)      |
++----------------------------------------------------+
+| Child ID (32 bytes)                                |
++----------------------------------------------------+
+```
+
+Note that the `Child index` values are not necessarily sequential. For example, if a node has 3 children, the `Child index` values could be `0`, `2`, and `15`.
+However, the `Child index` values must be strictly increasing. For example, neither `0`, `0`, `1` nor `1`, `0` is a valid sequence of `Child index` values.
+
+Since a node can have up to 16 children, there can be up to 16 such blocks of children data.
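+
+The layout above can be sketched in Go as follows. This is a minimal illustration, not the code in `codec.go`: the `node` and `child` types and the `encodeNode` and `putVarint` helpers are hypothetical names introduced here, and the sketch assumes the compressed path bytes are already packed by the caller.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/binary"
+	"errors"
+	"fmt"
+)
+
+// child and node are hypothetical types used only for this illustration.
+type child struct {
+	index   int      // child index
+	pathLen int      // compressed path length, written as a varint
+	path    []byte   // compressed path bytes, assumed already packed
+	id      [32]byte // child ID
+}
+
+type node struct {
+	hasValue bool
+	value    []byte
+	children []child // must be ordered by strictly increasing index
+}
+
+// putVarint appends v to buf using binary.PutVarint.
+func putVarint(buf *bytes.Buffer, v int) {
+	var tmp [binary.MaxVarintLen64]byte
+	n := binary.PutVarint(tmp[:], int64(v))
+	buf.Write(tmp[:n])
+}
+
+// encodeNode writes the fields in the order shown in the layout above.
+func encodeNode(n node) ([]byte, error) {
+	if len(n.children) > 16 {
+		return nil, errors.New("a node can have at most 16 children")
+	}
+	var buf bytes.Buffer
+	if n.hasValue {
+		buf.WriteByte(1) // value existence flag
+		putVarint(&buf, len(n.value))
+		buf.Write(n.value)
+	} else {
+		buf.WriteByte(0) // no value length or value follows
+	}
+	putVarint(&buf, len(n.children))
+	prev := -1
+	for _, c := range n.children {
+		if c.index <= prev {
+			return nil, errors.New("child indices must be strictly increasing")
+		}
+		prev = c.index
+		putVarint(&buf, c.index)
+		putVarint(&buf, c.pathLen)
+		buf.Write(c.path)
+		buf.Write(c.id[:])
+	}
+	return buf.Bytes(), nil
+}
+
+func main() {
+	n := node{
+		hasValue: true,
+		value:    []byte{0x02},
+		children: []child{{index: 0, pathLen: 1, path: []byte{0x10}}},
+	}
+	b, err := encodeNode(n)
+	fmt.Printf("%X %v\n", b, err)
+}
+```
+
+Rejecting out-of-order indices at encode time keeps the "strictly increasing" rule in one place, so a decoder can rely on it.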
+
+#### Example
+
+Let's take a look at an example node. 
+
+Its byte representation (in hex) is: `0x01020204000210579EB3718A7E437D2DDCE931AC7CC05A0BC695A9C2084F5DF12FB96AD0FA32660E06FFF09845893C4F9D92C4E097FCF2589BC9D6882B1F18D1C2FC91D7DF1D3FCBDB4238`
+
+The node's key is empty (it's the root), and the node has value `0x02`.
+It has two children.
+The first is at child index `0`, has compressed path `0x01` and ID (in hex) `0x579eb3718a7e437d2ddce931ac7cc05a0bc695a9c2084f5df12fb96ad0fa3266`.
+The second is at child index `14`, has compressed path `0x0F0F0F` and ID (in hex) `0x9845893c4f9d92c4e097fcf2589bc9d6882b1f18d1c2fc91d7df1d3fcbdb4238`.
+
+```
++--------------------------------------------------------------------+
+| Value existence flag (1 byte)                                      |
+| 0x01                                                               |
++--------------------------------------------------------------------+
+| Value length (varint) (optional)                                   |
+| 0x02                                                               |
++--------------------------------------------------------------------+
+| Value (variable length bytes) (optional)                           |
+| 0x02                                                               |
++--------------------------------------------------------------------+
+| Number of children (varint)                                        |
+| 0x04                                                               |
++--------------------------------------------------------------------+
+| Child index (varint)                                               |
+| 0x00                                                               |
++--------------------------------------------------------------------+
+| Child compressed path length (varint)                              |
+| 0x02                                                               |
++--------------------------------------------------------------------+
+| Child compressed path (variable length bytes)                      |
+| 0x10                                                               |
++--------------------------------------------------------------------+
+| Child ID (32 bytes)                                                |
+| 0x579EB3718A7E437D2DDCE931AC7CC05A0BC695A9C2084F5DF12FB96AD0FA3266 |
++--------------------------------------------------------------------+
+| Child index (varint)                                               |
+| 0x0E                                                               |
++--------------------------------------------------------------------+
+| Child compressed path length (varint)                              |
+| 0x06                                                               |
++--------------------------------------------------------------------+
+| Child compressed path (variable length bytes)                      |
+| 0xFFF0                                                             |
++--------------------------------------------------------------------+
+| Child ID (32 bytes)                                                |
+| 0x9845893C4F9D92C4E097FCF2589BC9D6882B1F18D1C2FC91D7DF1D3FCBDB4238 |
++--------------------------------------------------------------------+
+```
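+
+To make the byte layout concrete, here is a small Go sketch that walks the example bytes field by field. It is an illustration under stated assumptions, not the decoder in `codec.go`: `decodeNode`, `readLen` and `decodedChild` are hypothetical names, every varint is read with `binary.ReadVarint`, and a compressed path of `n` nibbles is assumed to occupy `(n+1)/2` bytes. The sketch does no bounds checking beyond rejecting negative lengths.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/binary"
+	"encoding/hex"
+	"errors"
+	"fmt"
+	"io"
+)
+
+// exampleHex is the example node's byte representation, without the 0x prefix.
+const exampleHex = "01020204000210579EB3718A7E437D2DDCE931AC7CC05A0BC695A9C2084F5DF12FB96AD0FA32660E06FFF09845893C4F9D92C4E097FCF2589BC9D6882B1F18D1C2FC91D7DF1D3FCBDB4238"
+
+// decodedChild is a hypothetical type used only for this illustration.
+type decodedChild struct {
+	index   int64
+	pathLen int64 // compressed path length in nibbles
+	path    []byte
+	id      [32]byte
+}
+
+// readLen reads a signed varint and rejects negative values.
+func readLen(r *bytes.Reader) (int64, error) {
+	v, err := binary.ReadVarint(r)
+	if err != nil {
+		return 0, err
+	}
+	if v < 0 {
+		return 0, errors.New("negative length")
+	}
+	return v, nil
+}
+
+// decodeNode reads the fields in the order given by the serialization format.
+func decodeNode(b []byte) ([]byte, []decodedChild, error) {
+	r := bytes.NewReader(b)
+	flag, err := r.ReadByte()
+	if err != nil {
+		return nil, nil, err
+	}
+	var value []byte
+	if flag == 1 {
+		n, err := readLen(r)
+		if err != nil {
+			return nil, nil, err
+		}
+		value = make([]byte, n)
+		if _, err := io.ReadFull(r, value); err != nil {
+			return nil, nil, err
+		}
+	}
+	count, err := readLen(r)
+	if err != nil {
+		return nil, nil, err
+	}
+	var children []decodedChild
+	for i := int64(0); i < count; i++ {
+		var c decodedChild
+		if c.index, err = readLen(r); err != nil {
+			return nil, nil, err
+		}
+		if c.pathLen, err = readLen(r); err != nil {
+			return nil, nil, err
+		}
+		// Assumption: two nibbles are packed per byte, rounding up.
+		c.path = make([]byte, (c.pathLen+1)/2)
+		if _, err = io.ReadFull(r, c.path); err != nil {
+			return nil, nil, err
+		}
+		if _, err = io.ReadFull(r, c.id[:]); err != nil {
+			return nil, nil, err
+		}
+		children = append(children, c)
+	}
+	return value, children, nil
+}
+
+func main() {
+	raw, err := hex.DecodeString(exampleHex)
+	if err != nil {
+		panic(err)
+	}
+	value, children, err := decodeNode(raw)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Printf("value=%X children=%d\n", value, len(children))
+	for _, c := range children {
+		fmt.Printf("index=%d pathLen=%d path=%X id=%X\n", c.index, c.pathLen, c.path, c.id[:])
+	}
+}
+```
+
+Because `binary.ReadVarint` undoes the zigzag encoding, the length byte `0x02` reads as 1 and `0x06` reads as 3. By the same rule, the second child's index byte `0x0E` reads as 7, so it may be worth double-checking that byte against the index `14` given in the prose above.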
+
+### Node Hashing
+
+Each node must have a unique ID that identifies it. This ID is calculated by hashing the following values:
+* The node's children
+* The node's value digest
+* The node's key
+
+Specifically, we encode these values in the following way:
+
+```
++----------------------------------------------------+
+| Number of children (varint)                        |
++----------------------------------------------------+
+| Child index (varint)                               |
++----------------------------------------------------+
+| Child ID (32 bytes)                                |
++----------------------------------------------------+
+| Child index (varint)                               |
++----------------------------------------------------+
+| Child ID (32 bytes)                                |
++----------------------------------------------------+
+|...                                                 |
++----------------------------------------------------+
+| Value existence flag (1 byte)                      |
++----------------------------------------------------+
+| Value length (varint) (optional)                   |
++----------------------------------------------------+
+| Value (variable length bytes) (optional)           |
++----------------------------------------------------+
+| Key length (varint)                                |
++----------------------------------------------------+
+| Key (variable length bytes)                        |
++----------------------------------------------------+
+```
+
+Where:
+* `Number of children` is the number of children this node has.
+* `Child index` is the index of a child node within the list of the node's children.
+* `Child ID` is the child node's ID.
+* `Value existence flag` is `1` if this node has a value, otherwise `0`.
+* `Value length` is the length of the value, if it exists (i.e. if `Value existence flag` is `1`). Otherwise, it is not serialized.
+* `Value` is the value, if it exists (i.e. if `Value existence flag` is `1`). Otherwise, it is not serialized.
+* `Key length` is the number of nibbles in this node's key.
+* `Key` is the node's key.
+
+Note that, as with the node serialization format, the `Child index` values aren't necessarily sequential, but they are unique and strictly increasing.
+Also like the node serialization format, there can be up to 16 blocks of children data.
+However, note that child compressed paths are not included in the node ID calculation.
+
+Once this is encoded, we `sha256` hash the resulting bytes to get the node's ID.
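+
+The ID calculation can be sketched in Go as below. This is an illustration of the layout above rather than the actual implementation: `childRef`, `hashPreimage`, `nodeID` and `putVarint` are hypothetical names, and the caller is assumed to pass the value bytes (or value digest) and the packed key bytes it wants hashed.
+
+```go
+package main
+
+import (
+	"bytes"
+	"crypto/sha256"
+	"encoding/binary"
+	"fmt"
+)
+
+// childRef is a hypothetical type: only the index and ID of a child are hashed.
+type childRef struct {
+	index int
+	id    [32]byte
+}
+
+// putVarint appends v to buf using binary.PutVarint.
+func putVarint(buf *bytes.Buffer, v int) {
+	var tmp [binary.MaxVarintLen64]byte
+	n := binary.PutVarint(tmp[:], int64(v))
+	buf.Write(tmp[:n])
+}
+
+// hashPreimage builds the bytes that are hashed, in the order shown above.
+// children must already be ordered by strictly increasing index.
+func hashPreimage(children []childRef, hasValue bool, value []byte, keyNibbles int, key []byte) []byte {
+	var buf bytes.Buffer
+	putVarint(&buf, len(children))
+	for _, c := range children {
+		putVarint(&buf, c.index)
+		buf.Write(c.id[:]) // compressed paths are deliberately left out
+	}
+	if hasValue {
+		buf.WriteByte(1)
+		putVarint(&buf, len(value))
+		buf.Write(value)
+	} else {
+		buf.WriteByte(0)
+	}
+	putVarint(&buf, keyNibbles) // key length is counted in nibbles
+	buf.Write(key)
+	return buf.Bytes()
+}
+
+// nodeID hashes the preimage with SHA-256.
+func nodeID(children []childRef, hasValue bool, value []byte, keyNibbles int, key []byte) [32]byte {
+	return sha256.Sum256(hashPreimage(children, hasValue, value, keyNibbles, key))
+}
+
+func main() {
+	id := nodeID(nil, true, []byte{0x02}, 0, nil)
+	fmt.Printf("%X\n", id[:])
+}
+```
+
+Keeping the preimage construction separate from the hash call makes it easy to inspect exactly which bytes feed into the ID.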
+
+### Encoding Varints and Bytes
+
+Varints are encoded with `binary.PutVarint` from the standard library's `encoding/binary` package.
+Bytes are encoded by simply copying them onto the buffer.
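+
+One consequence worth illustrating: `binary.PutVarint` writes signed integers using zigzag encoding, so a small non-negative value `n` is stored as the single byte `2n`. The sketch below shows this; `encodeVarint` is a hypothetical helper name, not a function from `codec.go`.
+
+```go
+package main
+
+import (
+	"encoding/binary"
+	"fmt"
+)
+
+// encodeVarint returns the bytes binary.PutVarint produces for v.
+func encodeVarint(v int64) []byte {
+	buf := make([]byte, binary.MaxVarintLen64)
+	n := binary.PutVarint(buf, v)
+	return buf[:n]
+}
+
+func main() {
+	for _, v := range []int64{0, 1, 2, 3} {
+		fmt.Printf("%d -> %X\n", v, encodeVarint(v))
+	}
+	// Prints:
+	// 0 -> 00
+	// 1 -> 02
+	// 2 -> 04
+	// 3 -> 06
+}
+```
+
+This is why, in the serialization example, a value length of 1 appears as `0x02` and a child count of 2 appears as `0x04`.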
+
 ## Design choices
 
 ### []byte copying