-
Notifications
You must be signed in to change notification settings - Fork 0
ExtraData
The "extra data" pattern is used in many stages of the compiler to add additional data to an object (like a node or instruction) without increasing the size of the base object, especially when using large arrays of union types. This was borrowed/inspired by the Zig self hosting compiler.
Note: when I say "array" in this article, I actually mean an ArrayList (dynamically resizing contiguous array) at generation time which is then converted to an immutable static array once the compiler stage is complete.
For example, let us look at the AST. The tree is stored as a flat array of Node structures, where each node looks like the following:
const Node = struct {
main_token: u32,
data: Data,
const Data = union(enum) {
int_lit: void,
bin_expr: struct {
l: u32,
r: u32,
},
};
};In this setup, each Node contains a tagged union of its type (an enum) and data (up to two u32 values). This keeps the structure compact, and doesn't waste much space - we expect each node to have, on average, two children. Some nodes have none (void) if they are leaf nodes and completely described by their main_token. However, some nodes need to store information; for example, function declarations have a lot of signature metadata and any number of parameters, and calls store any number of arguments.
This is where the extra data array comes in. Imagine that we need to store a ranged for loop. This has four elements: 1) the binding (variable declared or initialized), 2) the condition (comparison on each iteration), 3) the afterthought (usually a post increment/decrement on each iteration), and 4) the loop body. While we want to store four child nodes, we only have space for two. So, we split this into two parts: data.signature, and data.body. Body points to the body block, as normal. However, the signature u32 is not a node index.
Instead, we construct the following structure:
const RangeSignature = struct {
binding: u32,
condition: u32,
afterthought: u32,
};Note that each of these three values are u32 values that themselves point to child node indices. Using zig's comptime type mechanics, we are able to pack this struct into an array of three u32 values. We then append this entire array to the running extra array, and store the index of the start of this slice in data.signature. So extra[data.signature] is the binding, extra[data.signature + 1] is the condition, etc. This packing is performed by HirGen.addExtra().
Similarly, we can comptime-generate code to unpack a slice of the extra array into a struct, specifying the struct type and start index. This is performed by Hir.extraData().
This works great if we want to store a fixed number of metadata elements (greater than two). Now, what happens if we want to store a dynamic list of elements, like a parameter or member list? Here comes the second (related) use for extra. We first construct an array where each element is the index of the node element of the parameter/member/etc. We then append this array to extra, keeping note of the start and end index that this subarray takes up in extra. We then append a struct of the form, filling in the indices:
const Params = struct {
params_start: u32,
params_end: u32,
};This can be a bit confusing, so let's go over what we are going to do to unpack the data:
- Read the
data.paramsmember of the node - Unpack the range struct using
const range = tree.extraData(data.params, Params). Remember thatextra[data.params]isparams_startandextra[data.params + 1]isparams_end. - Unpack the param indices array:
const params = extra[range.params_start..range.params_end]. Each of the values in this array correspond to a node. - Iterate through this array, and fetch
tree.nodes[param](for param in params).