  # Codelab

- This codelab will walk you trough all the steps required to build a Tiled tree.
+ In this codelab, you'll build a [Tiled tree](https://research.swtch.com/tlog#tiling_a_log), step by step.

  The Tiled tree will be stored on disk using the layout described in the [layout
  directory](api/layout/README.md). Its checkpoint uses the [checkpoint format](https://github.com/transparency-dev/formats/blob/main/log/README.md#checkpoint-format).

  ## Preliminary setup

- The command-line tools in thi repository can generate tile based logs from leaf
+ The command-line tools we'll use from this repository can generate tile-based logs from leaf
  data stored on your file system. Each file will correspond to a single leaf in
  the tree.

@@ -19,7 +19,7 @@ export LOG_DIR="/tmp/mylog" # where the tree will be stored
  export LOG_ORIGIN="My Log" # the origin of the log used by the Checkpoint format
  ```

- Checkpoints are signed, and we need a public/private key pair for this.
+ Checkpoints of the log will be signed, so we need a public/private key pair.

  Use the `generate_keys` command with `--key_name`, a name
  for the signing entity. You can output the public and private keys to files using
@@ -37,7 +37,7 @@ To create a new log state directory, use the `integrate` command with the `--ini
  flag, passing either the key files or the environment variables:

  ```bash
- go run ./cmd/integrate --initialise --storage_dir="${LOG_DIR}" --logtostderr -- public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
+ go run ./cmd/integrate --initialise --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
  ```

  After running this command, the log state directory looks like this:
@@ -53,12 +53,16 @@ $ tree /tmp/mylog/

  5 directories, 1 file
  ```
+ - `checkpoint` contains the latest log checkpoint, in the [checkpoint format](https://github.com/transparency-dev/formats/tree/main/log).
+ - `seq/` contains a directory hierarchy holding the leaf data of each sequenced entry in the log.
+ - `leaves/` contains files that map each known leaf hash to its position in the log.
+ - `tile/` contains the hashes of the tree's nodes, both leaf hashes and internal node hashes, grouped into tiles.

- See the [layout](api/layout/README.md) documentation for an explanation of what each directory is for.
+ See the [layout](api/layout/README.md) documentation for more details about each directory.

  Let's look at the checkpoint content:

- ```
+ ```bash
  $ cat /tmp/mylog/checkpoint
  My Log
  0
  47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=

  — astra PlUh/n54e2dSIKi6kHjea5emrGnmC7lJVDgnIfWGIJmgFqp22k0UlnUk97L2ViqrFm986NwV+wJYGnrtRPJTBV0GrA0=
  ```

- - `My Log` is the origin from above.
+ - `My Log` is the origin that we defined above.
  - `0` is the number of leaves in the tree, since the log is still empty.
  - `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` is the root hash of the tree. Since the log is empty, it is the base64-encoded [hash of an empty slice of bytes](https://go.dev/play/p/imi_2TM6DyI).
  - The last line is a signature over this data, made with the `astra` private key we generated above.
@@ -76,33 +80,29 @@ My Log
  ### Creating log content
  Now let's add some leaves to the log.

- Firt, we generate the input data with:
+ First, we generate the input data with:
  ```bash
  $ mkdir $DATA_DIR
- $ for i in $(seq 0 3); do x=$(printf "%03d" $i); echo "leaf_data_$x" > /tmp/files/leaf_$x; done;
+ $ for i in $(seq 0 3); do x=$(printf "%03d" $i); echo "leaf_data_$x" > $DATA_DIR/leaf_$x; done;
  ```

- To add the contents of some files to a log, use the `sequence` command with the
- `--entries` flag set to a filename glob of files to add and either passing the public key
- file or with the environment variable set:
+ To add the contents of these files to the log, use the `sequence` command with the
+ `--entries` flag set to a filename glob matching the files to add:

  ```bash
- $ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
+ $ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*" --public_key=key.pub --origin="${LOG_ORIGIN}"
  I1221 13:16:23.940255 923589 main.go:131] 0: /tmp/files/leaf_000
  I1221 13:16:23.940806 923589 main.go:131] 1: /tmp/files/leaf_001
  I1221 13:16:23.941218 923589 main.go:131] 2: /tmp/files/leaf_002
  I1221 13:16:23.941673 923589 main.go:131] 3: /tmp/files/leaf_003
  ```

- The `sequence` commands stores data in the log directory using convenient
- formats. The `leaves` directory contains the leaf index of each leaf hash.
- Let's take the leaf at index `0`, which happens to contain `leaf_data_0`.
- This tree uses RFC6962's default hasher, where `leaf_hash = sha256(0x + leaf_data)`.
- `8592d6f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d`, the path for
- the leaf at index 0 with forward slashes removed is the [hexadecimal representation
- of this hash](https://go.dev/play/p/POnCQ7IXayk).
+ The `sequence` command assigns an index to each leaf, and stores the data in the log directory using convenient
+ formats.

- ```
+ Here is what the log directory looks like now:
+
+ ```bash
  $ grep -RH '^' /tmp/mylog/
  /tmp/mylog/checkpoint:My Log
  /tmp/mylog/checkpoint:0
@@ -119,6 +119,16 @@ $ grep -RH '^' /tmp/mylog/
  /tmp/mylog/seq/00/00/00/00/03:leaf_data_003
  ```

+ The `seq` directory contains the leaf data, in files named after each leaf's index.
+
+ The `leaves` directory stores the index of each leaf, in a file named after the leaf hash.
+ Let's take the leaf at index `0`, which contains `leaf_data_000`.
+ This tree uses [RFC 6962's hashing function](https://www.rfc-editor.org/rfc/rfc6962#page-4), where `leaf_hash = sha256(0x00 + leaf_data)`.
+
+ `8592d6f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d`, the path of this leaf's file
+ under `leaves/` with forward slashes removed, is the [hexadecimal representation
+ of this hash](https://go.dev/play/p/POnCQ7IXayk).
+
  Note that at this point, no internal node of the tree has been computed, and neither
  has the checkpoint been updated. Leaves have only been assigned a position
  in the log.
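The sequencing step can be pictured with an in-memory model. Each new leaf gets the next free index, and a map keyed by leaf hash plays the role of the `leaves/` directory, so resubmitting identical data returns the original index. This is a simplified sketch under that assumption, not the `sequence` tool's implementation. `sequencer` and `Sequence` are hypothetical names, and nothing is written to disk.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// sequencer is a hypothetical in-memory model of the sequencing step.
type sequencer struct {
	leaves  [][]byte                     // plays the role of seq/: leaf data by index
	indices map[[sha256.Size]byte]uint64 // plays the role of leaves/: leaf hash to index
}

func newSequencer() *sequencer {
	return &sequencer{indices: make(map[[sha256.Size]byte]uint64)}
}

// Sequence assigns the next free index to data. If identical data was
// already sequenced, it returns the original index and true (a dupe).
func (s *sequencer) Sequence(data []byte) (uint64, bool) {
	h := sha256.Sum256(append([]byte{0x00}, data...)) // RFC 6962 leaf hash
	if idx, ok := s.indices[h]; ok {
		return idx, true
	}
	idx := uint64(len(s.leaves))
	s.leaves = append(s.leaves, data)
	s.indices[h] = idx
	return idx, false
}

func main() {
	s := newSequencer()
	for _, d := range []string{"leaf_data_000\n", "leaf_data_001\n", "leaf_data_000\n"} {
		idx, dupe := s.Sequence([]byte(d))
		fmt.Println(idx, dupe)
	}
}
```

Running it prints `0 false`, `1 false`, then `0 true`. Internal nodes play no part in this step, which matches the note above.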
@@ -128,7 +138,7 @@ tool telling you that you're trying to add duplicate entries, along with their
  originally assigned sequence numbers:

  ```bash
- $ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
+ $ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*" --public_key=key.pub --origin="${LOG_ORIGIN}"
  I1221 13:18:59.735244 924268 main.go:131] 0: /tmp/files/leaf_000 (dupe)
  I1221 13:18:59.735362 924268 main.go:131] 1: /tmp/files/leaf_001 (dupe)
  I1221 13:18:59.735406 924268 main.go:131] 2: /tmp/files/leaf_002 (dupe)
@@ -137,9 +147,8 @@ I1221 13:18:59.735447 924268 main.go:131] 3: /tmp/files/leaf_003 (dupe)

  ### Integrating sequenced entries

- We still need to update the rest of the tree structure to integrate these new entries.
- We use the `integrate` tool for that, again either passing key files or with the
- environment variables set:
+ We still need to integrate these new entries into the tree: compute the tree's internal nodes, and generate a new checkpoint.
+ We use the `integrate` tool for that:

  ```bash
  $ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
@@ -148,9 +157,9 @@ I1221 13:19:20.190432 924589 integrate.go:132] New log state: size 0x4 hash: 0c
  ```

  This output says that the integration was successful, and we now have a new log
- tree state which contains `0x08` entries, and has the printed log root hash.
+ tree state that contains 4 entries (`size 0x4` in the output), with the printed root hash.

- Let's look at the contents of the tree directory:
+ Let's look at the contents of the tree directory again:

  ```bash
  $ grep -RH '^' /tmp/mylog/
@@ -181,18 +190,18 @@ $ grep -RH '^' /tmp/mylog/
  The tile directory has been populated with a file, and the checkpoint has been updated.
  The `leaves/` and `seq/` directories have not changed.

- Each tile can store a maximum of 256 leaf hashes. Since we only have 4 for now, they
- fit in a single file. Since it's the first tile of the tree, [its path is 00/0000/00/00/00](api/layout#tile)
+ Each tile can store a maximum of 256 leaf hashes. Since we only have 4 leaves for now, their hashes
+ fit in a single file. Given it is the first tile of the tree, [its path is 00/0000/00/00/00](api/layout#tile).
  Until the tile is filled with 256 leaves, it is "partial",
  which is what the `00.04` notation means: tile `00/0000/00/00/00.04` is the partial
  `00/0000/00/00/00` tile with 4 leaf hashes.
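A helper like the one below can produce the tile paths seen in this codelab: `00/0000/00/00/00.04` here, and later `01/0000/00/00/00.01`. This is a sketch under assumptions; the authoritative encoding is in the layout documentation. It assumes the tile level is written as two hex digits and the tile index is split into groups of 4, 2, 2 and 2 hex digits. It also assumes a `.NN` suffix, taken to be hex, holds the number of entries of a partial tile. `tilePath` is a hypothetical name.

```go
package main

import "fmt"

// tileWidth is the number of leaf hashes a full tile holds.
const tileWidth = 256

// tilePath is a hypothetical helper building a tile's relative path.
// Assumed encoding: level as 2 hex digits, the tile index split into
// 4/2/2/2 hex digits, and a ".NN" suffix while the tile is partial.
func tilePath(level, index, size uint64) string {
	p := fmt.Sprintf("%02x/%04x/%02x/%02x/%02x",
		level,
		(index>>24)&0xffff,
		(index>>16)&0xff,
		(index>>8)&0xff,
		index&0xff)
	if size > 0 && size < tileWidth {
		p += fmt.Sprintf(".%02x", size)
	}
	return p
}

func main() {
	fmt.Println(tilePath(0, 0, 4))   // 00/0000/00/00/00.04: first tile, 4 leaves
	fmt.Println(tilePath(0, 0, 256)) // 00/0000/00/00/00: the same tile once full
	fmt.Println(tilePath(1, 0, 1))   // 01/0000/00/00/00.01: one stratum above
}
```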
189198
- Let's look at each line in the files:
+ Let's look at each line of this tile file:
   - `32` is the number of bytes used for each hash
   - `4` is the number of leaf hashes in this tile
- - series of hashes representing the leaf hashes of the tile, and the compact range they
-   cover
+ - the remaining lines are the base64-encoded node hashes of the tile: both the leaf hashes and the internal node hashes
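Given the line-based format described above, a tile file could be read with something like the following sketch. `tile` and `parseTile` are hypothetical names. The sketch assumes exactly the layout listed here: hash size, number of leaf hashes, then one base64 hash per line. It keeps empty lines as `nil` entries, because a stripped node is left as an empty line in the file.

```go
package main

import (
	"bufio"
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// tile is a hypothetical in-memory view of a tile file.
type tile struct {
	HashSize  int      // first line: bytes per hash
	NumLeaves int      // second line: leaf hashes in this tile
	Nodes     [][]byte // remaining lines, in file order; nil for an empty line
}

// parseTile decodes the line-based tile format described above.
func parseTile(content string) (tile, error) {
	var t tile
	sc := bufio.NewScanner(strings.NewReader(content))
	for n := 0; sc.Scan(); n++ {
		line := sc.Text()
		switch n {
		case 0, 1:
			v, err := strconv.Atoi(line)
			if err != nil {
				return t, fmt.Errorf("line %d: %w", n+1, err)
			}
			if n == 0 {
				t.HashSize = v
			} else {
				t.NumLeaves = v
			}
		default:
			if line == "" {
				// A stripped node is kept as an empty line.
				t.Nodes = append(t.Nodes, nil)
				continue
			}
			h, err := base64.StdEncoding.DecodeString(line)
			if err != nil {
				return t, fmt.Errorf("line %d: %w", n+1, err)
			}
			if len(h) != t.HashSize {
				return t, fmt.Errorf("line %d: got %d bytes, want %d", n+1, len(h), t.HashSize)
			}
			t.Nodes = append(t.Nodes, h)
		}
	}
	return t, sc.Err()
}

func main() {
	t, err := parseTile("32\n1\nhZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0=\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.HashSize, t.NumLeaves, len(t.Nodes)) // 32 1 1
}
```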

+ Here is what a Merkle tree with 4 leaves looks like:
  ```
        b
       / \
@@ -203,37 +212,38 @@ Let's look at each line in the files:
  h0  h1  h2  h3
  |   |   |   |
  0   1   2   3
-
  ```

- We can spot the [leaves and internal node hashes](https://go.dev/play/p/6guNHqpr388) in the infix tree-traversal order.
+ In the tile file, leaf and internal node hashes are stored in [infix (in-order) tree-traversal order](https://go.dev/play/p/eZErmZdTwdB).

  ```bash
  $ cat /tmp/mylog/tile/00/0000/00/00/00.04
  32
  4
- hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0= <-- h0 = sha256(0x0 + leaf_data_0)
+ hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0= <-- h0 = sha256(0x0 + leaf_data_000)
  McF1R3nScwEJFHQpESACDl9SOdg9uTRLVZaDHzLckI0= <-- a = sha256(0x1 + h0 + h1)
- uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0= <-- h1 = sha256(0x0 + leaf_data_1)
+ uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0= <-- h1 = sha256(0x0 + leaf_data_001)
  DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8= <-- b = sha256(0x1 + a + c)
- bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo= <-- h2 = sha256(0x0 + leaf_data_2)
+ bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo= <-- h2 = sha256(0x0 + leaf_data_002)
  jNfnGF6uHUDupKFIaPW/QjZnPkINVKkVYc7cBakvPy4= <-- c = sha256(0x1 + h2 + h3)
- 4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU= <-- h3 = sha256(0x0 + leaf_data_3)
+ 4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU= <-- h3 = sha256(0x0 + leaf_data_003)
  ```

  ### Adding one more leaf
+ Let's add one more leaf to our tree.
+
  ```bash
- $ echo "leaf_data_004" > /tmp/files/leaf_004
+ $ echo "leaf_data_004" > $DATA_DIR/leaf_004

- $ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/leaf_004' --public_key=key.pub --origin="${LOG_ORIGIN}"
+ $ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/leaf_004" --public_key=key.pub --origin="${LOG_ORIGIN}"
  I1221 13:23:43.956356 926120 main.go:131] 4: /tmp/files/leaf_004

  $ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
  I1221 13:24:11.168864 926446 integrate.go:94] Loaded state with roothash 0c2e71ac054d92d58b0efd3013d0df235245331f0c0e828bab62a8fe62460c7f
  I1221 13:24:11.169036 926446 integrate.go:132] New log state: size 0x5 hash: 1b26238e581181883c3f51827c58fe9c9e8a4d39383cbbabaabe0662b3c11496
  ```

- This adds matchin files in `seq`, `leaves`, and updates the checkcpoint, as expected.
+ This adds matching files in `seq` and `leaves`, and updates the checkpoint, as expected.
  A new tile is available under `00/0000/00/00/00.05`:

  ```bash
@@ -248,9 +258,9 @@ $ tree /tmp/mylog/tile
  5 directories, 2 files
  ```

- Notice that the old tile, `00.04` has not been deleted.
+ Notice that the old tile file, `00.04`, has not been deleted.

- Here's the diff between the two leaves:
+ Here's the diff between the two tiles:

  ```bash
  $ diff /tmp/mylog/tile/00/0000/00/00/00.04 /tmp/mylog/tile/00/0000/00/00/00.05
@@ -263,19 +273,40 @@ $ diff /tmp/mylog/tile/00/0000/00/00/00.04 /tmp/mylog/tile/00/0000/00/00/00.05
  > 6KUzDe4gX/0rZTZCgfgBtaIGOBkOQz4duxjTT+NeM5w=
  ```

- The number of leaves `4` has been updated to `5`, and a new leaf node hash has appeared.
- Note that even though the tree has changed shape to include this new leaf, no internal
- node was added to the tile. That's because tiles only store non-emphemeral node, and in this
- case, all the new interanl nodes are ephemeral: they will change when new leaves are added to
- the tree.
+ The number of leaves `4` has been updated to `5`, and a new leaf node hash has
+ appeared. Note that even though the tree has changed shape to include this new
+ leaf, no internal node was added to the tile. That's because tiles only store
+ non-ephemeral nodes, and in this case, all the new internal nodes are ephemeral
+ (marked with a prime symbol): they will change when new leaves are added to the
+ tree.
+
+ ```
+               f'
+              / \
+             /   \
+            /     \
+           /       \
+          /         \
+         /           \
+        /             \
+       b               e'
+      / \             / \
+     /   \           /   \
+    /     \         /     \
+   a       c       d'      X
+  / \     / \     / \
+ h0  h1  h2  h3  h4  X
+ |   |   |   |   |
+ 0   1   2   3   4
+ ```

  ### Filling up the tile
- Let's fill up the tile, with 256 entries:
+ Now, let's fill up the tile with the maximum number of leaves it can hold: 256.

  ```bash
- $ for i in $(seq 5 255); do x=$(printf "%03d" $i); echo "leaf_data_$x" > /tmp/files/leaf_$x; done;
+ $ for i in $(seq 5 255); do x=$(printf "%03d" $i); echo "leaf_data_$x" > $DATA_DIR/leaf_$x; done;

- $ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
+ $ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*" --public_key=key.pub --origin="${LOG_ORIGIN}"
  I1221 13:26:19.752225 927458 main.go:131] 0: /tmp/files/leaf_000 (dupe)
  I1221 13:26:19.752350 927458 main.go:131] 1: /tmp/files/leaf_001 (dupe)
  I1221 13:26:19.752398 927458 main.go:131] 2: /tmp/files/leaf_002 (dupe)
@@ -313,10 +344,10 @@ $ tree /tmp/mylog/tile
  9 directories, 4 files
  ```

- Since the `00/0000/00/00/00` tile is now full, its partial version have been deleted, and now
+ Since the `00/0000/00/00/00` tile is now full, its partial versions have been deleted, and now
  point to the full tile.

- A new tile has also appeared, one stratum above. `01/0000/00/00/00.01`. It contains a single
+ A new tile has also appeared, one stratum above: `01/0000/00/00/00.01`. It contains a single
  node, which is the current root node of the tree. To avoid storing duplicate hashes, this
  top-level node of the `00/0000/00/00/00` tile has been stripped, and you'll find an
  empty line in this file: