Commit b33ea54

Fix typo and clean other things
1 parent d5ca4d8 commit b33ea54

1 file changed: +84 -53 lines changed


codelab.md

Lines changed: 84 additions & 53 deletions

````diff
@@ -1,13 +1,13 @@
 # Codelab
 
-This codelab will walk you trough all the steps required to build a Tiled tree.
+Throughout this codelab, you'll create a [Tiled tree](https://research.swtch.com/tlog#tiling_a_log).
 
 The Tiled tree will be stored on disk using the layout described in the [layout
 directory](api/layout/README.md). Its checkpoint uses the [checkpoint format](https://github.com/transparency-dev/formats/blob/main/log/README.md#checkpoint-format).
 
 ## Prelimiary setup
 
-The command-line tools in thi repository can generate tile based logs from leaf
+The command-line tools we'll use from this repository can generate tile based logs from leaf
 data stored on your file system. Each file will correspond to a single leaf in
 the tree.
 
````

````diff
@@ -19,7 +19,7 @@ export LOG_DIR="/tmp/mylog" # where the tree will be stored
 export LOG_ORIGIN="My Log" # the origin of the log used by the Checkpoint format
 ```
 
-Checkpoints are signed, and we need a public/private key pair for this.
+Checkpoints of the log will be signed, and we need a public/private key pair for this.
 
 Use the `generate_keys` command with `--key_name`, a name
 for the signing entity. You can output the public and private keys to files using
````

````diff
@@ -37,7 +37,7 @@ To create a new log state directory, use the `integrate` command with the `--ini
 flag, and either passing key files or with environment variables set:
 
 ```bash
-go run ./cmd/integrate --initialise --storage_dir="${LOG_DIR}" --logtostderr --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
+go run ./cmd/integrate --initialise --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
 ```
 
 After running this command, the log state directory looks like this:
````

````diff
@@ -53,12 +53,16 @@ $ tree /tmp/mylog/
 
 5 directories, 1 file
 ```
+- `checkpoint` contains the latest log checkpoint in the format described [here](https://github.com/transparency-dev/formats/tree/main/log).
+- `seq/` contains a directory hierarchy containing leaf data for each sequenced entry in the log.
+- `leaves/` contains files which map all known leaf hashes to their position in the log.
+- `tile/` contains the internal nodes of the log tree.
 
-See the [layout](api/layout/README.md) documentation for an explanation of what each directory is for.
+See the [layout](api/layout/README.md) documentation for more details about each directory.
 
 Let's look at the checkpoint content:
 
-```
+```bash
 $ cat /tmp/mylog/checkpoint
 My Log
 0
````

````diff
@@ -67,7 +71,7 @@ My Log
 — astra PlUh/n54e2dSIKi6kHjea5emrGnmC7lJVDgnIfWGIJmgFqp22k0UlnUk97L2ViqrFm986NwV+wJYGnrtRPJTBV0GrA0=
 ```
 
-- `My Log` is the origin from above.
+- `My Log` is the origin that we defined above
 - `0` is the number of leaves in the tree, which currently is 0
 - `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` is the [hash of an empty slice of bytes](https://go.dev/play/p/imi_2TM6DyI), since the log is empty.
 - The last line is a signature over this data, using the astra private key we've generated above
````

````diff
@@ -76,33 +80,29 @@ My Log
 ### Creating log content
 Now let's add some leaves to the log.
 
-Firt, we generate the input data with:
+First, we generate the input data with:
 ```bash
 $ mkdir $DATA_DIR
-$ for i in $(seq 0 3); do x=$(printf "%03d" $i); echo "leaf_data_$x" > /tmp/files/leaf_$x; done;
+$ for i in $(seq 0 3); do x=$(printf "%03d" $i); echo "leaf_data_$x" > $DATA_DIR/leaf_$x; done;
 ```
 
-To add the contents of some files to a log, use the `sequence` command with the
-`--entries` flag set to a filename glob of files to add and either passing the public key
-file or with the environment variable set:
+To add the contents of these files to the log, use the `sequence` command with the
+`--entries` flag set to a filename glob of files to add:
 
 ```bash
-$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
+$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*" --public_key=key.pub --origin="${LOG_ORIGIN}"
 I1221 13:16:23.940255 923589 main.go:131] 0: /tmp/files/leaf_000
 I1221 13:16:23.940806 923589 main.go:131] 1: /tmp/files/leaf_001
 I1221 13:16:23.941218 923589 main.go:131] 2: /tmp/files/leaf_002
 I1221 13:16:23.941673 923589 main.go:131] 3: /tmp/files/leaf_003
 ```
 
-The `sequence` commands stores data in the log directory using convenient
-formats. The `leaves` directory contains the leaf index of each leaf hash.
-Let's take the leaf at index `0`, which happens to contain `leaf_data_0`.
-This tree uses RFC6962's default hasher, where `leaf_hash = sha256(0x + leaf_data)`.
-`8592d6f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d`, the path for
-the leaf at index 0 with forward slashes removed is the [hexadecimal representation
-of this hash](https://go.dev/play/p/POnCQ7IXayk).
+The `sequence` commands assigns an index to each leaf, and stores data in the log directory using convenient
+formats.
 
-```
+Here is what the directory looks like:
+
+```bash
 $ grep -RH '^' /tmp/mylog/
 /tmp/mylog/checkpoint:My Log
 /tmp/mylog/checkpoint:0
````

````diff
@@ -119,6 +119,16 @@ $ grep -RH '^' /tmp/mylog/
 /tmp/mylog/seq/00/00/00/00/03:leaf_data_003
 ```
 
+The `seq` directory contains the leaves data, in files named after each leaf's index.
+
+The `leaves` stores the leaf index of each leaf, in a file named after the leaf hash.
+Let's take the leaf at index `0`, which conveniently happens to contain `leaf_data_000`.
+This tree uses [RFC6962's hashing function](https://www.rfc-editor.org/rfc/rfc6962#page-4), where `leaf_hash = sha256(0x + leaf_data)`.
+
+`8592d6f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d`, the path for
+the leaf at index 0 with forward slashes removed, is the [hexadecimal representation
+of this hash](https://go.dev/play/p/POnCQ7IXayk).
+
 Note that at this point, no internal node of the tree has been computed, and neither
 has the checkpoint been updated. Leaves have only been assigned with a position
 in the log.
````

````diff
@@ -128,7 +138,7 @@ tool telling you that you're trying to add duplicate entries, along with their
 originally assigned sequence numbers:
 
 ```bash
-$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
+$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*" --public_key=key.pub --origin="${LOG_ORIGIN}"
 I1221 13:18:59.735244 924268 main.go:131] 0: /tmp/files/leaf_000 (dupe)
 I1221 13:18:59.735362 924268 main.go:131] 1: /tmp/files/leaf_001 (dupe)
 I1221 13:18:59.735406 924268 main.go:131] 2: /tmp/files/leaf_002 (dupe)
````
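
The `(dupe)` output above follows from the `leaves/` mapping from leaf hash to index. The Go sketch below models that idea in memory. A new leaf gets the next free index. A leaf whose hash is already known returns its original index and is flagged as a duplicate. This is not the `sequence` tool's code, and `sequencer` and `add` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sequencer is a hypothetical in-memory stand-in for the leaves/ directory:
// it maps each known leaf hash (hex) to the index it was assigned.
type sequencer struct {
	index map[string]uint64
	next  uint64
}

func newSequencer() *sequencer {
	return &sequencer{index: make(map[string]uint64)}
}

// add returns the index assigned to data and whether it was already present.
func (s *sequencer) add(data []byte) (uint64, bool) {
	h := sha256.Sum256(append([]byte{0x00}, data...)) // RFC 6962 leaf hash
	key := hex.EncodeToString(h[:])
	if idx, ok := s.index[key]; ok {
		return idx, true // duplicate: report the original index
	}
	idx := s.next
	s.index[key] = idx
	s.next++
	return idx, false
}

func main() {
	s := newSequencer()
	// Sequence the same four leaves twice; the second round reports dupes.
	for round := 0; round < 2; round++ {
		for i := 0; i < 4; i++ {
			idx, dupe := s.add([]byte(fmt.Sprintf("leaf_data_%03d\n", i)))
			fmt.Println(idx, dupe)
		}
	}
}
```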

````diff
@@ -137,9 +147,8 @@ I1221 13:18:59.735447 924268 main.go:131] 3: /tmp/files/leaf_003 (dupe)
 
 ### Integrating sequenced entries
 
-We still need to update the rest of the tree structure to integrate these new entries.
-We use the `integrate` tool for that, again either passing key files or with the
-environment variables set:
+We still need to update the rest of the tree structure to integrate these new entries, generate the other nodes of the tree, and compute its new checkpoint.
+We use the `integrate` tool for that:
 
 ```bash
 $ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
````

````diff
@@ -148,9 +157,9 @@ I1221 13:19:20.190432 924589 integrate.go:132] New log state: size 0x4 hash: 0c
 ```
 
 This output says that the integration was successful, and we now have a new log
-tree state which contains `0x08` entries, and has the printed log root hash.
+tree state which contains 4 entries, and has the printed log root hash.
 
-Let's look at the contents of the tree directory:
+Let's look at the contents of the tree directory again:
 
 ```bash
 $ grep -RH '^' /tmp/mylog/
````

````diff
@@ -181,18 +190,18 @@ $ grep -RH '^' /tmp/mylog/
 The tile directory has been populated with a file, and the checkpoint has been updated.
 The `leaves/` and `seq/` directories have not changed.
 
-Each tile can store a maximum of 256 leaf hashes. Since we only have 4 for now, they
-fit in a single file. Since it's the first tile of the tree, [its path is 00/0000/00/00/00](api/layout#tile)
+Each tile can store a maximum of 256 leaf hashes. Since we only have 4 leaves for now, hashes
+fit in a single file. Given it is the first tile of the tree, [its path is 00/0000/00/00/00](api/layout#tile)
 Until the tile is filed with 256 leaves, the tile is "partial",
 that's what the `00.04` notation means: tile `00/0000/00/00/00.04` is the partial
 `00/0000/00/00/00` tile with 4 leaf hashes.
 
-Let's look at each line in the files:
+Let's look at each line of this tile file:
 - `32` that's the number of bytes used for hashes
 - `4` the number of leaf hashes in this tile
-- series of hashes representing the leaf hashes of the tile, and the compact range they
-cover
+- the remaining lines are a series of hashes representing the node hashes of the tile: both the leaf hashes, and internal node hashes
 
+Here is what a merkle tree with 4 leaves looks like:
 ```
          b
         / \
````

````diff
@@ -203,37 +212,38 @@ Let's look at each line in the files:
    h0  h1  h2  h3
    |   |   |   |
    0   1   2   3
-
 ```
 
-We can spot the [leaves and internal node hashes](https://go.dev/play/p/6guNHqpr388) in the infix tree-traversal order.
+In the tile file, leaves and internal node hashes are stored in the [infix tree-traversal order](https://go.dev/play/p/eZErmZdTwdB).
 
 ```bash
 $ cat /tmp/mylog/tile/00/0000/00/00/00.04
 32
 4
-hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0= <-- h0 = sha256(0x0 + leaf_data_0)
+hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0= <-- h0 = sha256(0x0 + leaf_data_000)
 McF1R3nScwEJFHQpESACDl9SOdg9uTRLVZaDHzLckI0= <-- a = sha256(0x1 + h0 + h1)
-uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0= <-- h1 = sha256(0x0 + leaf_data_1)
+uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0= <-- h1 = sha256(0x0 + leaf_data_001)
 DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8= <-- b = sha256(0x1 + a + c)
-bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo= <-- h2 = sha256(0x0 + leaf_data_2)
+bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo= <-- h2 = sha256(0x0 + leaf_data_002)
 jNfnGF6uHUDupKFIaPW/QjZnPkINVKkVYc7cBakvPy4= <-- c = sha(0x1 + h2 + h3)
-4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU= <-- h3 = sha256(0x0 + leaf_data_3)
+4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU= <-- h3 = sha256(0x0 + leaf_data_003)
 ```
 
 ### Adding one more leaf
+Let's add one more leaf to our tree.
+
 ```bash
-$ echo "leaf_data_004" > /tmp/files/leaf_004
+$ echo "leaf_data_004" > $DATA_DIR/leaf_004
 
-$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/leaf_004' --public_key=key.pub --origin="${LOG_ORIGIN}"
+$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/leaf_004" --public_key=key.pub --origin="${LOG_ORIGIN}"
 I1221 13:23:43.956356 926120 main.go:131] 4: /tmp/files/leaf_004
 
 $ go run ./cmd/integrate --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
 I1221 13:24:11.168864 926446 integrate.go:94] Loaded state with roothash 0c2e71ac054d92d58b0efd3013d0df235245331f0c0e828bab62a8fe62460c7f
 I1221 13:24:11.169036 926446 integrate.go:132] New log state: size 0x5 hash: 1b26238e581181883c3f51827c58fe9c9e8a4d39383cbbabaabe0662b3c11496
 ```
 
-This adds matchin files in `seq`, `leaves`, and updates the checkcpoint, as expected.
+This adds matching files in `seq`, `leaves`, and updates the checkpoint, as expected.
 A new tile is availble under `00/0000/00/00/00/00.05`:
 
 ```bash
````

````diff
@@ -248,9 +258,9 @@ $ tree /tmp/mylog/tile
 5 directories, 2 files
 ```
 
-Notice that the old tile, `00.04` has not been deleted.
+Notice that the old tile file, `00.04` has not been deleted.
 
-Here's the diff between the two leaves:
+Here's the diff between the two tiles:
 
 ```bash
 $ diff /tmp/mylog/tile/00/0000/00/00/00.04 /tmp/mylog/tile/00/0000/00/00/00.05
````

````diff
@@ -263,19 +273,40 @@ $ diff /tmp/mylog/tile/00/0000/00/00/00.04 /tmp/mylog/tile/00/0000/00/00/00.05
 > 6KUzDe4gX/0rZTZCgfgBtaIGOBkOQz4duxjTT+NeM5w=
 ```
 
-The number of leaves `4` has been updated to `5`, and a new leaf node hash has appeared.
-Note that even though the tree has changed shape to include this new leaf, no internal
-node was added to the tile. That's because tiles only store non-emphemeral node, and in this
-case, all the new interanl nodes are ephemeral: they will change when new leaves are added to
-the tree.
+The number of leaves `4` has been updated to `5`, and a new leaf node hash has
+appeared. Note that even though the tree has changed shape to include this new
+leaf, no internal node was added to the tile. That's because tiles only store
+non-emphemeral node, and in this case, all the new internal nodes are ephemeral
+(marked with a prime symbol): they will change when new leaves are added to the
+tree.
+
+```
+                 f'
+                / \
+               /   \
+              /     \
+             /       \
+            /         \
+           /           \
+          /             \
+         b               e'
+        / \             / \
+       /   \           /   \
+      /     \         /     \
+     a       c       d'      X
+    / \     / \     / \
+   h0  h1  h2  h3  h4  X
+   |   |   |   |   |
+   0   1   2   3   4
+```
 
 ### Filling up the tile
-Let's fill up the tile, with 256 entries:
+Now, let's fill up the tile, with the maximum number of leaves it can hold: 256.
 
 ```bash
-$ for i in $(seq 5 255); do x=$(printf "%03d" $i); echo "leaf_data_$x" > /tmp/files/leaf_$x; done;
+$ for i in $(seq 5 255); do x=$(printf "%03d" $i); echo "leaf_data_$x" > $DATA_DIR/leaf_$x; done;
 
-$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries '/tmp/files/*' --public_key=key.pub --origin="${LOG_ORIGIN}"
+$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*" --public_key=key.pub --origin="${LOG_ORIGIN}"
 I1221 13:26:19.752225 927458 main.go:131] 0: /tmp/files/leaf_000 (dupe)
 I1221 13:26:19.752350 927458 main.go:131] 1: /tmp/files/leaf_001 (dupe)
 I1221 13:26:19.752398 927458 main.go:131] 2: /tmp/files/leaf_002 (dupe)
````
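
The primed nodes in the 5-leaf diagram above follow a simple rule. A node at level `l` and index `k` covers leaves `k*2^l` through `(k+1)*2^l - 1`, so its hash is final only once the tree holds at least `(k+1)*2^l` leaves. The Go sketch below applies that rule to a tree of size 5. It is a reading aid for the diagram, and `isComplete` is a hypothetical name, not code from this repository.

```go
package main

import "fmt"

// isComplete reports whether the node at (level, index) covers only leaves
// that already exist in a tree of the given size, i.e. its hash is final.
func isComplete(level, index, size uint64) bool {
	return (index+1)<<level <= size
}

func main() {
	const size = 5
	// The nodes above the leaves in the 5-leaf diagram.
	nodes := []struct {
		name         string
		level, index uint64
	}{
		{"a", 1, 0}, {"c", 1, 1}, {"d'", 1, 2},
		{"b", 2, 0}, {"e'", 2, 1},
		{"f'", 3, 0},
	}
	for _, n := range nodes {
		state := "ephemeral"
		if isComplete(n.level, n.index, size) {
			state = "final"
		}
		fmt.Printf("%-2s level %d index %d: %s\n", n.name, n.level, n.index, state)
	}
}
```

Only `a`, `c` and `b` come out as final, which matches the unprimed internal nodes of the diagram.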

````diff
@@ -313,10 +344,10 @@ $ tree /tmp/mylog/tile
 9 directories, 4 files
 ```
 
-Since the `00/0000/00/00/00` tile is now full, its partial version have been deleted, and now
+Since the `00/0000/00/00/00` tile is now full, its partial versions have been deleted, and now
 point to the full tile.
 
-A new tile has also appeared, one stratum above. `01/0000/00/00/00.01`. It contains a single
+A new tile has also appeared, one stratum above: `01/0000/00/00/00.01`. It contains a single
 node, which is the current root node of the tree. To avoid storing duplicate hashes, this
 top level node of the `00/0000/00/00/00` tile has been stripped, and you'll find an
 empty line in this file:
````
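
With 256 leaf hashes per tile, each stratum of tiles spans 8 levels of the tree, and the bottom row of stratum `s` holds the complete nodes at level `8s`. The Go sketch below counts, for a given tree size, how many complete nodes each stratum's bottom row has. It then splits them into full tiles plus an optional partial one. It assumes exactly this 256-wide, 8-level tiling and uses hypothetical names; the actual path encoding is defined in the layout documentation. For size 256 it reports one full tile in stratum 0 and a one-entry partial tile in stratum 1, consistent with `01/0000/00/00/00.01`. For size 5 it reports a single partial tile of width 5, consistent with `00.05`.

```go
package main

import "fmt"

const tileHeight = 8 // assumption: 256 = 2^8 bottom-row entries per tile

// stratumSummary describes the bottom row of one stratum of tiles.
type stratumSummary struct {
	Stratum      uint64
	Nodes        uint64 // complete nodes at level Stratum*tileHeight
	FullTiles    uint64
	PartialWidth uint64 // entries in the trailing partial tile, 0 if none
}

// strata lists, for a tree of the given size, every stratum whose bottom row
// has at least one complete node.
func strata(size uint64) []stratumSummary {
	var out []stratumSummary
	for s := uint64(0); ; s++ {
		nodes := size >> (s * tileHeight)
		if nodes == 0 {
			return out
		}
		out = append(out, stratumSummary{
			Stratum:      s,
			Nodes:        nodes,
			FullTiles:    nodes >> tileHeight,
			PartialWidth: nodes & (1<<tileHeight - 1),
		})
	}
}

func main() {
	for _, size := range []uint64{5, 256} {
		fmt.Printf("size %d:\n", size)
		for _, st := range strata(size) {
			fmt.Printf("  stratum %d: %d full tile(s), partial width %d\n",
				st.Stratum, st.FullTiles, st.PartialWidth)
		}
	}
}
```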
