10_compact_size_unsigned_integers.md
+46 -15 (46 additions & 15 deletions)
Original file line number
Diff line number
Diff line change
@@ -1,20 +1,35 @@
1
1
# Compact Size Unsigned Integers
2
2
3
-
We'll talk more about the Segwit soft fork and how the transaction format changed later on in this course. For now, we're going to assume the transactions we're decoding are serialized according to the legacy, pre-segwit format. This means the next field after the version will be the number of inputs.
3
+
We'll talk more about the Segwit soft fork and how the transaction format changed later on in this course.
4
+
For now, we're going to assume the transactions we're decoding are serialized according to the legacy, pre-segwit format.
5
+
This means the next field after the version will be the number of inputs.
4
6
5
-
If you have read [Mastering Bitcoin 3rd Edition, Chapter 6](https://github.com/bitcoinbook/bitcoinbook/blob/develop/ch06_transactions.adoc#length-of-transaction-input-list), you'll remember that the next byte represents the length of the transaction input list encoded as a compactSize unsigned integer. The compactSize integer indicates how many bytes to read to determine the number of inputs. For example, if the length is less than 253, then the next byte is simply interpreted as an unsigned 8-bit integer (the `u8` data type in Rust). If the length is greater than 252 and less than 2^16, then we would expect to see the byte `fd` (or the integer 253) followed by two additional bytes interpreted as a `u16` integer, etc. This is the table we can use as reference:
7
+
If you have read [Mastering Bitcoin 3rd Edition, Chapter 6](https://github.com/bitcoinbook/bitcoinbook/blob/develop/ch06_transactions.adoc#length-of-transaction-input-list), you'll remember that the next byte represents the length of the transaction input list encoded as a compactSize unsigned integer.
8
+
The compactSize integer indicates how many bytes to read to determine the number of inputs.
9
+
For example, if the length is less than 253, then the next byte is simply interpreted as an unsigned 8-bit integer (the `u8` data type in Rust).
10
+
If the length is greater than 252 and less than 2^16, then we would expect to see the byte `fd` (or the integer 253) followed by two additional bytes interpreted as a `u16` integer, etc.
So let's write a function to read a compactSize unsigned integer. Let's think about this a bit. What kind of argument do we want to accept? And what should the return type be? Take a moment to fill out the function signature and come back.
15
+
So let's write a function to read a compactSize unsigned integer.
16
+
Let's think about this a bit.
17
+
What kind of argument do we want to accept? And what should the return type be? Take a moment to fill out the function signature and come back.
10
18
11
19
<hr/>
12
20
13
-
For the argument type, we have to remember that we're still passing around the same mutable reference to the slice so that we can keep reading it and moving the pointer. So we'll keep the same argument type as in the `read_version` function.
21
+
For the argument type, we have to remember that we're still passing around the same mutable reference to the slice so that we can keep reading it and moving the pointer.
22
+
So we'll keep the same argument type as in the `read_version` function.
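
As a minimal sketch of why a `&mut &[u8]` argument lets us "move the pointer": the standard library implements `std::io::Read` for `&[u8]`, and each read re-points the slice at the bytes that have not been consumed yet. The helper name `read_u8` below is hypothetical and exists only for this illustration.

```rust
use std::io::Read;

// Hypothetical helper (not part of the course code): reads a single byte
// and leaves the caller's slice pointing just past it.
fn read_u8(transaction_bytes: &mut &[u8]) -> u8 {
    let mut buffer = [0u8; 1];
    // `Read` is implemented for `&[u8]`. Reading copies bytes into `buffer`
    // and shrinks the slice from the front, so the next read continues
    // where this one stopped.
    transaction_bytes.read_exact(&mut buffer).unwrap();
    buffer[0]
}
```

Calling `read_u8(&mut slice)` on a slice holding `[1, 2, 3]` returns `1`, and `slice` afterwards holds only `[2, 3]`.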
14
23
15
-
Now, what should the return type be? Well, the input length can be an 8-bit, 16-bit, 32-bit or 64-bit unsigned integer. So if we need to specify just one type for the length, let's choose the largest one, as it can hold any of the other possibilities. `fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64`
24
+
Now, what should the return type be? Well, the input length can be an 8-bit, 16-bit, 32-bit or 64-bit unsigned integer. So if we need to specify just one type for the length, let's choose the largest one, as it can hold any of the other possibilities.
From here, it is fairly straightforward if/else logic. As the chart above shows in the Format column, we can tell how many bytes to read based on the value of the first byte. If it is less than 253, then the byte itself is the length. If it is equal to 253, then we need to read the next two bytes. If it is equal to 254, then we need to read the next four bytes, and if it is equal to 255, the next eight. So let's implement this using a standard if/else statement block, which you're probably familiar with.
27
+
From here, it is fairly straightforward if/else logic.
28
+
As the chart above shows in the Format column, we can tell how many bytes to read based on the byte value.
29
+
If it is less than 253, then the byte is the length.
30
+
If it is equal to 253, then we need to read the next two bytes.
31
+
If it is equal to 254, then we need to read the next four bytes, and if it is equal to 255, the next eight.
32
+
So let's implement this using a standard if/else statement block which you're probably familiar with.
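
The if/else logic described here can be sketched as follows. This is an illustrative version rather than necessarily the exact code from the lesson: the variable names `compact_size` and `buffer` are assumptions, and errors are simply unwrapped. It relies on the compactSize rule that a first byte of 253, 254 or 255 is followed by a little-endian `u16`, `u32` or `u64` respectively.

```rust
use std::io::Read;

fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64 {
    // Read the first byte, which tells us how many more bytes to read.
    let mut compact_size = [0u8; 1];
    transaction_bytes.read_exact(&mut compact_size).unwrap();

    if (0..253).contains(&compact_size[0]) {
        // Values below 253 are the length itself.
        compact_size[0] as u64
    } else if compact_size[0] == 253 {
        // 0xfd: the next 2 bytes are a little-endian u16.
        let mut buffer = [0u8; 2];
        transaction_bytes.read_exact(&mut buffer).unwrap();
        u16::from_le_bytes(buffer) as u64
    } else if compact_size[0] == 254 {
        // 0xfe: the next 4 bytes are a little-endian u32.
        let mut buffer = [0u8; 4];
        transaction_bytes.read_exact(&mut buffer).unwrap();
        u32::from_le_bytes(buffer) as u64
    } else {
        // 0xff: the next 8 bytes are a little-endian u64.
        let mut buffer = [0u8; 8];
        transaction_bytes.read_exact(&mut buffer).unwrap();
        u64::from_le_bytes(buffer)
    }
}
```

For example, the bytes `fd 00 01` decode to 256, because `00 01` read as a little-endian `u16` is `0x0100`.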
1. The `0..253` syntax is a [range type](https://doc.rust-lang.org/std/ops/struct.Range.html#), which has a method called `contains` to check whether a value is in the given range.
44
-
2. The number of bytes read matches the integer type. For example, 2 bytes give us a `u16` type. 4 bytes give us a `u32` type.
45
-
3. We **cast** each type into a `u64`. We can convert between primitive types in Rust using the [`as` keyword](https://doc.rust-lang.org/std/keyword.as.html).
46
-
4. Notice how there are no semicolons on the final line of each branch, such as `u32::from_le_bytes(buffer) as u64`. This is the equivalent of returning that value from the function. We could also write it as `return u32::from_le_bytes(buffer) as u64;` but an implicit return without a semicolon is more idiomatic.
59
+
2. The number of bytes read matches the integer type.
60
+
For example, 2 bytes give us a `u16` type, and 4 bytes give us a `u32` type.
61
+
3. We **cast** each type into a `u64`.
62
+
We can convert between primitive types in Rust using the [`as` keyword](https://doc.rust-lang.org/std/keyword.as.html).
63
+
4. Notice how there are no semicolons on the final line of each branch, such as `u32::from_le_bytes(buffer) as u64`.
64
+
This is the equivalent of returning that value from the function. We could also write it as `return u32::from_le_bytes(buffer) as u64;` but an implicit return without a semicolon is more idiomatic.
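
To make notes 2 through 4 concrete, here is a small sketch with two hypothetical helpers (the names `to_u64_implicit` and `to_u64_explicit` are made up for this illustration). Both take a 4-byte buffer, interpret it as a little-endian `u32`, cast it to `u64`, and return it; they differ only in how the value is returned.

```rust
// Hypothetical helper: the final expression has no semicolon,
// so its value is what the function returns.
fn to_u64_implicit(buffer: [u8; 4]) -> u64 {
    // A 4-byte array matches `u32::from_le_bytes`; `as u64` widens the result.
    u32::from_le_bytes(buffer) as u64
}

// Hypothetical helper: the same logic written with an explicit `return`.
fn to_u64_explicit(buffer: [u8; 4]) -> u64 {
    return u32::from_le_bytes(buffer) as u64;
}
```

Both functions return the same value for any input; passing a buffer of a different size, such as `[u8; 2]`, would be a compile-time error because `u32::from_le_bytes` expects exactly four bytes.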
47
65
48
-
We're going to make one more change. While standard if/else statements work fine, Rust provides pattern matching via the `match` keyword and this is a good opportunity to use it as it is commonly used in Rust codebases. https://doc.rust-lang.org/book/ch06-02-match.html
49
66
67
+
We're going to make one more change.
68
+
While standard if/else statements work fine, Rust provides pattern matching via the `match` keyword and this is a good opportunity to use it as it is commonly used in Rust codebases.
What do you think? The `match` looks nicer, doesn't it? Take a moment to get familiar with the syntax. Each arm has a pattern to match, followed by `=>` and then some code to return for that given pattern.
96
+
What do you think?
97
+
The `match` looks nicer, doesn't it?
98
+
Take a moment to get familiar with the syntax.
99
+
Each arm has a pattern to match, followed by `=>` and then some code to return for that given pattern.
77
100
78
-
We sometimes see an arm with the underscore symbol (`_`) as the pattern to match. This represents a catchall pattern that will capture any value not already covered by the previous arms. However, in our case, this is not needed since the previous arms are exhaustive and capture all the possible scenarios. Remember a `u8` can only have a value between `0` and `255`.
101
+
We sometimes see an arm with the underscore symbol (`_`) as the pattern to match.
102
+
This represents a catchall pattern that will capture any value not already covered by the previous arms.
103
+
However, in our case, this is not needed since the previous arms are exhaustive and capture all the possible scenarios.
104
+
Remember a `u8` can only have a value between `0` and `255`.
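
As a sketch of what an exhaustive `match` over a `u8` can look like (an illustration under the same assumptions as before, not necessarily the lesson's exact code; `compact_size` and `buffer` are assumed names): the four arms below cover `0..=252`, `253`, `254` and `255`, which together account for every possible `u8` value, so the compiler accepts the `match` without a `_` arm.

```rust
use std::io::Read;

fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64 {
    let mut compact_size = [0u8; 1];
    transaction_bytes.read_exact(&mut compact_size).unwrap();

    match compact_size[0] {
        // The byte itself is the length.
        0..=252 => compact_size[0] as u64,
        // 0xfd: little-endian u16 follows.
        253 => {
            let mut buffer = [0u8; 2];
            transaction_bytes.read_exact(&mut buffer).unwrap();
            u16::from_le_bytes(buffer) as u64
        }
        // 0xfe: little-endian u32 follows.
        254 => {
            let mut buffer = [0u8; 4];
            transaction_bytes.read_exact(&mut buffer).unwrap();
            u32::from_le_bytes(buffer) as u64
        }
        // 0xff: little-endian u64 follows.
        255 => {
            let mut buffer = [0u8; 8];
            transaction_bytes.read_exact(&mut buffer).unwrap();
            u64::from_le_bytes(buffer)
        }
    }
}
```

Removing any one of these arms would make the `match` non-exhaustive, and the compiler would then ask for the missing values or a catchall.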
79
105
80
-
Now all we have to do is update our `main` function to call this and return the number of inputs.
106
+
Now all we have to do is update our `main` function to call this and return the number of inputs.
81
107
82
108
```rust
83
109
fn main() {
@@ -99,10 +125,15 @@ Version: 1
99
125
Input Length: 2
100
126
```
101
127
102
-
Pretty neat! We're making good progress. But even though our code compiles, how can we be sure we've written it correctly and that this function will return the appropriate number of inputs for different transactions? We want to test it with different arguments and ensure it is returning the correct compactSize. We can do this with unit testing. So let's look into setting up our first unit test in the next section.
128
+
Pretty neat! We're making good progress.
129
+
But even though our code compiles, how can we be sure we've written it correctly and that this function will return the appropriate number of inputs for different transactions?
130
+
We want to test it with different arguments and ensure it is returning the correct compactSize.
131
+
We can do this with unit testing.
132
+
So let's look into setting up our first unit test in the next section.
103
133
104
134
### Quiz
105
-
*How do nodes know whether the transaction is a legacy or a segwit transaction as they read it? How do they know whether to view the next field after the version as an input length encoded as compactSize or as the marker and flag for a Segwit transaction?*
135
+
*How do nodes know whether the transaction is a legacy or a segwit transaction as they read it?
136
+
How do they know whether to view the next field after the version as an input length encoded as compactSize or as the marker and flag for a Segwit transaction?*