Commit 3293b7b

Break long paragraphs lines
1 parent 8f0c2c8 commit 3293b7b

1 file changed

+46
-15
lines changed

10_compact_size_unsigned_integers.md

Lines changed: 46 additions & 15 deletions
@@ -1,20 +1,35 @@
11
# Compact Size Unsigned Integers
22

3-
We'll talk more about the Segwit soft fork and how the transaction format changed later on in this course. For now, we're going to assume the transactions we're decoding are serialized according to the legacy, pre-segwit format. This means the next field after the version will be the number of inputs.
3+
We'll talk more about the Segwit soft fork and how the transaction format changed later on in this course.
4+
For now, we're going to assume the transactions we're decoding are serialized according to the legacy, pre-segwit format.
5+
This means the next field after the version will be the number of inputs.
46

5-
If you have read [Mastering Bitcoin 3rd Edition, Chapter 6](https://github.com/bitcoinbook/bitcoinbook/blob/develop/ch06_transactions.adoc#length-of-transaction-input-list), you'll remember that the next byte represents the length of the transaction input list encoded as a compactSize unsigned integer. The compactSize integer indicates how many bytes to read to determine the number of inputs. For example, if the length is less than 253, then the next byte is simply interpreted as an unsigned 8-bit integer (the `u8` data type in Rust). If the length is greater than 252 and less than 2^16, then we would expect to see the byte `fd` (or the integer 253) followed by two additional bytes interpreted as a `u16` integer, etc. This is the table we can use as reference:
7+
If you have read [Mastering Bitcoin 3rd Edition, Chapter 6](https://github.com/bitcoinbook/bitcoinbook/blob/develop/ch06_transactions.adoc#length-of-transaction-input-list), you'll remember that the next byte represents the length of the transaction input list encoded as a compactSize unsigned integer.
8+
The compactSize integer indicates how many bytes to read to determine the number of inputs.
9+
For example, if the length is less than 253, then the next byte is simply interpreted as an unsigned 8-bit integer (the `u8` data type in Rust).
10+
If the length is greater than 252 and less than 2^16, then we would expect to see the byte `fd` (or the integer 253) followed by two additional bytes interpreted as a `u16` integer, etc.
11+
This is the table we can use as reference:
612

713
![Compact Size Unsigned Integer Type](images/compactSize.png)
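One way to get a feel for the table is to look at it from the writing side. The following is a minimal sketch, not part of the course code: `encode_compact_size` is a hypothetical helper name, and it assumes the standard compactSize layout (values below 253 take one byte; larger values take a prefix byte of `fd`, `fe`, or `ff` followed by a little-endian `u16`, `u32`, or `u64`).

```rust
/// Hypothetical helper (an assumption for illustration, not from the course):
/// encode `value` as a compactSize unsigned integer.
fn encode_compact_size(value: u64) -> Vec<u8> {
    if value < 253 {
        // Small values are stored directly in a single byte.
        vec![value as u8]
    } else if value <= 0xffff {
        // Prefix 0xfd, then the value as a little-endian u16 (2 bytes).
        let mut out = vec![0xfd];
        out.extend_from_slice(&(value as u16).to_le_bytes());
        out
    } else if value <= 0xffff_ffff {
        // Prefix 0xfe, then the value as a little-endian u32 (4 bytes).
        let mut out = vec![0xfe];
        out.extend_from_slice(&(value as u32).to_le_bytes());
        out
    } else {
        // Prefix 0xff, then the value as a little-endian u64 (8 bytes).
        let mut out = vec![0xff];
        out.extend_from_slice(&value.to_le_bytes());
        out
    }
}
```

Under these assumptions, an input count of 2 is the single byte `02`, while 253 needs three bytes: `fd fd 00`.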
814

9-
So let's write a function to read a compactSize unsigned integer. Let's think about this a bit. What kind of argument do we want to accept? And what should the return type be? Take a moment to fill out the function signature and come back.
15+
So let's write a function to read a compactSize unsigned integer.
16+
Let's think about this a bit.
17+
What kind of argument do we want to accept? And what should the return type be? Take a moment to fill out the function signature and come back.
1018

1119
<hr/>
1220

13-
For the argument type, we have to remember that we're still passing around the same mutable reference to the slice so that we can keep reading it and moving the pointer. So we'll keep the same argument type as in the `read_version` function.
21+
For the argument type, we have to remember that we're still passing around the same mutable reference to the slice so that we can keep reading it and moving the pointer.
22+
So we'll keep the same argument type as in the `read_version` function.
1423

15-
Now, what should the return type be? Well, the input length can be an 8-bit, 16-bit, 32-bit or 64-bit unsigned integer. So if we need to specify just one type for the length, let's choose the largest one, as it can hold any of the other possibilities. `fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64`
24+
Now, what should the return type be? Well, the input length can be an 8-bit, 16-bit, 32-bit or 64-bit unsigned integer. So if we need to specify just one type for the length, let's choose the largest one, as it can hold any of the other possibilities.
25+
`fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64`
1626

17-
From here, it is fairly straightforward if/else logic. As the chart above shows in the Format column, we can tell how many bytes to read based on the byte value. If it is less than 253, then the byte is the length. If it is equal to 253, then we need to read the next two bytes. If it is equal to 254, then we need to read the next four bytes, and if it is equal to 255, the next eight bytes. So let's implement this using a standard if/else statement block which you're probably familiar with.
27+
From here, it is fairly straightforward if/else logic.
28+
As the chart above shows in the Format column, we can tell how many bytes to read based on the byte value.
29+
If it is less than 253, then the byte is the length.
30+
If it is equal to 253, then we need to read the next two bytes.
31+
If it is equal to 254, then we need to read the next four bytes, and if it is equal to 255, the next eight bytes.
32+
So let's implement this using a standard if/else statement block which you're probably familiar with.
1833

1934
```rust
2035
fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64 {
@@ -41,12 +56,17 @@ fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64 {
4156

4257
A few things to point out here:
4358
1. `0..253` syntax is a [range type](https://doc.rust-lang.org/std/ops/struct.Range.html#), which has a method called `contains` to check if a value is in the given range.
44-
2. The number of bytes read matches the integer type. For example, 2 bytes give us a `u16` type. 4 bytes give us a `u32` type.
45-
3. We **cast** each type into a `u64`. We can convert between primitive types in Rust using the [`as` keyword](https://doc.rust-lang.org/std/keyword.as.html).
46-
4. Notice how there are no semicolons on the ending lines, such as `u32::from_le_bytes(buffer) as u64`. This is the equivalent of returning that value from the function. We could also write it as `return u32::from_le_bytes(buffer) as u64;` but implicit return without semicolon is more idiomatic.
59+
2. The number of bytes read matches the integer type.
60+
For example, 2 bytes give us a `u16` type, and 4 bytes give us a `u32` type.
61+
3. We **cast** each type into a `u64`.
62+
We can convert between primitive types in Rust using the [`as` keyword](https://doc.rust-lang.org/std/keyword.as.html).
63+
4. Notice how there are no semicolons on the ending lines, such as `u32::from_le_bytes(buffer) as u64`.
64+
This is the equivalent of returning that value from the function. We could also write it as `return u32::from_le_bytes(buffer) as u64;` but implicit return without semicolon is more idiomatic.
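The four points above can be seen together in one place. The following is a minimal sketch of an if/else reader, not necessarily the course's exact code: it assumes the byte slice implements `std::io::Read` (it does for `&[u8]`), uses `unwrap()` for brevity instead of real error handling, and relies on the compactSize rule that a prefix of 253, 254, or 255 is followed by 2, 4, or 8 little-endian bytes.

```rust
use std::io::Read;

// A sketch under the assumptions stated above; it panics if the slice is too short.
fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64 {
    // Read the first byte, which is either the value itself or a prefix.
    let mut compact_size = [0_u8; 1];
    transaction_bytes.read_exact(&mut compact_size).unwrap();

    if (0..253_u8).contains(&compact_size[0]) {
        // The byte is the length; cast it up to u64.
        compact_size[0] as u64
    } else if compact_size[0] == 253 {
        // 2 bytes give us a u16.
        let mut buffer = [0_u8; 2];
        transaction_bytes.read_exact(&mut buffer).unwrap();
        u16::from_le_bytes(buffer) as u64
    } else if compact_size[0] == 254 {
        // 4 bytes give us a u32.
        let mut buffer = [0_u8; 4];
        transaction_bytes.read_exact(&mut buffer).unwrap();
        u32::from_le_bytes(buffer) as u64
    } else {
        // The only remaining value is 255: 8 bytes give us a u64, no cast needed.
        let mut buffer = [0_u8; 8];
        transaction_bytes.read_exact(&mut buffer).unwrap();
        u64::from_le_bytes(buffer)
    }
}
```

Each branch ends in an expression without a semicolon, so that expression is the function's return value. Reading through `&mut &[u8]` also advances the slice, so the caller's next read starts right after the compactSize field.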
4765

48-
We're going to make one more change. While standard if/else statements work fine, Rust provides pattern matching via the `match` keyword and this is a good opportunity to use it as it is commonly used in Rust codebases. https://doc.rust-lang.org/book/ch06-02-match.html
4966

67+
We're going to make one more change.
68+
While standard if/else statements work fine, Rust provides pattern matching via the `match` keyword and this is a good opportunity to use it as it is commonly used in Rust codebases.
69+
https://doc.rust-lang.org/book/ch06-02-match.html
5070
```rust
5171
fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64 {
5272
let mut compact_size = [0; 1];
@@ -73,11 +93,17 @@ fn read_compact_size(transaction_bytes: &mut &[u8]) -> u64 {
7393
}
7494
```
7595

76-
What do you think? The `match` looks nicer, doesn't it? Take a moment to get familiar with the syntax. Each of the arms has a pattern to match followed by `=>` and then some code to return for that given pattern.
96+
What do you think?
97+
The `match` looks nicer, doesn't it?
98+
Take a moment to get familiar with the syntax.
99+
Each of the arms has a pattern to match followed by `=>` and then some code to return for that given pattern.
77100

78-
We sometimes see an arm with the underscore symbol (`_`) as the pattern to match. This represents a catchall pattern that will capture any value not already covered by the previous arms. However, in our case, this is not needed since the previous arms are exhaustive and capture all the possible scenarios. Remember a `u8` can only have a value between `0` and `255`.
101+
We sometimes see an arm with the underscore symbol (`_`) as the pattern to match.
102+
This represents a catchall pattern that will capture any value not already covered by the previous arms.
103+
However, in our case, this is not needed since the previous arms are exhaustive and capture all the possible scenarios.
104+
Remember a `u8` can only have a value between `0` and `255`.
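To see the exhaustiveness point in isolation, here is a small sketch. `bytes_after_prefix` is a hypothetical helper, not part of the course code; it maps the first compactSize byte to the number of bytes that follow it. The four arms cover every `u8` value from `0` to `255`, so the compiler accepts the `match` without a `_` arm.

```rust
/// Hypothetical helper (an assumption for illustration): how many more bytes
/// must be read after the first compactSize byte.
fn bytes_after_prefix(prefix: u8) -> usize {
    match prefix {
        // The byte itself is the value, so nothing more to read.
        0..=252 => 0,
        // 0xfd: a little-endian u16 follows.
        253 => 2,
        // 0xfe: a little-endian u32 follows.
        254 => 4,
        // 0xff: a little-endian u64 follows.
        255 => 8,
    }
}
```

If one of these arms were removed, the compiler would reject the function with a non-exhaustive patterns error, which is a useful safety net when the set of cases is small and fixed.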
79105

80-
Now all we have to do is update our `main` function to call this and return the number of inputs.
106+
Now all we have to do is update our `main` function to call this and return the number of inputs.
81107

82108
```rust
83109
fn main() {
@@ -99,10 +125,15 @@ Version: 1
99125
Input Length: 2
100126
```
101127

102-
Pretty neat! We're making good progress. But even though our code compiles, how can we be sure we've written it correctly and that this function will return the appropriate number of inputs for different transactions? We want to test it with different arguments and ensure it is returning the correct compactSize. We can do this with unit testing. So let's look into setting up our first unit test in the next section.
128+
Pretty neat! We're making good progress.
129+
But even though our code compiles, how can we be sure we've written it correctly and that this function will return the appropriate number of inputs for different transactions?
130+
We want to test it with different arguments and ensure it is returning the correct compactSize.
131+
We can do this with unit testing.
132+
So let's look into setting up our first unit test in the next section.
103133

104134
### Quiz
105-
*How do nodes know whether the transaction is a legacy or a segwit transaction as they read it? How do they know whether to view the next field after the version as an input length encoded as compactSize or as the marker and flag for a Segwit transaction?*
135+
*How do nodes know whether the transaction is a legacy or a segwit transaction as they read it?
136+
How do they know whether to view the next field after the version as an input length encoded as compactSize or as the marker and flag for a Segwit transaction?*
106137

107138
<hr/>
108139
