Commit 659e63c

Merge pull request #34 from edilmedeiros/edilmedeiros-chap12
Edit chapter 12
2 parents fcb64c7 + e95b2ff commit 659e63c

1 file changed

+71
-24
lines changed

Diff for: 12_reading_inputs_and_type_coercion.md

+71-24
@@ -1,14 +1,20 @@
11
# Reading Inputs and Type Coercion
22

33
Each input in a transaction contains the following information:
4-
* previous output txid (32 bytes length)
5-
* previous output index (4 bytes length, represented as a u32 integer)
6-
* ScriptSig (variable length preceded by compact size integer)
7-
* sequence (4 bytes length, represented as a u32 integer)
4+
* the previous output txid (32 bytes length)
5+
* the previous output index (4 bytes length, represented as a u32 integer)
6+
* a ScriptSig (variable length preceded by compact size integer)
7+
* a sequence number (4 bytes length, represented as a u32 integer).
88

9-
The ScriptSig can be a variable length and so is preceded by a compact size integer which indicates the length of the field in bytes. Prior to Segwit, the ScriptSig was where the signature would be provided for unlocking the funds of the referenced output (as indicated by the previous output txid and previous output index). Now, for Segwit transactions, this field is empty with a compact size length of 0x00 as the signature is no longer contained in the input data, but is instead "*segregated*" from the rest of the transaction in a separate witness field. For more information on SegWit, see [this section](https://github.com/bitcoinbook/bitcoinbook/blob/6d1c26e1640ae32b28389d5ae4caf1214c2be7db/ch06_transactions.adoc#segregated-witness) from Mastering Bitcoin, Chapter 6.
9+
The ScriptSig can have a variable length and so is preceded by a compact size integer which indicates the length of the field in bytes.
10+
Prior to Segwit, the ScriptSig was where a digital signature would be provided for unlocking the funds of the referenced output (as indicated by the previous output txid and previous output index).
11+
Now, for Segwit transactions, this field is empty with a compact size length of 0x00 as the signature is no longer contained in the input data, but is instead "*segregated*" from the rest of the transaction in a separate witness field.
12+
For more information on SegWit, see [this section](https://github.com/bitcoinbook/bitcoinbook/blob/6d1c26e1640ae32b28389d5ae4caf1214c2be7db/ch06_transactions.adoc#segregated-witness) from Mastering Bitcoin, Chapter 6.
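
To make the layout concrete, here is a minimal sketch of the raw bytes of one input whose ScriptSig is empty, as in a SegWit transaction. The helper name `empty_script_sig_input` and the placeholder values are assumptions for illustration only; the integers are serialized little-endian, as Bitcoin does.

```rust
/// Hypothetical helper: serializes one input with an empty ScriptSig.
/// Layout: 32-byte txid | 4-byte index | 1-byte compact size (0x00) | 4-byte sequence.
fn empty_script_sig_input(txid: [u8; 32], index: u32, sequence: u32) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(41);
    bytes.extend_from_slice(&txid); // previous output txid (internal byte order)
    bytes.extend_from_slice(&index.to_le_bytes()); // previous output index
    bytes.push(0x00); // compact size 0: the ScriptSig is empty
    bytes.extend_from_slice(&sequence.to_le_bytes()); // sequence number
    bytes
}

fn main() {
    // Placeholder txid bytes, not a real transaction id.
    let input = empty_script_sig_input([0xaa; 32], 1, 0xffff_ffff);
    assert_eq!(input.len(), 32 + 4 + 1 + 4);
    println!("{} bytes, ScriptSig length byte = {:#04x}", input.len(), input[36]);
}
```

With an empty ScriptSig, the whole input takes 41 bytes and the byte at offset 36 is the `0x00` length marker.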
1013

11-
So let's update our code. We have the input length now so we know how many times to read the input information. We'll start by using a for loop and iterate over a range. Since we don't need the range index number, we can just replace the unused variable with an underscore, `_`. More details on loops in Rust can be [found here](https://doc.rust-lang.org/book/ch03-05-control-flow.html#looping-through-a-collection-with-for).
14+
We already have the input length so we know how many times to read the input information.
15+
We'll start by using a for loop and iterate over a range.
16+
Since we don't need the range index number, we can just replace the unused variable with an underscore, `_`.
17+
More details on loops in Rust can be [found here](https://doc.rust-lang.org/book/ch03-05-control-flow.html#looping-through-a-collection-with-for).
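
As a small standalone sketch of that pattern, the loop below runs once per input without ever naming the index. The function name `count_reads` and the `u64` type for the input length are assumptions made for this example.

```rust
/// Hypothetical helper: performs one "read" per input and returns how many ran.
fn count_reads(input_length: u64) -> u64 {
    let mut reads = 0;
    // `_` tells the compiler we intentionally ignore the range index.
    for _ in 0..input_length {
        reads += 1; // in the real program, this is where we'd read one input
    }
    reads
}

fn main() {
    println!("performed {} reads", count_reads(3));
}
```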
1218

1319
```rust
1420
fn main() {
@@ -24,12 +30,23 @@ fn main() {
2430
}
2531
```
2632

27-
Let's implement the `read_txid` function. We know we're looking to read the next 32 bytes, which displayed in hex format will be the transaction id we show to a user. There's one catch, however. Whenever we display transaction ids to users, we display them in *big-endian* format. However, those ids are stored internally in blocks as *little-endian*. This means we have to reverse the id before showing it to the user. A description about why this is can be [found here](https://github.com/bitcoinbook/bitcoinbook/blob/6d1c26e1640ae32b28389d5ae4caf1214c2be7db/ch06_transactions.adoc#internal_and_display_order) in Mastering Bitcoin, Chapter 6.
33+
Let's implement the `read_txid` function.
34+
We know we're looking to read the next 32 bytes, which displayed in hex format will be the transaction id we show to a user.
35+
There's one catch, however.
36+
Whenever we display transaction ids to users, we display them in *big-endian* format.
37+
However, those ids are stored internally in blocks as *little-endian*.
38+
This means we have to reverse the id before showing it to the user.
39+
A description about why this is can be [found here](https://github.com/bitcoinbook/bitcoinbook/blob/6d1c26e1640ae32b28389d5ae4caf1214c2be7db/ch06_transactions.adoc#internal_and_display_order) in Mastering Bitcoin, Chapter 6.
2840

29-
Before we show the implementation below, why don't you take a stab at the function signature? If you're feeling confident, write out the whole function and then compare your answer here.
41+
Before we show the implementation below, why don't you take a stab at the function signature?
42+
If you're feeling confident, write out the whole function and then compare your answer here.
3043
Some hints:
31-
1. Let's not worry about the hex display just yet. Let's just return the appropriate bytes in *big endian*. Remember, by default it is *little-endian*.
32-
2. *We know the exact size of the amount of bytes we want to return, so what kind of data structure is appropriate for a fixed-size amount of bytes? Do we need to store anything on the heap? Or can we just store this data on the stack?*
44+
1. Let's not worry about the hex display just yet.
45+
Let's just return the appropriate bytes in *big endian*.
46+
Remember, by default it is *little-endian*.
47+
2. *We know the exact size of the amount of bytes we want to return, so what kind of data structure is appropriate for a fixed-size amount of bytes?
48+
Do we need to store anything on the heap?
49+
Or can we just store this data on the stack?*
3350

3451
<hr/>
3552

@@ -42,9 +59,14 @@ fn read_txid(transaction_bytes: &mut &[u8]) -> [u8; 32] {
4259
}
4360
```
4461

45-
So all we have to do here is read 32 bytes and store that into an array, which is a fixed size amount of bytes. Of course, we need to reverse the bytes so that they are in big-endian. Pretty simple right? So what's next? The next 4 bytes will give us the index of that transaction that we're spending.
62+
So all we have to do here is read 32 bytes and store that into an array, which is a fixed size amount of bytes.
63+
Of course, we need to reverse the bytes so that they are in big-endian.
64+
Pretty simple right? So what's next? The next 4 bytes will give us the index of that transaction that we're spending.
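
For reference, the read-and-reverse step described above can be sketched as follows. This is one possible body for the `read_txid` signature, not necessarily the chapter's exact code; it uses `read_exact` rather than `read` so that a short input panics instead of silently returning a partly filled array.

```rust
use std::io::Read;

/// Sketch: reads 32 bytes and returns them reversed (display order).
fn read_txid(transaction_bytes: &mut &[u8]) -> [u8; 32] {
    let mut buffer = [0u8; 32]; // fixed size, so it can live on the stack
    transaction_bytes.read_exact(&mut buffer).unwrap();
    buffer.reverse(); // internal (little-endian) order -> display order
    buffer
}

fn main() {
    // Placeholder data: 40 bytes counting up from 0, not a real transaction.
    let raw: Vec<u8> = (0u8..40).collect();
    let mut bytes_slice = raw.as_slice();
    let txid = read_txid(&mut bytes_slice);
    assert_eq!(txid[0], 31); // the last byte read is now first
    assert_eq!(bytes_slice.len(), 8); // the slice advanced past the 32 bytes
    println!("first byte: {}, remaining: {}", txid[0], bytes_slice.len());
}
```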
4665

47-
If you think about it, this is identical to what we did to get the version. We read 4 bytes and returned the u32 integer which represented the version. The index is the same. So perhaps instead of calling that function `read_version`, we can rename it to `read_u32` to make it more generic and just call that here:
66+
If you think about it, this is identical to what we did to get the version.
67+
We read 4 bytes and returned the u32 integer which represented the version.
68+
The index is the same.
69+
So perhaps instead of calling that function `read_version`, we can rename it to `read_u32` to make it more generic and just call that here:
4870

4971
```rust
5072
...
@@ -78,21 +100,42 @@ Next let's get the size of our ScriptSig by reading the compactSize.
78100

79101
### Type Coercions
80102

81-
Now that we have the `script_size`, we know how many bytes to read, but this gets us into an interesting problem. When we create a buffer to read bytes into, we always have to provide a fixed size array. However, the `script_size` is dynamic and cannot be known at compile time. It can only be determined at runtime. You're not able to do something like this as the compiler will complain:
103+
Now that we have the `script_size`, we know how many bytes to read, but this gets us into an interesting problem.
104+
When we create a buffer to read bytes into, we always have to provide a fixed size array.
105+
However, the `script_size` is dynamic and cannot be known at compile time.
106+
It can only be determined at runtime.
107+
You're not able to do something like this as the compiler will complain:
82108

83109
```rust
84110
let mut buffer = [0; script_size];
85111
```
86112

87-
Let's look closer at the `read` method and what type of argument it accepts. If we look at the documentation, it accepts the argument of type `&mut [u8]`. https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read
88-
89-
This is interesting. Technically, it only accepts a mutable reference to a slice. But we've actually been passing in a mutable reference to an array! Remember an array is a fixed size of type `[u8; n]` and not a slice of type `[u8]`. So how has this been working at all? I thought we had to be explicit with types in Rust?
90-
91-
Well, under the hood, Rust is making an implicit conversion. It does this in a few different cases. In the case of an array, there is something known as an **Unsized Coercion**, in which it will automatically convert a sized type (such as an array, `[T; n]`) into an unsized type (a slice, `[T]`).
92-
93-
There is also something known as a **Deref Coercion**, which we can take advantage of here and which is something we alluded to in chapter 9. Basically, if a type implements the `Deref` trait, Rust will implicitly call the `deref` method on it until it gets the type that matches the argument's required type.
94-
95-
So going back to reading our script, what we want is a dynamically-sized buffer to read into. A vector would work just fine. But can we use it? Can we pass it into the `read` method as an argument? It turns out we can! In Rust, a Vec [implements the `DerefMut`](https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#2769) trait which dereferences to a slice. So we can initialize a `Vec` filled with 0s of the size of the script and then pass that into the `read` method as a mutable reference (`&mut Vec<u8>`). It will then be dereferenced to a slice and match the correct argument type, which is `&mut [u8]`.
113+
Let's look closer at the `read` method and what type of argument it accepts.
114+
If we look at the documentation, it accepts the argument of type `&mut [u8]`.
115+
https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read
116+
117+
This is interesting.
118+
Technically, it only accepts a mutable reference to a slice.
119+
But we've actually been passing in a mutable reference to an array!
120+
Remember an array is a fixed size of type `[u8; n]` and not a slice of type `[u8]`.
121+
So how has this been working at all?
122+
I thought we had to be explicit with types in Rust?
123+
124+
Well, under the hood, Rust is making an implicit conversion.
125+
It does this in a few different cases.
126+
In the case of an array, there is something known as an **Unsized Coercion**, in which it will automatically convert a sized type (such as an array, `[T; n]`) into an unsized type (a slice, `[T]`).
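
Here is a minimal sketch of that coercion in isolation. The functions `sum_bytes` and `zero_fill` are hypothetical names for this example; both take slices, yet we call them with references to arrays.

```rust
/// Accepts a shared slice of any length.
fn sum_bytes(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

/// Accepts a mutable slice, just like the buffer argument of `read`.
fn zero_fill(buffer: &mut [u8]) {
    for byte in buffer.iter_mut() {
        *byte = 0;
    }
}

fn main() {
    let array: [u8; 4] = [1, 2, 3, 4];
    // `&array` has type `&[u8; 4]` and is coerced to `&[u8]`.
    assert_eq!(sum_bytes(&array), 10);

    let mut buffer = [7u8; 8];
    // `&mut buffer` has type `&mut [u8; 8]` and is coerced to `&mut [u8]`.
    zero_fill(&mut buffer);
    assert_eq!(buffer, [0u8; 8]);
    println!("sum = {}, buffer = {:?}", sum_bytes(&array), buffer);
}
```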
127+
128+
There is also something known as a **Deref Coercion**, which we can take advantage of here and which is something we alluded to in chapter 9.
129+
Basically, if a type implements the `Deref` trait, Rust will implicitly call the `deref` method on it until it gets the type that matches the argument's required type.
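
To see the "until it gets the type" part in action, here is a small sketch with types unrelated to our parser. The function `byte_len` is a hypothetical name; it only accepts `&str`, but the compiler follows the chain `&Box<String>` to `&String` to `&str` for us.

```rust
/// Only accepts a string slice.
fn byte_len(text: &str) -> usize {
    text.len()
}

fn main() {
    let owned: String = String::from("txid");
    // One deref step: `&String` -> `&str`.
    assert_eq!(byte_len(&owned), 4);

    let boxed: Box<String> = Box::new(String::from("sequence"));
    // Two deref steps: `&Box<String>` -> `&String` -> `&str`.
    assert_eq!(byte_len(&boxed), 8);
    println!("{} and {}", byte_len(&owned), byte_len(&boxed));
}
```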
130+
131+
So going back to reading our script, what we want is a dynamically-sized buffer to read into.
132+
A vector would work just fine.
133+
But can we use it?
134+
Can we pass it into the `read` method as an argument?
135+
It turns out we can!
136+
In Rust, a Vec [implements the `DerefMut`](https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#2769) trait which dereferences to a slice.
137+
So we can initialize a `Vec` filled with 0s of the size of the script and then pass that into the `read` method as a mutable reference (`&mut Vec<u8>`).
138+
It will then be dereferenced to a slice and match the correct argument type, which is `&mut [u8]`.
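
As a rough sketch of that idea, the helper below reads a runtime-sized chunk into a `Vec<u8>`. The name `read_n_bytes` is an assumption for this example, and it uses `read_exact` so that a short input panics rather than leaving zeros in the buffer.

```rust
use std::io::Read;

/// Hypothetical helper: reads `size` bytes, where `size` is only known at runtime.
fn read_n_bytes(source: &mut &[u8], size: usize) -> Vec<u8> {
    let mut buffer = vec![0u8; size]; // heap-allocated, zero-filled
    // `&mut buffer` is `&mut Vec<u8>`; deref coercion turns it into `&mut [u8]`.
    source.read_exact(&mut buffer).unwrap();
    buffer
}

fn main() {
    // Placeholder bytes, not a real script.
    let raw = [0x6a_u8, 0x01, 0x02, 0xff];
    let mut bytes_slice = &raw[..];
    let chunk = read_n_bytes(&mut bytes_slice, 3);
    assert_eq!(chunk, vec![0x6a, 0x01, 0x02]);
    assert_eq!(bytes_slice, &[0xff_u8][..]); // one byte left unread
    println!("read {:?}, {} byte(s) left", chunk, bytes_slice.len());
}
```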
96139

97140
We'll create a new function called `read_script` which will return a `Vec<u8>`:
98141

@@ -105,7 +148,8 @@ fn read_script(transaction_bytes: &mut &[u8]) -> Vec<u8> {
105148
}
106149
```
107150

108-
Lastly, we need to read the sequence, which are the last 4 bytes. A description of what the sequence number represents can be found in Mastering Bitcoin, Chapter 6.
151+
Lastly, we need to read the last 4 bytes for the sequence number.
152+
A description of what the sequence number represents can be found in Mastering Bitcoin, Chapter 6.
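
Since the sequence is another 4-byte integer, the same generic reader can serve here. The sketch below assumes `read_u32` takes the same `&mut &[u8]` argument as `read_txid` and interprets the bytes as little-endian; the body is an illustration rather than the chapter's exact code.

```rust
use std::io::Read;

/// Sketch: reads 4 bytes and interprets them as a little-endian u32.
fn read_u32(transaction_bytes: &mut &[u8]) -> u32 {
    let mut buffer = [0u8; 4];
    transaction_bytes.read_exact(&mut buffer).unwrap();
    u32::from_le_bytes(buffer)
}

fn main() {
    // Placeholder bytes: an index of 1 followed by a sequence of 0xffffffff.
    let raw = [0x01_u8, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff];
    let mut bytes_slice = &raw[..];
    let index = read_u32(&mut bytes_slice);
    let sequence = read_u32(&mut bytes_slice);
    assert_eq!(index, 1);
    assert_eq!(sequence, u32::MAX);
    println!("index = {}, sequence = {}", index, sequence);
}
```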
109153

110154
```rust
111155
...
@@ -118,7 +162,10 @@ Lastly, we need to read the sequence, which are the last 4 bytes. A description
118162
...
119163
```
120164

121-
Alright, now that we have each of the components of an input, what should we do with it? It makes sense to collect all this data together into one unified structure rather than just separate variables. The right type for this is Rust's `Struct` type, which we'll explore in the next lesson. Onwards!
165+
Alright, now that we have each of the components of an input, what should we do with it?
166+
It makes sense to collect all this data together into one unified structure rather than just separate variables.
167+
The right type for this is Rust's `Struct` type, which we'll explore in the next lesson.
168+
Onwards!
122169

123170
### Additional Reading
124171
* Implicit Deref Coercions: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods
