# Reading Inputs and Type Coercion
Each input in a transaction contains the following information:
* the previous output txid (32 bytes)
* the previous output index (4 bytes, represented as a u32 integer)
* a ScriptSig (variable length, preceded by a compact size integer)
* a sequence number (4 bytes, represented as a u32 integer)

The ScriptSig can have a variable length, so it is preceded by a compact size integer that indicates the length of the field in bytes.
Prior to SegWit, the ScriptSig was where a digital signature would be provided for unlocking the funds of the referenced output (as identified by the previous output txid and previous output index).
Now, for native SegWit inputs, this field is empty, with a compact size length of `0x00`, because the signature is no longer contained in the input data but is instead "*segregated*" from the rest of the transaction in a separate witness field.
For more information on SegWit, see [this section](https://github.com/bitcoinbook/bitcoinbook/blob/6d1c26e1640ae32b28389d5ae4caf1214c2be7db/ch06_transactions.adoc#segregated-witness) from Mastering Bitcoin, Chapter 6.
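To make the layout concrete, here is a small sketch that adds up how many bytes one input occupies, given the length of its ScriptSig. The helper names `compact_size_len` and `input_size` are hypothetical and only for illustration. For a native SegWit input with an empty ScriptSig, the total works out to 32 + 4 + 1 + 0 + 4 = 41 bytes.

```rust
/// Number of bytes needed to encode `n` as a compact size integer
/// (hypothetical helper for illustration).
fn compact_size_len(n: u64) -> usize {
    match n {
        0..=0xfc => 1,               // value stored directly in one byte
        0xfd..=0xffff => 3,          // 0xfd marker + 2-byte u16
        0x1_0000..=0xffff_ffff => 5, // 0xfe marker + 4-byte u32
        _ => 9,                      // 0xff marker + 8-byte u64
    }
}

/// Serialized size of one input, excluding any witness data.
fn input_size(script_len: u64) -> usize {
    32                                 // previous output txid
        + 4                            // previous output index (u32)
        + compact_size_len(script_len) // ScriptSig length prefix
        + script_len as usize          // ScriptSig bytes
        + 4                            // sequence number (u32)
}
```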

We already have the input length, so we know how many times to read the input information.
We'll start by using a `for` loop to iterate over a range.
Since we don't need the range index number, we can replace the unused variable with an underscore, `_`.
More details on loops in Rust can be [found here](https://doc.rust-lang.org/book/ch03-05-control-flow.html#looping-through-a-collection-with-for).
```rust
fn main() {
    ...
}
```
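As a standalone illustration of the `for _ in 0..n` pattern, here is a sketch that reads a given number of 4-byte little-endian values from a byte slice. The function name `read_u32_values` and the `&mut &[u8]` reader are assumptions for this example, not the chapter's code. The point is only that the loop counter is never used, so it is bound to `_`.

```rust
use std::io::Read;

/// Hypothetical helper: reads `count` little-endian u32 values from `bytes`.
fn read_u32_values(bytes: &mut &[u8], count: u64) -> Vec<u32> {
    let mut values = Vec::new();
    // The range index is never needed, so it is bound to `_`.
    for _ in 0..count {
        let mut buffer = [0u8; 4];
        bytes.read_exact(&mut buffer).unwrap();
        values.push(u32::from_le_bytes(buffer));
    }
    values
}
```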

Let's implement the `read_txid` function.
We know we're looking to read the next 32 bytes, which, displayed in hex format, will be the transaction id we show to a user.
There's one catch, however.
Whenever we display transaction ids to users, we display them in *big-endian* format.
However, those ids are stored internally in blocks as *little-endian*.
This means we have to reverse the id before showing it to the user.
A description of why this is can be [found here](https://github.com/bitcoinbook/bitcoinbook/blob/6d1c26e1640ae32b28389d5ae4caf1214c2be7db/ch06_transactions.adoc#internal_and_display_order) in Mastering Bitcoin, Chapter 6.

Before we show the implementation below, why don't you take a stab at the function signature?
If you're feeling confident, write out the whole function and then compare your answer here.

Some hints:
1. Let's not worry about the hex display just yet.
   Let's just return the appropriate bytes in *big-endian*.
   Remember, the bytes are stored in *little-endian* order.
2. *We know the exact number of bytes we want to return, so what kind of data structure is appropriate for a fixed-size amount of bytes?
   Do we need to store anything on the heap?
   Or can we just store this data on the stack?*

So all we have to do here is read 32 bytes and store them in an array, which holds a fixed number of bytes.
Of course, we need to reverse the bytes so that they are in big-endian order.
Pretty simple, right?
So what's next?
The next 4 bytes give us the index of the output we're spending within that previous transaction.
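For comparison, here is one possible `read_txid`, written as a sketch under the assumption that the transaction bytes are read through a `&mut &[u8]` (a byte slice implements `std::io::Read`). The chapter's own signature may differ. It uses `read_exact`, which fails if fewer than 32 bytes remain, and `unwrap` in place of real error handling.

```rust
use std::io::Read;

/// Reads the 32-byte previous output txid and returns it in display
/// (big-endian) order. Assumes a `&mut &[u8]` reader.
fn read_txid(transaction_bytes: &mut &[u8]) -> [u8; 32] {
    // A fixed-size array lives on the stack; no heap allocation is needed.
    let mut buffer = [0u8; 32];
    transaction_bytes.read_exact(&mut buffer).unwrap();
    // Stored order is little-endian; reverse it for display.
    buffer.reverse();
    buffer
}
```

Returning `[u8; 32]` answers the second hint. When the size is known up front, the data can stay on the stack.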

If you think about it, this is identical to what we did to get the version.
We read 4 bytes and returned the u32 integer that represented the version.
The index works the same way.
So instead of calling that function `read_version`, perhaps we can rename it to `read_u32` to make it more generic and just call that here:
```rust
...
```

Next let's get the size of our ScriptSig by reading the compact size integer.
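A compact size integer starts with a marker byte. Values up to `0xfc` are stored in that byte directly, while `0xfd`, `0xfe`, and `0xff` mean that a little-endian `u16`, `u32`, or `u64` follows. The sketch below decodes it and reuses a generic `read_u32` for the 4-byte case. It assumes a `&mut &[u8]` reader and uses `unwrap` for brevity. The helper names are illustrative and may not match earlier chapters.

```rust
use std::io::Read;

/// Reads 4 bytes as a little-endian u32 (the same logic serves the
/// version, the output index, and the sequence number).
fn read_u32(bytes: &mut &[u8]) -> u32 {
    let mut buffer = [0u8; 4];
    bytes.read_exact(&mut buffer).unwrap();
    u32::from_le_bytes(buffer)
}

/// Reads a compact size integer and returns its value as a u64.
fn read_compact_size(bytes: &mut &[u8]) -> u64 {
    let mut marker = [0u8; 1];
    bytes.read_exact(&mut marker).unwrap();
    match marker[0] {
        0..=0xfc => marker[0] as u64, // value fits in the marker byte
        0xfd => {
            let mut buffer = [0u8; 2];
            bytes.read_exact(&mut buffer).unwrap();
            u16::from_le_bytes(buffer) as u64
        }
        0xfe => read_u32(bytes) as u64,
        0xff => {
            let mut buffer = [0u8; 8];
            bytes.read_exact(&mut buffer).unwrap();
            u64::from_le_bytes(buffer)
        }
    }
}
```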
### Type Coercions

Now that we have the `script_size`, we know how many bytes to read, but this gets us into an interesting problem.
So far, whenever we've created a buffer to read bytes into, we've used a fixed-size array, and an array's length must be known at compile time.
However, the `script_size` is dynamic and cannot be known at compile time.
It can only be determined at runtime.
You're not able to do something like this, as the compiler will complain:
```rust
let mut buffer = [0; script_size];
```
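For contrast, an array length that is a compile-time constant is accepted. In this sketch `SEQUENCE_LEN` and `sequence_buffer` are hypothetical names. The commented-out function reproduces the rejected pattern, which rustc reports as error E0435 (attempt to use a non-constant value in a constant).

```rust
/// Array lengths must be compile-time constants, so a `const` is accepted.
const SEQUENCE_LEN: usize = 4;

fn sequence_buffer() -> [u8; SEQUENCE_LEN] {
    [0u8; SEQUENCE_LEN]
}

// A length known only at runtime is rejected by the compiler:
//
// fn script_buffer(script_size: usize) {
//     let mut buffer = [0; script_size];
//     // error[E0435]: attempt to use a non-constant value in a constant
// }
```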

Let's look closer at the `read` method and what type of argument it accepts.
If we look at the [documentation](https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read), it accepts an argument of type `&mut [u8]`.
Technically, it only accepts a mutable reference to a slice.
But we've actually been passing in a mutable reference to an array!
Remember, an array has a fixed size and is of type `[u8; n]`, not a slice of type `[u8]`.
So how has this been working at all?
I thought we had to be explicit with types in Rust?

Well, under the hood, Rust is making an implicit conversion.
It does this in a few different cases.
In the case of an array, there is something known as an **Unsized Coercion**, in which Rust automatically converts a reference to a sized type (such as an array, `[T; n]`) into a reference to an unsized type (a slice, `[T]`).
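Here is a minimal sketch of the unsized coercion in action, using hypothetical `fill_with` and `fill_array_demo` functions. The function asks for `&mut [u8]`, and we pass a `&mut [u8; 4]` without any explicit conversion.

```rust
/// Accepts a mutable slice of any length and overwrites every byte.
fn fill_with(buffer: &mut [u8], value: u8) -> usize {
    for byte in buffer.iter_mut() {
        *byte = value;
    }
    buffer.len()
}

/// Hypothetical demo: passes an array where a slice is expected.
fn fill_array_demo() -> [u8; 4] {
    let mut array = [0u8; 4];
    // `&mut array` is a `&mut [u8; 4]`; the unsized coercion turns it
    // into the `&mut [u8]` that `fill_with` asks for.
    let written = fill_with(&mut array, 0xab);
    assert_eq!(written, 4);
    array
}
```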

There is also something known as a **Deref Coercion**, which we can take advantage of here and which we alluded to in chapter 9.
Basically, if a type implements the `Deref` trait (or `DerefMut`, for mutable references), Rust will implicitly call `deref` on it, as many times as needed, until it gets the type that matches the argument's required type.
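The "until it matches" part means Rust can apply several derefs in a row. In this sketch (the function names are hypothetical), a `&Box<Vec<u8>>` is passed where a `&[u8]` is expected. `Box<T>` derefs to `T`, and `Vec<u8>` derefs to `[u8]`. The mutable version of the same mechanism, through `DerefMut`, is what lets a `&mut Vec<u8>` be used where a `&mut [u8]` is required.

```rust
/// Accepts a shared slice and reports its length.
fn slice_len(bytes: &[u8]) -> usize {
    bytes.len()
}

/// Hypothetical demo of chained deref coercion.
fn deref_chain_demo() -> usize {
    let boxed: Box<Vec<u8>> = Box::new(vec![1, 2, 3]);
    // `&boxed` has type `&Box<Vec<u8>>`. Rust inserts `deref` calls until
    // the type matches: &Box<Vec<u8>> -> &Vec<u8> -> &[u8].
    slice_len(&boxed)
}
```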

So going back to reading our script, what we want is a buffer whose size is decided at runtime.
A vector would work just fine.
But can we use it?
Can we pass it into the `read` method as an argument?
It turns out we can!
In Rust, `Vec<T>` [implements the `DerefMut`](https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#2769) trait, dereferencing to a slice `[T]`.
So we can initialize a `Vec` filled with 0s of the size of the script and then pass it into the `read` method as a mutable reference (`&mut Vec<u8>`).
It will then be dereferenced to a slice and match the required argument type, `&mut [u8]`.

We'll create a new function called `read_script` which will return a `Vec<u8>`:
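Here is a minimal sketch of such a function. It assumes the bytes are read through a `&mut &[u8]` and that `script_size` has already been converted to a `usize`. The signature is illustrative, not necessarily the chapter's. It uses `read_exact` rather than `read`, because `read` is allowed to return fewer bytes than the buffer holds. The deref coercion from `&mut Vec<u8>` to `&mut [u8]` works the same way for both methods.

```rust
use std::io::Read;

/// Reads `script_size` bytes into a runtime-sized, heap-allocated buffer.
fn read_script(transaction_bytes: &mut &[u8], script_size: usize) -> Vec<u8> {
    let mut buffer = vec![0u8; script_size];
    // `&mut buffer` is `&mut Vec<u8>`; deref coercion (via DerefMut)
    // turns it into the `&mut [u8]` that `read_exact` expects.
    transaction_bytes.read_exact(&mut buffer).unwrap();
    buffer
}
```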

Lastly, we need to read the last 4 bytes, which hold the sequence number.
A description of what the sequence number represents can be found in Mastering Bitcoin, Chapter 6.
```rust
...
```
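Putting the four pieces together, here is a standalone sketch that reads one complete input and returns its components as a tuple. All helper names (`read_array`, `read_compact_size`, `read_input`) are hypothetical. The reader is assumed to be a `&mut &[u8]`, `unwrap` stands in for error handling, and `read_array` relies on const generics (Rust 1.51 or later). A tuple like this is the kind of loose grouping that a struct tidies up.

```rust
use std::io::Read;

/// Reads exactly `N` bytes into a fixed-size array.
fn read_array<const N: usize>(bytes: &mut &[u8]) -> [u8; N] {
    let mut buffer = [0u8; N];
    bytes.read_exact(&mut buffer).unwrap();
    buffer
}

/// Reads a compact size integer (marker byte, then 0, 2, 4, or 8 LE bytes).
fn read_compact_size(bytes: &mut &[u8]) -> u64 {
    match read_array::<1>(bytes)[0] {
        0xfd => u16::from_le_bytes(read_array(bytes)) as u64,
        0xfe => u32::from_le_bytes(read_array(bytes)) as u64,
        0xff => u64::from_le_bytes(read_array(bytes)),
        n => n as u64,
    }
}

/// Reads one input: (txid in display order, output index, ScriptSig, sequence).
fn read_input(bytes: &mut &[u8]) -> ([u8; 32], u32, Vec<u8>, u32) {
    let mut txid: [u8; 32] = read_array(bytes);
    txid.reverse(); // little-endian internally, big-endian for display
    let index = u32::from_le_bytes(read_array(bytes));
    let script_size = read_compact_size(bytes) as usize;
    let mut script = vec![0u8; script_size];
    bytes.read_exact(&mut script).unwrap(); // &mut Vec<u8> derefs to &mut [u8]
    let sequence = u32::from_le_bytes(read_array(bytes));
    (txid, index, script, sequence)
}
```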

Alright, now that we have each of the components of an input, what should we do with them?
It makes sense to collect all this data together into one unified structure rather than keeping it in separate variables.
The right type for this is Rust's `struct`, which we'll explore in the next lesson.