# Modules
**Learning objectives:**
Learn about *modules*, with a focus on `nn_linear()`, `nn_sequential()`, and `nn_module()`.
## Built-in modules
**What are modules?**

:   - objects that encapsulate state (the sketch after the list of examples below illustrates this)
    - they can be of any complexity (e.g. a single layer, or a model made up of many layers)

Examples of `{torch}` modules:
- linear: `nn_linear()`
- convolutional: `nn_conv1d()`, `nn_conv2d()`, `nn_conv3d()`
- recurrent: `nn_lstm()`, `nn_gru()`
- embedding: `nn_embedding()`
- multi-head attention: `nn_multihead_attention()`
- See [torch documentation](https://torch.mlverse.org/docs/reference/#neural-network-modules) for others
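
As a minimal sketch (my addition, not from the book) of what "encapsulating state" means, we can instantiate one of these built-in modules, look at the parameters it stores, and call it like a function; `nn_embedding()` and the `$parameters` field are part of `{torch}`:

```{r}
library(torch)

# an embedding table: 10 rows, each a learned vector of length 4
emb <- nn_embedding(num_embeddings = 10, embedding_dim = 4)

# the module's state: a named list holding the 10 x 4 weight matrix
str(emb$parameters)

# calling the module looks up rows of the table by index
emb(torch_tensor(c(1, 5, 9), dtype = torch_long()))$size()
```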
## Linear Layer: `nn_linear()`
Consider the [linear layer](https://torch.mlverse.org/docs/reference/nn_linear):
```{r}
library(torch)
l <- nn_linear(in_features = 5, out_features = 16) # bias = TRUE is the default
l
```
A comment on size: we might expect the weight matrix of `l` to be $5 \times 16$ (so that the matrix multiplication $X_{50 \times 5} \, \beta_{5 \times 16}$ works directly). As we see below, it is instead $16 \times 5$: for performance reasons, the underlying C++ implementation (`libtorch`) stores the transpose.
```{r}
l$weight$size()
```
Apply the module:
```{r}
# generate data: 50 observations of 5 features, drawn from a standard normal
x <- torch_randn(50, 5)
# feed x into the layer
output <- l(x)
output$size()
```
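
To see the stored transpose in action (a check of my own, not in the book), we can reproduce the layer's output by hand with `$matmul()`, `$t()`, and `torch_allclose()`, which should confirm that the layer computes $X W^\top + b$:

```{r}
# manual forward pass: X (50 x 5) times the transposed weight (5 x 16), plus bias
manual <- x$matmul(l$weight$t()) + l$bias
torch_allclose(output, manual)
```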
When we use built-in modules, we do [*not*]{.underline} need to set `requires_grad = TRUE` when creating tensors (unlike in previous chapters); it is taken care of for us.
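
We can verify this (a quick check of my own) by inspecting the `requires_grad` field of the layer's parameters:

```{r}
l$weight$requires_grad
l$bias$requires_grad
```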
## Sequential Models: `nn_sequential()`
[`nn_sequential()`](https://torch.mlverse.org/docs/reference/nn_sequential) can be used for models whose data simply propagate straight through a chain of layers. A Multi-Layer Perceptron (MLP), i.e. a stack of linear layers with nonlinear activations in between, is an example. Below we build an MLP using this method:
```{r}
mlp <- nn_sequential( # all arguments should be modules
  nn_linear(10, 32),
  nn_relu(),
  nn_linear(32, 64),
  nn_relu(),
  nn_linear(64, 1)
)
```
Apply this model to random data:
```{r}
output <- mlp(torch_randn(50, 10))
```
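
As a quick shape check (my addition): 50 observations with 10 features each go in, and 50 predictions of dimension 1 come out:

```{r}
output$size()
```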
## General Models: `nn_module()`
[`nn_module()`](https://torch.mlverse.org/docs/reference/nn_module) is a "factory function" for building models of arbitrary complexity; it is more flexible than `nn_sequential()`. Use it to define:
- weight initialization (in `initialize`), where tensors wrapped in `nn_parameter()` are registered as model parameters
- the model structure, i.e. the forward pass (in `forward`)
Example:
```{r}
my_linear <- nn_module(
  initialize = function(in_features, out_features) {
    # register parameters: weights drawn from a standard normal, biases all zero
    self$w <- nn_parameter(torch_randn(in_features, out_features))
    self$b <- nn_parameter(torch_zeros(out_features))
  },
  forward = function(input) {
    input$mm(self$w) + self$b
  }
)
```
Next, instantiate the model with its input and output dimensions:
```{r}
l <- my_linear(7, 1)
l
```
Apply the model to random data (just like we did in the previous section):
```{r}
output <- l(torch_randn(5, 7))
output
```
That was the forward pass. Let's define a (dummy) loss function and compute the gradient:
```{r}
loss <- output$mean()
loss$backward() # compute gradients of the loss w.r.t. the parameters
l$w$grad        # inspect the result
```
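
The gradient for the bias, and the full list of registered parameters, can be inspected the same way (my addition; `$parameters` is the module's named list of parameters):

```{r}
l$b$grad           # gradient with respect to the bias
str(l$parameters)  # all parameters registered via nn_parameter()
```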
## Meeting Videos {.unnumbered}
### Cohort 1 {.unnumbered}
`r knitr::include_url("https://www.youtube.com/embed/URL")`
<details>
<summary>Meeting chat log</summary>
```
LOG
```
</details>