
Commit e475745: add course13/course_en.md (parent: 8da8be9)
course13/course_en.md

Lines changed: 313 additions & 0 deletions
@@ -0,0 +1,313 @@
---
marp: true
math: mathjax
paginate: true
backgroundImage: url('../pics/background_moonbit.png')
style: |
  .columns {
    display: grid;
    grid-template-columns: 2fr 1fr;
    gap: 1rem;
  }
---

# Programming with MoonBit: A Modern Approach

## Case Study: Neural Network

### MoonBit Open Course Team

---

# Dataset: Iris

- The Iris dataset is the "Hello World" of machine learning
- It was introduced by Ronald Fisher in 1936
- It has 3 classes with 50 samples each, representing different types of iris plants
- Each sample consists of 4 features:
  - sepal length, sepal width, petal length & petal width
- Task
  - To build and train a neural network that classifies the type of an iris plant based on its features, achieving an accuracy rate of over 95%

---

# Neural Networks

<div class="columns">
<div>

- Neural networks are a class of machine learning models
- They loosely mimic the neural structure of the human brain
- A single neuron typically has
  - multiple inputs
  - one output
- A neuron activates when its weighted input reaches a certain threshold
- A neural network is usually divided into multiple layers

</div>
<div>

![height:350px](../pics/neural_network.drawio.svg)

</div>
</div>

---

# The Structure of a Neural Network

<div class="columns">
<div>

- A typical neural network consists of
  - Input layer: receives the inputs
  - Output layer: outputs the results
  - Hidden layers: layers between the input and output layers
- The structure of a neural network includes
  - Number of hidden layers and neurons
  - How the layers/neurons are connected
  - Activation function of neurons
  - ...

</div>
<div>

![height:350px](../pics/neural_network.drawio.svg)

</div>
</div>

---

# A Sample Neural Network for Iris

<div class="columns">
<div>

- Input: The value for each feature
- Output: The likelihood of belonging to each type
- Number of samples: 150
- Network architecture: Feedforward neural network
  - Input layer: 4 nodes
  - Output layer: 3 nodes
  - Hidden layer: 1 layer with 4 nodes
  - Fully connected: Each neuron is connected to all neurons in the previous layer

</div>
<div>

![height:350px](../pics/neural_network.drawio.svg)

</div>
</div>

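With one bias ($c$) per neuron, this architecture has $4 \times 4 + 4 = 20$ trainable parameters in the hidden layer and $3 \times 4 + 3 = 15$ in the output layer, 35 in total.
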
---

# Neurons

<div class="columns">
<div>

- $f = w_0 x_0 + w_1 x_1 + \cdots + w_n x_n + c$
  - $w_i$, $c$: trainable parameters
  - $x_i$: inputs
- Activation function
  - Hidden layer: Rectified Linear Unit (ReLU)
    - Neurons are not activated when the computed value is less than zero
    - $f(x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$
  - Output layer: Softmax
    - Organizes the outputs into a probability distribution with a total sum of 1
    - $f(x_m) = e^{x_m} / \sum_{i=1}^N e^{x_i}$

</div>
<div>

![height:350px](../pics/neural_network.drawio.svg)

</div>
</div>

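For example, applying softmax to the raw outputs $(1, 0, -1)$ gives approximately $(0.665, 0.245, 0.090)$, since $e^{1} + e^{0} + e^{-1} \approx 4.086$.
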
---

# Implementation

- Basic operations

```moonbit
trait Base {
  constant(Double) -> Self
  value(Self) -> Double
  op_add(Self, Self) -> Self
  op_neg(Self) -> Self
  op_mul(Self, Self) -> Self
  op_div(Self, Self) -> Self
  exp(Self) -> Self // for computing softmax
}
```

---
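
# Implementation

- A minimal `Base` instance (a sketch for illustration, not part of the course code; the `Forward` type and the `@math.exp` call are assumptions, and depending on the MoonBit version an explicit trait `impl` may be required)

```moonbit
// Hypothetical forward-evaluation carrier: it computes values only, no gradients
struct Forward {
  value : Double
}

fn Forward::constant(d : Double) -> Forward { { value: d } }
fn Forward::value(f : Forward) -> Double { f.value }
fn Forward::op_add(a : Forward, b : Forward) -> Forward { { value: a.value + b.value } }
fn Forward::op_neg(a : Forward) -> Forward { { value: -a.value } }
fn Forward::op_mul(a : Forward, b : Forward) -> Forward { { value: a.value * b.value } }
fn Forward::op_div(a : Forward, b : Forward) -> Forward { { value: a.value / b.value } }
fn Forward::exp(a : Forward) -> Forward { { value: @math.exp(a.value) } } // assumes a math package providing exp
```

---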

# Implementation

- Activation function

```moonbit
fn reLU[T : Base](t : T) -> T {
  if t.value() < 0.0 { T::constant(0.0) } else { t }
}

fn softmax[T : Base](inputs : Array[T]) -> Array[T] {
  let n = inputs.length()
  let outputs : Array[T] = Array::make(n, T::constant(0.0))
  let mut sum = T::constant(0.0)
  for i = 0; i < n; i = i + 1 {
    sum = sum + inputs[i].exp()
  }
  for i = 0; i < n; i = i + 1 {
    outputs[i] = inputs[i].exp() / sum
  }
  outputs
}
```

---
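
# Implementation

- A quick check of `softmax` (a sketch using the hypothetical `Forward` wrapper from the earlier sketch; `demo` and the printed output are illustrative)

```moonbit
fn demo() -> Unit {
  let logits : Array[Forward] = [Forward::constant(1.0), Forward::constant(0.0), Forward::constant(-1.0)]
  let probs = softmax(logits)
  let mut total = 0.0
  for i = 0; i < probs.length(); i = i + 1 {
    total = total + probs[i].value()
  }
  println(total) // expected to print a value very close to 1
}
```

---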

# Implementation

- Input layer -> Hidden layer

```moonbit
fn input2hidden[T : Base](inputs: Array[Double], param: Array[Array[T]]) -> Array[T] {
  let outputs : Array[T] = Array::make(param.length(), T::constant(0.0))
  for output = 0; output < param.length(); output = output + 1 { // 4 outputs
    for input = 0; input < inputs.length(); input = input + 1 { // 4 inputs
      outputs[output] = outputs[output] + T::constant(inputs[input]) * param[output][input]
    }
    // add the bias (stored as the last column of `param`), then apply ReLU
    outputs[output] = outputs[output] + param[output][inputs.length()] |> reLU
  }
  outputs
}
```

---

# Implementation

- Hidden layer -> Output layer

```moonbit
fn hidden2output[T : Base](inputs: Array[T], param: Array[Array[T]]) -> Array[T] {
  let outputs : Array[T] = Array::make(param.length(), T::constant(0.0))
  for output = 0; output < param.length(); output = output + 1 { // 3 outputs
    for input = 0; input < inputs.length(); input = input + 1 { // 4 inputs
      outputs[output] = outputs[output] + inputs[input] * param[output][input]
    }
    outputs[output] = outputs[output] + param[output][inputs.length()] // add the bias (constant term)
  }
  outputs |> softmax
}
```

---
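
# Implementation

- Putting the two layers together (a sketch, not part of the course code; `predict` is an illustrative helper that returns the index of the most likely class)

```moonbit
fn predict[T : Base](inputs : Array[Double],
    param_hidden : Array[Array[T]], param_output : Array[Array[T]]) -> Int {
  let outputs = inputs |> input2hidden(param_hidden) |> hidden2output(param_output)
  let mut best = 0
  for i = 1; i < outputs.length(); i = i + 1 {
    if outputs[i].value() > outputs[best].value() {
      best = i
    }
  }
  best // index of the class with the highest probability
}
```

---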

# Training

- Cost function
  - Evaluates the "distance" between the current result and the expected result
  - Cross-entropy is a typical choice
- Gradient descent
  - The gradient determines the direction of parameter adjustment
- Learning rate
  - The learning rate determines the magnitude of each parameter adjustment
  - We choose exponential decay

---

# Cost Function

- Multi-class cross-entropy: $I(x_j) = -\ln(p(x_j))$
  - $x_j$: event
  - $p(x_j)$: the probability of $x_j$ happening
- Cost function:

```moonbit
trait Log {
  log(Self) -> Self // for computing cross-entropy
}

// Returns the cost as a node of type T so that it can still be differentiated;
// call .value() on the result to read the loss as a Double
fn cross_entropy[T : Base + Log](inputs: Array[T], expected: Int) -> T {
  -inputs[expected].log()
}
```

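- Example: if the network assigns probability $p(x_j) = 0.7$ to the correct class, the cost is $-\ln(0.7) \approx 0.36$; a perfect prediction ($p(x_j) = 1$) costs $0$
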
---

# Gradient Descent

- Backpropagation: Compute the gradients with backward differentiation and adjust the parameters accordingly
- Accumulate the partial derivatives

```moonbit
fn Backward::param(param: Array[Array[Double]], diff: Array[Array[Double]],
    i: Int, j: Int) -> Backward {
  { value: param[i][j], backward: fn { d => diff[i][j] = diff[i][j] + d } }
}
```

- Compute the cost and perform backward differentiation accordingly

```moonbit
fn diff(inputs: Array[Double], expected: Int,
    param_hidden: Array[Array[Backward]], param_output: Array[Array[Backward]]) -> Unit {
  let result = inputs
    |> input2hidden(param_hidden)
    |> hidden2output(param_output)
    |> cross_entropy(expected)
  result.backward(1.0)
}
```

---
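
# Gradient Descent

- One possible shape for the `Backward` type used above (a sketch, not necessarily the course's exact definition): each node stores its value together with a callback that accumulates the gradient of the final cost with respect to that node

```moonbit
struct Backward {
  value : Double
  backward : (Double) -> Unit
}

// Multiplication as an example of the chain rule:
// if f = a * b, then df/da = b.value and df/db = a.value
fn Backward::op_mul(a : Backward, b : Backward) -> Backward {
  {
    value: a.value * b.value,
    backward: fn(d : Double) {
      (a.backward)(d * b.value)
      (b.backward)(d * a.value)
    }
  }
}
// op_add, op_neg, op_div, exp, and log follow the same pattern
```

- `Backward::param` plugs directly into this scheme: its callback adds the incoming gradient into the corresponding slot of the gradient array

---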

# Gradient Descent

- Adjust parameters based on the gradients

```moonbit
fn update(params: Array[Array[Double]], diff: Array[Array[Double]], step: Double) -> Unit {
  for i = 0; i < params.length(); i = i + 1 {
    for j = 0; j < params[i].length(); j = j + 1 {
      params[i][j] = params[i][j] - step * diff[i][j]
    }
  }
}
```

---

# Learning Rate

<div class="columns">
<div>

- An inappropriate learning rate can degrade performance, or even prevent convergence to the optimal result
- Exponential decay learning rate: $f(x) = a\mathrm{e}^{-bx}$, where $a$ and $b$ are constants and $x$ is the number of training epochs

</div>
<div>

![height:400px](../pics/learning_rate.png)

</div>
</div>

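A rate of this shape could be computed as follows (a sketch; the constants and the `@math.exp` call are illustrative assumptions, not the course's actual hyperparameters):

```moonbit
fn learning_rate(epoch : Int) -> Double {
  let a = 0.5  // initial rate (illustrative)
  let b = 0.05 // decay speed (illustrative)
  a * @math.exp(-b * epoch.to_double())
}
```
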
---

# Training Set vs Testing Set

- Randomly divide the dataset into two parts:
  - Training set: To train the parameters
  - Testing set: To evaluate how well a trained model performs on unseen data
- If the amount of data is small, we typically perform full-batch training (a sketch follows on the next slide)
  - Each epoch consists of one iteration, in which all the training samples are used
- If there is a large amount of data, we may perform mini-batch training instead

---
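
# Training

- A possible full-batch training loop tying `diff` and `update` together (a sketch; `zeros_like`, `wrap_params`, `train_x`, `train_y`, and the `learning_rate` helper sketched earlier are illustrative assumptions, not the course code)

```moonbit
// Gradient accumulators with the same shape as the parameters, initialised to zero
fn zeros_like(param : Array[Array[Double]]) -> Array[Array[Double]] {
  let grads : Array[Array[Double]] = Array::make(param.length(), Array::make(0, 0.0))
  for i = 0; i < param.length(); i = i + 1 {
    grads[i] = Array::make(param[i].length(), 0.0)
  }
  grads
}

// Wrap every parameter so that backward passes accumulate into `grad`
fn wrap_params(param : Array[Array[Double]], grad : Array[Array[Double]]) -> Array[Array[Backward]] {
  let wrapped : Array[Array[Backward]] = Array::make(param.length(), [])
  for i = 0; i < param.length(); i = i + 1 {
    wrapped[i] = Array::make(param[i].length(), Backward::param(param, grad, i, 0))
    for j = 0; j < param[i].length(); j = j + 1 {
      wrapped[i][j] = Backward::param(param, grad, i, j)
    }
  }
  wrapped
}

fn train(train_x : Array[Array[Double]], train_y : Array[Int],
    param_hidden : Array[Array[Double]], param_output : Array[Array[Double]],
    epochs : Int) -> Unit {
  for epoch = 0; epoch < epochs; epoch = epoch + 1 {
    let grad_hidden = zeros_like(param_hidden)
    let grad_output = zeros_like(param_output)
    let backward_hidden = wrap_params(param_hidden, grad_hidden)
    let backward_output = wrap_params(param_output, grad_output)
    // full batch: every training sample contributes to this epoch's gradients
    for i = 0; i < train_x.length(); i = i + 1 {
      diff(train_x[i], train_y[i], backward_hidden, backward_output)
    }
    update(param_hidden, grad_hidden, learning_rate(epoch))
    update(param_output, grad_output, learning_rate(epoch))
  }
}
```

---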

# Summary

- This chapter introduces the basics of neural networks:
  - The structure of a neural network
  - The training process of a neural network
- References:
  - [What is a neural network](https://www.ibm.com/topics/neural-networks)
