---
title: "Introduction to Neural Networks with PyTorch"
- subtitle: "ICCS Summer School 2024"
+ subtitle: "ICCS Summer School 2025"
bibliography: references.bib
format:
  revealjs:
@@ -22,9 +22,8 @@ authors:
  - name: Matt Archer
    affiliations: ICCS/Cambridge
    orcid: 0009-0002-7043-6769
-  - name: Surbhi Goel
+  - name: Isaac Akanho
    affiliations: ICCS/Cambridge
-    orcid: 0009-0005-0237-756X

revealjs-plugins:
  - attribution
@@ -37,19 +36,18 @@ revealjs-plugins:
:::: {.columns}
::: {.column width=50%}

- * 9:00-9:30 - NN lecture
- * 9:30-10:30 - Teaching/Code-along
- * 10:30-11:00 - Coffee
- * 11:00-12:00 - Teaching/Code-along
+ ### Wednesday
+ * 9:30-10:00 - NN lecture
+ * 10:00-10:30 - Teaching/Code-along
+ * 13:30-15:00 - Teaching/Code-along

- Lunch

- * 12:00 - 13:30
+ ### Thursday
+
+ * 9:30-10:30 - Teaching/Code-along

::: {style="color: turquoise;"}
- Helping Today:

- * Person 1 - Cambridge RSE
:::
:::
::::
@@ -189,39 +187,33 @@ $$-\frac{dy}{dx}$$
- When fitting a function, we are essentially creating a model, $f$, which describes some data, $y$.
- We therefore need a way of measuring how well a model's predictions match our observations.

+ ## Fitting a straight line with SGD IV {.smaller}

- ::: {.fragment .fade-in}
- :::: {.columns}
- ::: {.column width="30%"}
+ ![](error-line.png)
+
+ - We can measure the distance between $f(x_{i})$ and $y_{i}$.
+
+
+ <!-- :::: {.columns} -->
+ <!-- ::: {.column width="30%"} -->

- - Consider the data:
+ <!-- - Consider the data:

| $x_{i}$ | $y_{i}$ |
|:--------:|:-------:|
| 1.0 | 2.1 |
| 2.0 | 3.9 |
- | 3.0 | 6.2 |
+ | 3.0 | 6.2 | -->

- :::
- ::: {.column width="70%"}
- - We can measure the distance between $f(x_{i})$ and $y_{i}$.
- - Normally we might consider the mean-squared error:
+ ## Fitting a straight line with SGD V {.smaller}

- $$ L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2} $$

- :::
- ::::
-
- :::
-
- ::: {.fragment .fade-in}
- - We can differentiate the loss function w.r.t. to each parameter in the the model $f$.
- - We can use these directions of steepest descent to iteratively 'nudge' the parameters in a direction which will reduce the loss.
- :::
+ <!-- ::: {.column width="70%"} -->

+ - Normally we might consider the mean-squared error:

- ## Fitting a straight line with SGD IV {.smaller}
+ $$ L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2} $$
:::: {.columns}
::: {.column width="45%"}
@@ -233,19 +225,43 @@ $$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$
- Loss: \ $\frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}$

:::
- ::: {.column width="55%"}

+ ::: {.column width="55%"}
+
+ - We can differentiate the loss function w.r.t. each parameter in the model $f$.

$$
\begin{align}
L_{\text{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\\
&= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - (mx_{i} + c))^{2}
\end{align}
$$
-
:::
::::
- ::: {.fragment .fade-in}
+

+ ## Fitting a straight line with SGD VI {.smaller}
+
+ - Partial derivatives of the loss:
+
+ $$
+ \frac{\partial L}{\partial m}
+ \;=\;
+ \frac{1}{n}\sum_{i=1}^{n} 2\bigl(m\,x_{i}+c-y_{i}\bigr)\,x_{i}.
+ $$
+
+ $$
+ \frac{\partial L}{\partial c}
+ \;=\;
+ \frac{1}{n}\sum_{i=1}^{n} 2\bigl(m\,x_{i}+c-y_{i}\bigr).
+ $$
+
+ - These gradients are used to find the parameters that **minimise the loss**, thereby reducing the overall error.
+
+
+ ## Update Rule
+
- We can iteratively minimise the loss by stepping the model's parameters in the direction of steepest descent:

::: {layout="[0.5, 1, 0.5, 1, 0.5]"}
@@ -266,7 +282,6 @@ $$c_{n + 1} = c_{n} - \frac{dL}{dc} \cdot l_{r}$$
:::

- where $l_{\text{r}}$ is a small constant known as the _learning rate_.
- :::
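+ ## Update rule in code {.smaller}
+
+ A minimal sketch of this whole procedure in plain Python, using the data from the earlier table (the starting values, iteration count, and learning rate here are illustrative, not prescribed):
+
+ ```python
+ xs = [1.0, 2.0, 3.0]       # x_i from the earlier slide
+ ys = [2.1, 3.9, 6.2]       # y_i from the earlier slide
+ m, c, lr = 0.0, 0.0, 0.01  # initial parameters and learning rate
+
+ n = len(xs)
+ for _ in range(1000):
+     # gradients from the previous slide
+     dL_dm = sum(2 * (m * x + c - y) * x for x, y in zip(xs, ys)) / n
+     dL_dc = sum(2 * (m * x + c - y) for x, y in zip(xs, ys)) / n
+     m -= lr * dL_dm        # step each parameter against its gradient
+     c -= lr * dL_dc
+ ```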
## Quick recap {.smaller}
@@ -305,7 +320,7 @@ $$a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$$
:::
::::

- ![](https://3b1b-posts.us-east-1.linodeobjects.com//images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}
+ ![](https://web.archive.org/web/20230105124836if_/https://3b1b-posts.us-east-1.linodeobjects.com//images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}

::: {.attribution}
Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
@@ -329,9 +344,178 @@ Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)

- In this workshop, we will implement some straightforward neural networks in PyTorch, and use them for different classification and regression problems.
- PyTorch is a deep learning framework that can be used in both Python and C++.
- - I have never met anyone actually training models in C++; I find it a bit weird.
+ - There are other frameworks, such as JAX, TensorFlow, and PyTorch Lightning.
- See the PyTorch website: [https://pytorch.org/](https://pytorch.org/)
+ # Datasets, DataLoaders & `nn.Module`
+
+
+ ---
+
+ ## What a `Dataset` class does
+
+ - Provides a **uniform API** to your data
+ - Handles
+   - **Loading** raw files (images, CSVs, audio …)
+   - **Train / validation / test** split logic
+   - **Transforms / augmentation** per item
+   - **Item retrieval** so the rest of PyTorch can stay agnostic
+
+ ---
+
+ ## Anatomy of a custom `Dataset`
+
+ ```python
+ import torch
+
+ class MyDataset(torch.utils.data.Dataset):
+     def __init__(self, root_dir, split="train", transform=None):
+         # (1) load or download files / labels
+         # (load_index_file is a placeholder for your own indexing logic)
+         self.paths, self.labels = load_index_file(root_dir, split)
+         self.transform = transform  # (2) save transforms
+ ```
+
+ *The constructor is where you gather file paths, download archives, read CSVs, etc.*
+
+ ---
+
+ ## `__len__` & `__getitem__`
+
+ ```python
+ # continuing MyDataset (PIL.Image imported at the top of the file)
+ def __len__(self):
+     return len(self.paths)  # total number of samples
+
+ def __getitem__(self, idx):
+     img = PIL.Image.open(self.paths[idx]).convert("RGB")
+     if self.transform:  # (3) apply transforms
+         img = self.transform(img)
+     label = self.labels[idx]
+     return img, label  # (4) a single (image, label) example
+ ```
+
+ With these two methods PyTorch knows **how big** the dataset is and **how to fetch** one record.
+
+ ---
+
+ ## Using the custom dataset
+
+ ```python
+ from torchvision import transforms
+
+ train_ds = MyDataset(
+     "data/cats_vs_dogs",
+     split="train",
+     transform=transforms.ToTensor()
+ )
+ print(len(train_ds))  # e.g. ➜ 20_000
+ img, y = train_ds[0]  # one (tensor, label) pair
+ ```
+
+ ---
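+ ## Splitting into train / validation
+
+ The split logic mentioned earlier can also be done generically with `torch.utils.data.random_split`; a small sketch, assuming `train_ds` from the previous slide holds all the samples (the 20% hold-out is illustrative):
+
+ ```python
+ n_val = int(0.2 * len(train_ds))  # hold out 20% for validation
+ train_subset, val_subset = torch.utils.data.random_split(
+     train_ds, [len(train_ds) - n_val, n_val]
+ )
+ ```
+
+ ---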
+
+ ## The **DataLoader** at a glance
+
+ - Wraps any `Dataset` in an **iterable**
+ - **Batches** samples together
+ - **Shuffles** if asked
+ - Uses **multiprocessing** (`num_workers`) to pre-fetch data in parallel
+ - Returns `(batch, labels)` tuples ready for the GPU
+
+ ---
+
+ ## Typical DataLoader code
+
+ ```python
+ train_loader = torch.utils.data.DataLoader(
+     dataset=train_ds,
+     batch_size=64,
+     shuffle=True,
+     num_workers=4,  # 4 CPU worker processes pre-fetch batches
+ )
+
+ for images, labels in train_loader:
+     ...
+ ```
+
+ ---
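+ ## Moving batches to the GPU
+
+ A minimal sketch of the "ready for the GPU" step above (the device selection is an assumption; adapt it to your hardware):
+
+ ```python
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ for images, labels in train_loader:
+     # move each batch onto the target device before the forward pass
+     images, labels = images.to(device), labels.to(device)
+ ```
+
+ ---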
+
+ ## Quick networks with `nn.Sequential`
+
+ ```python
+ mlp = torch.nn.Sequential(
+     torch.nn.Linear(784, 256), torch.nn.ReLU(),
+     torch.nn.Linear(256, 64), torch.nn.ReLU(),
+     torch.nn.Linear(64, 10)
+ )
+
+ out = mlp(torch.rand(32, 784))  # 32-sample batch
+ ```
+
+ Great for simple feed-forward stacks when no branching logic is needed.
+
+ ---
+
+ ## `nn.Module` overview
+
+ - The **base class** for *all* neural-network parts in PyTorch
+ - You **subclass**, then implement
+   - `__init__(self)`: declare layers
+   - `forward(self, x)`: define the forward pass
+
+ ---
+
+ ## Declaring layers in `__init__`
+
+ ```python
+ class MyCNN(torch.nn.Module):
+     def __init__(self, num_classes=2):
+         super().__init__()
+         self.features = torch.nn.Sequential(
+             torch.nn.Conv2d(3, 32, 3, padding=1), torch.nn.ReLU(),
+             torch.nn.MaxPool2d(2),
+             torch.nn.Conv2d(32, 64, 3, padding=1), torch.nn.ReLU(),
+             torch.nn.MaxPool2d(2)
+         )
+         # 64 channels × 56 × 56 assumes 224×224 inputs (two 2× poolings)
+         self.classifier = torch.nn.Linear(64 * 56 * 56, num_classes)
+ ```
+
+ ---
+
+ ## The `forward` pass
+
+ ```python
+ def forward(self, x):
+     x = self.features(x)    # conv stack
+     x = x.flatten(1)        # flatten to (N, features)
+     x = self.classifier(x)  # logits
+     return x
+ ```
+
+ Only **forward** is needed; back-prop is handled automatically.
+
+ ---
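+ ## Backprop comes for free
+
+ A quick sketch showing autograd at work on `MyCNN` (the input shape and the `.sum()` "loss" are purely illustrative):
+
+ ```python
+ model = MyCNN()
+ x = torch.randn(8, 3, 224, 224)  # dummy batch of 8 RGB images
+ loss = model(x).sum()            # any scalar works for a demo
+ loss.backward()                  # autograd fills .grad on every parameter
+ print(model.classifier.weight.grad.shape)
+ ```
+
+ ---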
+
+ ## Calling the model ≈ calling `forward`
+
+ ```python
+ model = MyCNN()
+ logits1 = model(images)          # preferred ✔
+ logits2 = model.forward(images)  # works, but avoid
+ ```
+
+ `model(input)` internally routes to `model.forward(input)` via `__call__`.
+
+ ---
+
+ ## Key Take-Aways
+
+ 1. **Dataset** = organized access to *individual* samples
+ 2. **DataLoader** = batching, shuffling, parallel I/O
+ 3. `nn.Module` = reusable building block; override `__init__` & `forward`
+ 4. `model(x)` is the idiomatic way to run a forward pass
+ 5. Use `nn.Sequential` for quick layer chains
+
+
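+ ---
+
+ ## Putting it together
+
+ A minimal sketch of how these pieces combine into a training loop (the optimiser, learning rate, and loss choices here are illustrative, not prescribed by the exercises):
+
+ ```python
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model = MyCNN().to(device)
+ optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
+ loss_fn = torch.nn.CrossEntropyLoss()
+
+ for images, labels in train_loader:
+     images, labels = images.to(device), labels.to(device)
+     optimiser.zero_grad()                  # clear old gradients
+     loss = loss_fn(model(images), labels)  # forward pass + loss
+     loss.backward()                        # backward pass
+     optimiser.step()                       # update parameters
+ ```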
# Exercises
@@ -506,13 +690,13 @@ For more information we can be reached at:

::: {.column width="25%"}

- {{< fa pencil >}} \ Surbhi Goel
+ {{< fa pencil >}} \ Isaac Akanho

{{< fa solid person-digging >}} \ [ICCS/UoCambridge](https://iccs.cam.ac.uk/about-us/our-team)

- {{< fa solid envelope >}} \ [sg2147[AT]cam.ac.uk](mailto:sg2147@cam.ac.uk)
+ {{< fa solid envelope >}} \ [ia464[AT]cam.ac.uk](mailto:ia464@cam.ac.uk)

- {{< fa brands github >}} \ [surbhigoel77](https://github.com/surbhigoel77)
+ {{< fa brands github >}} \ [isaacaka](https://github.com/isaacaka)

:::