
Commit aedb47a

isaacakama595 authored and committed
Update slides (#100)
* Add error line graph
* Update slides

Adds overview of PyTorch concepts
Includes mention of other frameworks
Adds example derivative of MSE loss
Adds diagram of scatter plot with error lines
1 parent ac68937 commit aedb47a

File tree

2 files changed: +224 −40 lines


slides/error-line.png

29.8 KB

slides/slides.qmd

Lines changed: 224 additions & 40 deletions
@@ -1,6 +1,6 @@
 ---
 title: "Introduction to Neural Networks with PyTorch"
-subtitle: "ICCS Summer School 2024"
+subtitle: "ICCS Summer School 2025"
 bibliography: references.bib
 format:
   revealjs:
@@ -22,9 +22,8 @@ authors:
   - name: Matt Archer
     affiliations: ICCS/Cambridge
     orcid: 0009-0002-7043-6769
-  - name: Surbhi Goel
+  - name: Isaac Akanho
     affiliations: ICCS/Cambridge
-    orcid: 0009-0005-0237-756X
 
 revealjs-plugins:
   - attribution
@@ -37,19 +36,18 @@ revealjs-plugins:
 :::: {.columns}
 ::: {.column width=50%}
 
-* 9:00-9:30 - NN lecture
-* 9:30-10:30 - Teaching/Code-along
-* 10:30-11:00 - Coffee
-* 11:00-12:00 - Teaching/Code-along
+### Wednesday
+* 9:30-10:00 - NN lecture
+* 10:00-10:30 - Teaching/Code-along
+* 13:30-15:00 - Teaching/Code-along
 
-Lunch
 
-* 12:00 - 13:30
+### Thursday
+
+* 9:30-10:30 - Teaching/Code-along
 
 ::: {style="color: turquoise;"}
-Helping Today:
 
-* Person 1 - Cambridge RSE
 :::
 :::
 ::::
@@ -189,39 +187,33 @@ $$-\frac{dy}{dx}$$
 - When fitting a function, we are essentially creating a model, $f$, which describes some data, $y$.
 - We therefore need a way of measuring how well a model's predictions match our observations.
 
+## Fitting a straight line with SGD IV {.smaller}
 
-::: {.fragment .fade-in}
 
-:::: {.columns}
-::: {.column width="30%"}
+![](error-line.png)
+
+- We can measure the distance between $f(x_{i})$ and $y_{i}$.
+
+
+<!-- :::: {.columns} -->
+<!-- ::: {.column width="30%"} -->
 
-- Consider the data:
+<!-- - Consider the data:
 
 | $x_{i}$ | $y_{i}$ |
 |:--------:|:-------:|
 | 1.0 | 2.1 |
 | 2.0 | 3.9 |
-| 3.0 | 6.2 |
+| 3.0 | 6.2 | -->
 
-:::
-::: {.column width="70%"}
-- We can measure the distance between $f(x_{i})$ and $y_{i}$.
-- Normally we might consider the mean-squared error:
+## Fitting a straight line with SGD V {.smaller}
 
-$$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$
 
-:::
-::::
-
-:::
-
-::: {.fragment .fade-in}
-- We can differentiate the loss function w.r.t. to each parameter in the the model $f$.
-- We can use these directions of steepest descent to iteratively 'nudge' the parameters in a direction which will reduce the loss.
-:::
+<!-- ::: {.column width="70%"} -->
 
+- Normally we might consider the mean-squared error:
 
-## Fitting a straight line with SGD IV {.smaller}
+$$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$
 
 :::: {.columns}
 ::: {.column width="45%"}
@@ -233,19 +225,43 @@ $$L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}$$
 - Loss: \ $\frac{1}{n}\sum_{i=1}^{n}(y_{i} - x_{i})^{2}$
 
 :::
-::: {.column width="55%"}
 
+::: {.column width="55%"}
+
+- We can differentiate the loss function w.r.t. each parameter in the model $f$.
 $$
 \begin{align}
 L_{\text{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\\
 &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - (mx_{i} + c))^{2}
 \end{align}
 $$
-
 :::
 ::::
 
-::: {.fragment .fade-in}
+
+####
+
+## Fitting a straight line with SGD VI {.smaller}
+
+- Differentiating w.r.t. $m$ and $c$:
+
+$$
+\frac{\partial L}{\partial m}
+\;=\;
+\frac{1}{n}\sum_{i=1}^{n} 2\bigl(m\,x_{i}+c-y_{i}\bigr)\,x_{i}.
+$$
+
+$$
+\frac{\partial L}{\partial c}
+\;=\;
+\frac{1}{n}\sum_{i=1}^{n} 2\bigl(m\,x_{i}+c-y_{i}\bigr).
+$$
+
+- These gradients are used to find the parameters that **minimise the loss**, thereby reducing the overall error.
+
+
+## Update Rule
+
 - We can iteratively minimise the loss by stepping the model's parameters in the direction of steepest descent:
 
 ::: {layout="[0.5, 1, 0.5, 1, 0.5]"}
@@ -266,7 +282,6 @@ $$c_{n + 1} = c_{n} - \frac{dL}{dc} \cdot l_{r}$$
 :::
 
 - where $l_{\text{r}}$ is a small constant known as the _learning rate_.
-:::
 
 
 ## Quick recap {.smaller}
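The gradients and update rule in the hunks above can be sanity-checked with a short script. This is an illustrative sketch only and is not part of the committed slides; the data points are the toy values from the slides' table, while the learning rate and iteration count are arbitrary assumptions.

```python
import torch

# Toy data from the slides' table: (1.0, 2.1), (2.0, 3.9), (3.0, 6.2)
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.1, 3.9, 6.2])

m = torch.tensor(0.0, requires_grad=True)   # slope
c = torch.tensor(0.0, requires_grad=True)   # intercept
lr = 0.01                                   # learning rate l_r (assumed value)

for _ in range(2000):                       # iteration count is arbitrary
    loss = ((y - (m * x + c)) ** 2).mean()  # L_MSE
    loss.backward()                         # autograd gives dL/dm and dL/dc
    # analytic check: dL/dm = mean(2*(m*x + c - y)*x), dL/dc = mean(2*(m*x + c - y))
    with torch.no_grad():
        m -= lr * m.grad                    # m_{n+1} = m_n - dL/dm * l_r
        c -= lr * c.grad                    # c_{n+1} = c_n - dL/dc * l_r
        m.grad.zero_()
        c.grad.zero_()

print(m.item(), c.item())                   # approaches the least-squares fit
```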
@@ -305,7 +320,7 @@ $$a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$$
 :::
 ::::
 
-![](https://3b1b-posts.us-east-1.linodeobjects.com//images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}
+![](https://web.archive.org/web/20230105124836if_/https://3b1b-posts.us-east-1.linodeobjects.com//images/topics/neural-networks.jpg){style="border-radius: 50%;" .absolute top=35% left=42.5% width=65%}
 
 ::: {.attribution}
 Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
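For reference, a minimal sketch of the layer recurrence $a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)$ quoted in the hunk header above; the layer sizes and batch size are arbitrary assumptions, and the snippet is not part of the commit.

```python
import torch

# One step of a_{l+1} = sigma(W_l a_l + b_l); sizes 784 -> 256 are arbitrary
layer = torch.nn.Linear(784, 256)  # holds W_l and b_l
sigma = torch.nn.Sigmoid()         # the activation function sigma

a_l = torch.rand(32, 784)          # a batch of 32 activation vectors a_l
a_next = sigma(layer(a_l))         # a_{l+1}, shape (32, 256)
```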
@@ -329,9 +344,178 @@ Image source: [3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks)
 
 - In this workshop, we will implement some straightforward neural networks in PyTorch, and use them for different classification and regression problems.
 - PyTorch is a deep learning framework that can be used in both Python and C++.
-- I have never met anyone actually training models in C++; I find it a bit weird.
+- There are other frameworks such as JAX and TensorFlow, and higher-level wrappers such as PyTorch Lightning.
 - See the PyTorch website: [https://pytorch.org/](https://pytorch.org/)
 
+# Datasets, DataLoaders & `nn.Module`
+
+
+---
+
+## What a `Dataset` class does
+
+- Provides a **uniform API** to your data
+- Handles
+  - **Loading** raw files (images, CSVs, audio …)
+  - **Train / validation / test** split logic
+  - **Transforms / augmentation** per item
+  - **Item retrieval** so the rest of PyTorch can stay agnostic
+
+---
+
+## Anatomy of a custom `Dataset`
+
+```python
+class MyDataset(torch.utils.data.Dataset):
+    def __init__(self, root_dir, split="train", transform=None):
+        # 1: load or download files / labels
+        self.paths, self.labels = load_index_file(root_dir, split)
+        self.transform = transform  # 2: save transforms
+```
+
+*The constructor is where you gather file paths, download archives, read CSVs, etc.*
+
+---
+
+## `__len__` & `__getitem__`
+
+```python
+def __len__(self):
+    return len(self.paths)  # total #samples
+
+def __getitem__(self, idx):
+    img = PIL.Image.open(self.paths[idx]).convert("RGB")
+    if self.transform:  # 3: apply transforms
+        img = self.transform(img)
+    label = self.labels[idx]
+    return img, label  # 4: single example
+```
+
+With these two methods PyTorch knows **how big** the dataset is and **how to fetch** one record.
+
+---
+
+## Using the custom dataset
+
+```python
+from torchvision import transforms
+
+train_ds = MyDataset(
+    "data/cats_vs_dogs",
+    split="train",
+    transform=transforms.ToTensor()
+)
+print(len(train_ds))  # e.g. ➜ 20_000
+img, y = train_ds[0]  # one (tensor, label) pair
+```
+
+---
+
+## The **DataLoader** at a glance
+
+- Wraps any `Dataset` in an **iterable**
+- **Batches** samples together
+- **Shuffles** if asked
+- Uses **multiprocessing** (`num_workers`) to pre-fetch data in parallel
+- Returns `(batch, labels)` tuples ready for the GPU
+
+---
+
+## Typical DataLoader code
+
+```python
+train_loader = torch.utils.data.DataLoader(
+    dataset=train_ds,
+    batch_size=64,
+    shuffle=True,
+    num_workers=4,  # 4 CPU workers
+)
+
+for images, labels in train_loader:
+    ...
+```
+
+---
+
+## Quick networks with `nn.Sequential`
+
+```python
+mlp = torch.nn.Sequential(
+    torch.nn.Linear(784, 256), torch.nn.ReLU(),
+    torch.nn.Linear(256, 64), torch.nn.ReLU(),
+    torch.nn.Linear(64, 10)
+)
+
+out = mlp(torch.rand(32, 784))  # 32-sample batch
+```
+
+Great for simple feed-forward stacks when no branching logic is needed.
+
+---
+
+## `nn.Module` overview
+
+- The **base class** for *all* neural-network parts in PyTorch
+- You **sub-class**, then implement
+  - `__init__(self)`: declare layers
+  - `forward(self, x)`: define the forward pass
+
+---
+
+## Declaring layers in `__init__`
+
+```python
+class MyCNN(torch.nn.Module):
+    def __init__(self, num_classes=2):
+        super().__init__()
+        self.features = torch.nn.Sequential(
+            torch.nn.Conv2d(3, 32, 3, padding=1), torch.nn.ReLU(),
+            torch.nn.MaxPool2d(2),
+            torch.nn.Conv2d(32, 64, 3, padding=1), torch.nn.ReLU(),
+            torch.nn.MaxPool2d(2)
+        )
+        self.classifier = torch.nn.Linear(64 * 56 * 56, num_classes)
+```
+
+---
+
+## The `forward` pass
+
+```python
+def forward(self, x):
+    x = self.features(x)    # conv stack
+    x = x.flatten(1)        # shape (N, …)
+    x = self.classifier(x)  # logits
+    return x
+```
+
+Only **forward** is needed – back-prop is handled automatically.
+
+---
+
+## Calling the model ≈ calling `forward`
+
+```python
+model = MyCNN()
+logits1 = model(images)          # preferred ✔
+logits2 = model.forward(images)  # works, but avoid
+```
+
+`model(input)` internally routes to `model.forward(input)` via `__call__`.
+
+---
+
+## Key Take-Aways
+
+1. **Dataset** = organized access to *individual* samples
+2. **DataLoader** = batching, shuffling, parallel I/O
+3. `nn.Module` = reusable building block; override `__init__` & `forward`
+4. `model(x)` is the idiomatic way to run a forward pass
+5. Use `nn.Sequential` for quick layer chains
+
+
 
 # Exercises
 
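Before the exercises hunk, here is a minimal sketch showing how the pieces introduced above fit together in a training loop. `MyCNN` and `train_loader` are the objects defined on the new slides; the loss function, optimiser, and learning rate below are illustrative assumptions and are not part of the commit.

```python
import torch

model = MyCNN(num_classes=2)                               # the nn.Module from the slides
criterion = torch.nn.CrossEntropyLoss()                    # assumed classification loss
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimiser and rate

model.train()
for images, labels in train_loader:   # the DataLoader yields (batch, labels) tuples
    optimiser.zero_grad()             # clear gradients from the previous step
    logits = model(images)            # forward pass via __call__ -> forward
    loss = criterion(logits, labels)
    loss.backward()                   # autograd computes parameter gradients
    optimiser.step()                  # update the parameters
```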
@@ -506,13 +690,13 @@ For more information we can be reached at:
 
 ::: {.column width="25%"}
 
-{{< fa pencil >}} \ Surbhi Goel
+{{< fa pencil >}} \ Isaac Akanho
 
 {{< fa solid person-digging >}} \ [ICCS/UoCambridge](https://iccs.cam.ac.uk/about-us/our-team)
 
-{{< fa solid envelope >}} \ [sg2147[AT]cam.ac.uk](mailto:sg2147@cam.ac.uk)
+{{< fa solid envelope >}} \ [ia464[AT]cam.ac.uk](mailto:ia464@cam.ac.uk)
 
-{{< fa brands github >}} \ [surbhigoel77](https://github.com/surbhigoel77)
+{{< fa brands github >}} \ [isaacaka](https://github.com/isaacaka)
 
 :::
 