
Commit 69891ef

Add convnet slides

1 parent e750c42 commit 69891ef

File tree

4 files changed: +407 −29 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ This repository provides concise and annotated examples for learning the basics

- [Linear Regression](pytorch_tutorial/linear_regression/)
- [Logistic Regression](pytorch_tutorial/logistic_regression/)
- [MultiLayer Perceptron](pytorch_tutorial/multilayer_perceptron/)
- [Convolutional Neural Network](pytorch_tutorial/convolutional_neural_network/)
- ... (more to come)

## Usage

pytorch_tutorial/convolutional_neural_network/README.md

Lines changed: 357 additions & 0 deletions
@@ -18,3 +18,360 @@ math: true # Use default Marp engine for math rendering

This example trains a convolutional neural network to classify fashion items. The complete source code is available [here](test_convolutional_neural_network.py).

![Training outcome](images/convolutional_neural_network.png)
## Imports

```python
import math
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
```

## GPU support

> The `get_device()` utility function was defined in a [previous example](../fundamentals/README.md#gpu-support).

```python
device = get_device()
print(f"PyTorch {torch.__version__}, using {device} device")
```
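
For readers who jump straight to this example, here is a minimal sketch of what this helper may look like (an assumption based on how it is used here, not the authoritative definition from the linked example):

```python
def get_device():
    """Return the name of the best available device: CUDA GPU, Apple Metal (MPS), or CPU.

    Hypothetical re-implementation of the helper from the fundamentals example.
    """
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```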

## Hyperparameters

```python
# Hyperparameters
n_epochs = 10  # Number of training iterations on the whole dataset
learning_rate = 0.001  # Rate of parameter change during gradient descent
batch_size = 64  # Number of samples used for one gradient descent step
conv2d_kernel_size = 3  # Size of the 2D convolution kernels
```

## Dataset loading

We use [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist), a classic dataset for image recognition. Each example is a 28x28 grayscale image, associated with one label from 10 classes: t-shirt, trouser, pullover...

This dataset is provided by PyTorch through the [FashionMNIST](https://pytorch.org/vision/0.19/generated/torchvision.datasets.FashionMNIST.html) class. In order to evaluate the trained model performance on unseen data, it comes pre-split into training and test sets, selected through the `train` parameter of the class constructor.

Alongside download, a [transform](https://pytorch.org/vision/main/transforms.html) operation is applied to turn images into PyTorch tensors of shape `(color_depth, height, width)`, with pixel values scaled to the $[0,1]$ range.

### Dataset download

```python
# Directory for downloaded files
DATA_DIR = "./_output"

# Download and construct the Fashion-MNIST images dataset
# The training set is used to train the model
train_dataset = datasets.FashionMNIST(
    root=DATA_DIR,
    train=True,  # Training set
    download=True,
    transform=transforms.ToTensor(),
)
# The test set is used to evaluate the trained model performance on unseen data
test_dataset = datasets.FashionMNIST(
    root=DATA_DIR,
    train=False,  # Test set
    download=True,
    transform=transforms.ToTensor(),
)
```
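
As a quick sanity check (not part of the original script), we can inspect a single sample to confirm the shape and value range produced by `ToTensor()`:

```python
# Inspect the first training sample: shape (1, 28, 28), values in [0, 1]
img, label = train_dataset[0]
print(img.shape)  # torch.Size([1, 28, 28])
print(img.min().item(), img.max().item())  # Bounds within [0.0, 1.0]
print(label)  # Integer class label between 0 and 9
```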

### Batch loading: training set

```python
# Create data loader for loading training data as randomized batches
train_dataloader = DataLoader(
    dataset=train_dataset, batch_size=batch_size, shuffle=True
)
# Number of training samples
n_train_samples = len(train_dataloader.dataset)
# Number of batches in an epoch (= n_train_samples / batch_size, rounded up)
n_batches = len(train_dataloader)
assert n_batches == math.ceil(n_train_samples / batch_size)
```

### Batch loading: test set

```python
# Create data loader for loading test data as batches (no shuffling needed for evaluation)
test_dataloader = DataLoader(
    dataset=test_dataset, batch_size=batch_size, shuffle=False
)
# Number of test samples
n_test_samples = len(test_dataloader.dataset)

print(f"{n_train_samples} training samples, {n_test_samples} test samples")
```
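
To make the batch dimension concrete (an illustrative check, not in the original script), we can fetch one batch and print its shape:

```python
# Fetch a single batch: images have shape (batch_size, 1, 28, 28)
x_check, y_check = next(iter(train_dataloader))
print(x_check.shape)  # torch.Size([64, 1, 28, 28])
print(y_check.shape)  # torch.Size([64])
```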

## Model definition

### PyTorch models as classes

Non-trivial PyTorch models are created as subclasses of the [Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class. Two elements must be included in a model class:

- the constructor (`__init__()` function) to define the model architecture;
- the `forward()` function to implement the forward pass of input data through the model.
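
In skeleton form, such a class looks like this (a generic illustration with a placeholder layer, not this example's model):

```python
class SkeletonModel(nn.Module):
    """Minimal structure of a PyTorch model class"""

    def __init__(self):
        # Always call the parent constructor first
        super().__init__()
        # Define the model architecture (layers) here
        self.layer = nn.Linear(in_features=4, out_features=2)

    def forward(self, x):
        # Implement the forward pass: compute outputs from inputs
        return self.layer(x)
```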

### Model architecture

We design a basic convolutional network. It takes a tensor of shape `(1, 28, 28)` (a rescaled grayscale image) as input and applies 2D convolution and max-pooling operations to detect interesting features. The output of these operations is flattened into a vector and passed through two linear layers to compute 10 values, one for each possible class.

![Fashion-MNIST convnet architecture](images/fashionnet.png)

### Model implementation

Our model implementation leverages the following PyTorch classes:

- [Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) to create a sequential container of operations.
- [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) to apply a 2D convolution operation.
- The [ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) activation function.
- [MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html) to apply max-pooling.
- [Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) to flatten the extracted features into a vector.
- [LazyLinear](https://pytorch.org/docs/stable/generated/torch.nn.LazyLinear.html), a fully connected layer whose input features are inferred during the first forward pass.
- [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html), a fully connected layer used for final classification.

---

```python
class Convnet(nn.Module):
    """Convnet for fashion articles classification"""

    def __init__(self, conv2d_kernel_size=3):
        super().__init__()

        # Define a sequential stack of layers
        self.layer_stack = nn.Sequential(
            # 2D convolution, output shape: (batch_size, out_channels, output_dim, output_dim)
            # Without padding and with the default stride of 1, output_dim = input_dim - kernel_size + 1
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=conv2d_kernel_size),
            nn.ReLU(),
            # Max pooling, output shape: (batch_size, out_channels, input_dim // kernel_size, input_dim // kernel_size)
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=conv2d_kernel_size),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            # Flattening layer, output shape: (batch_size, out_channels * output_dim * output_dim)
            nn.Flatten(),
            # Linear layer whose input features are inferred during the first call to forward(). Output shape: (batch_size, 128).
            # This avoids hardcoding the output shape of the previous layer, which depends on the shape of input images
            nn.LazyLinear(out_features=128),
            nn.ReLU(),
            # Output shape: (batch_size, 10)
            nn.Linear(in_features=128, out_features=10),
        )

    def forward(self, x):
        """Define the forward pass of the model"""

        # Compute output of layer stack
        logits = self.layer_stack(x)

        # Logits are a vector of raw (non-normalized) predictions
        # This vector contains 10 values, one for each possible class
        return logits
```

### Model instantiation

```python
# Create the convolutional network
model = Convnet(conv2d_kernel_size=conv2d_kernel_size).to(device)

# Use the first training image as dummy to initialize the LazyLinear layer.
# This is mandatory to count model parameters (see below)
first_img, _ = train_dataset[0]
# Add a dimension (to match expected shape with batch size) and store tensor on device memory
dummy_batch = first_img[None, :].to(device)
model(dummy_batch)

# Print model architecture
print(model)
```

### Parameter count

```python
# Compute and print parameter count
n_params = get_parameter_count(model)
print(f"Model has {n_params} trainable parameters")

# Conv2d layers have (in_channels * kernel_size * kernel_size + 1) * out_channels parameters
n_params_conv2d1 = (1 * conv2d_kernel_size * conv2d_kernel_size + 1) * 32
n_params_conv2d2 = (32 * conv2d_kernel_size * conv2d_kernel_size + 1) * 64

# Max-pooling layers have zero parameters

# Linear layers have (in_features + 1) * out_features parameters.
# To compute in_features for the first linear layer, we have to infer the output shapes of the previous layers.
conv2d1_output_dim = 28 - conv2d_kernel_size + 1  # 2D convolution with no padding
maxpool1_output_dim = conv2d1_output_dim // 2  # Max-pooling with a kernel of size 2
conv2d2_output_dim = (
    maxpool1_output_dim - conv2d_kernel_size + 1
)  # 2D convolution with no padding
maxpool2_output_dim = conv2d2_output_dim // 2  # Max-pooling with a kernel of size 2
# Output shape for the second max-pooling layer: (batch_size, 64, maxpool2_output_dim, maxpool2_output_dim)
# Output shape for the flattening layer: (batch_size, 64 * maxpool2_output_dim * maxpool2_output_dim)
n_params_linear1 = (64 * maxpool2_output_dim * maxpool2_output_dim + 1) * 128

n_params_linear2 = (128 + 1) * 10

assert (
    n_params
    == n_params_conv2d1 + n_params_conv2d2 + n_params_linear1 + n_params_linear2
)
```
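
With the default `conv2d_kernel_size = 3`, the spatial dimensions shrink as 28 → 26 → 13 → 11 → 5, so the flattening layer outputs 64 × 5 × 5 = 1600 features and the total is 320 + 18,496 + 204,928 + 1,290 = 225,034 trainable parameters. In case `get_parameter_count()` is not in scope, a one-line version might be (an assumed helper mirroring the usual PyTorch idiom, not necessarily the tutorial's own definition):

```python
def get_parameter_count(model):
    """Return the number of trainable parameters of a model (assumed helper)"""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```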

## Loss function

For this multiclass classification task, we use the [CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) class.

> As seen in a [previous example](../logistic_regression/README.md#loss-function), this class uses a softmax operation to output a probability distribution before computing the loss value.

```python
# Use cross-entropy loss function.
# nn.CrossEntropyLoss computes softmax internally
criterion = nn.CrossEntropyLoss()
```
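
As a quick illustration of that note (not part of the original script), cross-entropy on raw logits matches negative log-likelihood applied to log-softmax outputs:

```python
# CrossEntropyLoss combines LogSoftmax and NLLLoss, so it expects raw logits
logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])
loss_ce = nn.CrossEntropyLoss()(logits, target)
loss_nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
assert torch.isclose(loss_ce, loss_nll)
```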

## Gradient descent optimizer

We use the standard [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) optimizer, which improves the gradient descent algorithm through various optimizations ([more details](https://github.com/bpesquet/mlcourse/tree/main/lectures/gradient_descent#gradient-descent-optimization-algorithms)).

```python
# Adam optimizer for gradient descent
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```
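
For intuition, here is what a plain (non-adaptive) gradient descent step would look like if written by hand; this is an illustration only, as Adam adds momentum and per-parameter adaptive learning rates on top of it:

```python
# Manual equivalent of a vanilla SGD step (illustrative, not used in this example)
with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:
            param -= learning_rate * param.grad
```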

## Training loop

```python
# Set the model to training mode - important for batch normalization and dropout layers.
# Unnecessary here but added for best practices
model.train()

# Train the model
for epoch in range(n_epochs):
    # Total loss for epoch, divided by number of batches to obtain mean loss
    epoch_loss = 0

    # Number of correct predictions in an epoch, used to compute epoch accuracy
    n_correct = 0

    for x_batch, y_batch in train_dataloader:
        # Copy batch data to GPU memory (if available)
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)

        # Forward pass
        y_pred = model(x_batch)

        # Compute loss value
        loss = criterion(y_pred, y_batch)

        # Gradient descent step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        with torch.no_grad():
            # Accumulate data for epoch metrics: loss and number of correct predictions
            # (reusing y_pred avoids a redundant forward pass on the same batch)
            epoch_loss += loss.item()
            n_correct += (y_pred.argmax(dim=1) == y_batch).float().sum().item()

    # Compute epoch metrics
    mean_loss = epoch_loss / n_batches
    epoch_acc = n_correct / n_train_samples

    print(
        f"Epoch [{(epoch + 1):3}/{n_epochs:3}] finished. Mean loss: {mean_loss:.5f}. Accuracy: {epoch_acc * 100:.2f}%"
    )
```

## Model evaluation on test data

Since we have a test set, we are able to assess the trained model performance on unseen data. This is important to detect the presence of overfitting.

```python
# Set the model to evaluation mode - important for batch normalization and dropout layers.
# Unnecessary here but added for best practices
model.eval()

# Compute model accuracy on test data
with torch.no_grad():
    n_correct = 0

    for x_batch, y_batch in test_dataloader:
        # Copy batch data to GPU memory (if available)
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)

        y_pred = model(x_batch)
        n_correct += (y_pred.argmax(dim=1) == y_batch).float().sum().item()

    test_acc = n_correct / len(test_dataloader.dataset)
    print(f"Test accuracy: {test_acc * 100:.2f}%")
```

## Results plotting

Lastly, we may plot several test images and the associated class predictions.

> The `plot_fashion_images()` utility function is defined below.

```python
# Plot several test images and their associated predictions
_ = plot_fashion_images(data=test_dataset, device=device, model=model)
plt.show()
```

---

```python
def plot_fashion_images(data, device, model=None):
    """
    Plot some images with their associated or predicted labels
    """

    # Items, i.e. fashion categories associated to images and indexed by label
    fashion_items = (
        "T-Shirt",
        "Trouser",
        "Pullover",
        "Dress",
        "Coat",
        "Sandal",
        "Shirt",
        "Sneaker",
        "Bag",
        "Ankle Boot",
    )

    figure = plt.figure()

    cols, rows = 5, 3
    for i in range(1, cols * rows + 1):
        sample_idx = torch.randint(len(data), size=(1,)).item()
        img, label = data[sample_idx]
        figure.add_subplot(rows, cols, i)

        # Title is the fashion item associated to either ground truth or predicted label
        if model is None:
            title = fashion_items[label]
        else:
            # Add a dimension (to match expected shape with batch size) and store image on device memory
            x_img = img[None, :].to(device)
            # Compute predicted label for image
            # Even if the model outputs unnormalized logits, argmax gives us the predicted label
            pred_label = model(x_img).argmax(dim=1).item()
            title = f"{fashion_items[pred_label]}?"
        plt.title(title)

        plt.axis("off")
        plt.imshow(img.cpu().detach().numpy().squeeze(), cmap="gray")

    return plt.gcf()
```