This example trains a Linear Regression model on a minimal 2D dataset. The complete source code is available here.
First, we import the necessary packages.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
from torch import nn
Let's probe for the availability of an accelerated device.
Note
The get_device()
utility function was defined in a previous example.
device = get_device()
print(f"PyTorch {torch.__version__}, using {device} device")
Next, we define the various hyperparameters for this example.
# Hyperparameters
n_epochs = 60 # Number of training iterations on the whole dataset
learning_rate = 0.001 # Rate of parameter change during gradient descent
To keep things as simple as possible, the dataset is created from scratch as two NumPy arrays: inputs (the x-coordinates of the samples) and targets (the corresponding y-coordinates).
# Toy dataset: inputs and expected results
inputs = np.array(
    [
        [3.3], [4.4], [5.5], [6.71], [6.93], [4.168], [9.779], [6.182],
        [7.59], [2.167], [7.042], [10.791], [5.313], [7.997], [3.1],
    ],
    dtype=np.float32,
)
targets = np.array(
    [
        [1.7], [2.76], [2.09], [3.19], [1.694], [1.573], [3.366], [2.596],
        [2.53], [1.221], [2.827], [3.465], [1.65], [2.904], [1.3],
    ],
    dtype=np.float32,
)
print(f"Inputs: {inputs.shape}. targets: {targets.shape}")
Both inputs and targets are then converted to PyTorch tensors stored in device memory.
# Convert dataset to PyTorch tensors and put them on GPU memory (if available)
x_train = torch.from_numpy(inputs).to(device)
y_train = torch.from_numpy(targets).to(device)
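As an optional sanity check (not part of the original listing), we can confirm the shape, data type and device of both tensors.
print(f"x_train: {x_train.shape}, {x_train.dtype}, {x_train.device}")
print(f"y_train: {y_train.shape}, {y_train.dtype}, {y_train.device}")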
The Linear Regression model is implemented with the PyTorch Linear class, which applies an affine transformation to its input.
This model has one input (the x-coordinate of a sample) and one output (its y-coordinate).
# Create a Linear Regression model and put it on GPU memory
model = nn.Linear(in_features=1, out_features=1).to(device)
The model defines a function $y = wx + b$, where the weight $w$ and the bias $b$ are its two trainable parameters.
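As an optional illustration (not part of the original code), we can check that the model output matches this affine formula, using the weight and bias attributes exposed by the Linear class.
# Optional check: the model computes y = x * w + b
x_sample = torch.tensor([[2.0]], device=device)
with torch.no_grad():
    manual_output = x_sample * model.weight + model.bias
    assert torch.allclose(model(x_sample), manual_output)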
We also define a utility function to count the trainable parameters of a model; it will be reused in other examples.
def get_parameter_count(model):
    """Return the number of trainable parameters for a PyTorch model"""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
# Print model architecture
print(model)
# Compute and print parameter count
n_params = get_parameter_count(model)
print(f"Model has {n_params} trainable parameters")
# Linear layers have (in_features + 1) * out_features parameters
assert n_params == 2
The MSELoss class implements the Mean Squared Error loss function, well suited to regression tasks.
# Use Mean Squared Error loss
criterion = nn.MSELoss()
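For intuition, the MSE loss is simply the mean of the squared differences between predictions and targets. The following optional snippet (not in the original code) verifies this equivalence on the untrained model.
# Optional check: MSELoss averages the squared prediction errors
with torch.no_grad():
    y_check = model(x_train)
    manual_mse = ((y_check - y_train) ** 2).mean()
    assert torch.allclose(criterion(y_check, y_train), manual_mse)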
The loop for training a PyTorch model in a supervised way is always composed of four main parts:
- compute the model outputs for a set of inputs;
- compute the value of the loss function (difference between expected and actual values);
- use autodiff to obtain the gradients of the loss function w.r.t. each model parameter;
- update each parameter in the opposite direction of its gradient.
In this first example, the training loop is implemented in the simplest way possible.
- No batching: due to the small sample count, the whole dataset is used at each epoch (training iteration).
- Model parameters are updated manually rather than by using a pre-built optimizer. This choice is made to better illustrate the gradient descent algorithm.
Subsequent examples will use more standard techniques.
# Set the model to training mode - important for batch normalization and dropout layers.
# Unnecessary here but added for best practices
model.train()
# Train the model
for epoch in range(n_epochs):
    # Forward pass
    y_pred = model(x_train)

    # Compute loss value
    loss = criterion(y_pred, y_train)

    # Reset the gradients to zero before running the backward pass.
    # Avoids accumulating gradients between GD steps
    model.zero_grad()

    # Compute gradients
    loss.backward()

    # no_grad() avoids tracking operations history when gradient computation is not needed
    with torch.no_grad():
        # Manual gradient descent step: update the weights in the opposite direction of their gradient
        for param in model.parameters():
            param -= learning_rate * param.grad

    # Print training progression
    if (epoch + 1) % 5 == 0:
        print(
            f"Epoch [{(epoch + 1):3}/{n_epochs:3}] finished. Loss: {loss.item():.5f}"
        )
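For comparison, here is a sketch of how the same update could be delegated to a pre-built optimizer like torch.optim.SGD, the kind of standard technique the subsequent examples rely on; this block is not part of this example's code.
# Sketch: equivalent training step using a pre-built optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
for epoch in range(n_epochs):
    y_pred = model(x_train)
    loss = criterion(y_pred, y_train)
    optimizer.zero_grad()  # Reset gradients to zero
    loss.backward()  # Compute gradients
    optimizer.step()  # Update parameters in the opposite direction of their gradients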
Finally, model predictions (fitted line) are plotted alongside training data.
Note
The plot_training_results()
utility function is defined below.
# Improve the appearance of plots
sns.set_theme()
_ = plot_training_results(
    model=model, x=x_train, y=y_train, title="Linear Regression with PyTorch"
)
plt.show()
def plot_training_results(model, x, y, title):
    """
    Plot data and model predictions.

    Args:
        model (torch.nn.Module): Trained PyTorch model
        x (torch.Tensor): Input features of shape (n_samples, 1)
        y (torch.Tensor): Targets of shape (n_samples, 1)
        title (str): Plot title
    """
    # Set the model to evaluation mode - important for batch normalization and dropout layers.
    # Unnecessary here but added for best practices
    model.eval()

    # Compute model results on training data, and convert them to a NumPy array
    y_pred = model(x).detach().cpu().numpy()

    # Convert inputs and targets to NumPy arrays
    x_cpu = x.detach().cpu().numpy()
    y_cpu = y.detach().cpu().numpy()

    # Plot the training results
    plt.plot(x_cpu, y_cpu, "ro", label="Original data")
    plt.plot(x_cpu, y_pred, label="Fitted line")
    plt.legend()
    plt.title(title)

    return plt.gcf()
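As an optional final step (not in the original listing), the fitted line's slope and intercept can be read directly from the trained layer's parameters.
# Optional: inspect the learned parameters of the fitted line y = wx + b
w = model.weight.item()
b = model.bias.item()
print(f"Learned parameters: w = {w:.3f}, b = {b:.3f}")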