My code is as follows:
Answered by Prezzo-K, Feb 20, 2025
The last ReLU activation is messing up your output. It is not common to use an activation function as the last layer:

model_v2 = nn.Sequential(
    nn.Linear(in_features=2, out_features=10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
    nn.ReLU()  # DELETE THIS
)

Also, zero out your gradients in the training loop, or they will accumulate across iterations:

optimizer.zero_grad()

And for better results, try the Adam optimizer with lr = 0.1.

Hope this answers your questions.
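A minimal sketch of how those suggestions fit together, since the original training loop is not shown here. The X and y tensors, the MSE loss, and the epoch count are placeholders, not the original code:

import torch
from torch import nn

# model_v2 as above, with the final ReLU removed
model_v2 = nn.Sequential(
    nn.Linear(in_features=2, out_features=10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
)

# Placeholder data standing in for the original tensors
X = torch.randn(100, 2)
y = torch.randn(100, 1)

loss_fn = nn.MSELoss()  # assumed loss; swap in whatever the original code uses
optimizer = torch.optim.Adam(model_v2.parameters(), lr=0.1)

for epoch in range(100):
    model_v2.train()
    y_pred = model_v2(X)
    loss = loss_fn(y_pred, y)

    optimizer.zero_grad()  # clear accumulated gradients before backward()
    loss.backward()
    optimizer.step()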
Answer selected by s2005lg