[BUG] TimeSeriesDataSet.to_dataloader batch_size, RuntimeError #1752

Open
gbilleyPeco opened this issue Jan 13, 2025 · 9 comments
Labels
bug Something isn't working

Comments

@gbilleyPeco
Contributor

gbilleyPeco commented Jan 13, 2025

Describe the bug

When executing Baseline().predict(dataloader, ...), I get the following error:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 44 for tensor number 54 in the list.

I do not know whether this is a bug, but I'm posting here at the direction of Franz Kiraly.

In the example below, setting batch_size=4 makes the error disappear, but that value was found by trial and error. It would be nice to know which batch sizes are valid without guessing (see the diagnostic sketch after the reproduction code below).

To Reproduce

import warnings

warnings.filterwarnings("ignore")

import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping
import matplotlib.pyplot as plt
import pandas as pd
import torch

from pytorch_forecasting import Baseline, DeepAR, TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder
from pytorch_forecasting.data.examples import generate_ar_data
from pytorch_forecasting.metrics import MAE, SMAPE, MultivariateNormalDistributionLoss

data = generate_ar_data(seasonality=12.0, timesteps=48, n_series=100, seed=42)
data["static"] = 2
data["date"] = pd.Timestamp("2020-01-01") + pd.to_timedelta(data.time_idx, "D")
data = data.astype(dict(series=str))

# create dataset and dataloaders
max_encoder_length = 24
max_prediction_length = 12

training_cutoff = data["time_idx"].max() - max_prediction_length

context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    categorical_encoders={"series": NaNLabelEncoder().fit(data.series)},
    group_ids=["series"],
    static_categoricals=[
        "series"
    ],  # as we plan to forecast correlations, it is important to use series characteristics (e.g. a series identifier)
    time_varying_unknown_reals=["value"],
    max_encoder_length=context_length,
    min_encoder_length=1,
    max_prediction_length=prediction_length,
    min_prediction_length=1,
)

validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff+1)
batch_size = 64
# synchronize samples in each batch over time - only necessary for DeepVAR, not for DeepAR
train_dataloader = training.to_dataloader(
    train=True, batch_size=batch_size, num_workers=0, batch_sampler="synchronized"
)
val_dataloader = validation.to_dataloader(
    train=False, batch_size=batch_size, num_workers=0, batch_sampler="synchronized"
)

baseline_predictions = Baseline().predict(val_dataloader, trainer_kwargs=dict(accelerator="cpu"), return_y=True)
SMAPE()(baseline_predictions.output, baseline_predictions.y)
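
A minimal diagnostic sketch (not part of the tutorial code; it assumes to_dataloader yields (x, (target, weight)) batches as in the tutorial) that prints the validation set size and the per-batch target shapes, so the mismatched batch can be spotted without trial and error:

# Diagnostic sketch: print how many samples the validation set holds and the
# target shape of each batch the synchronized sampler produces.
print("validation samples:", len(validation))
for i, (x, (target, weight)) in enumerate(val_dataloader):
    # target is expected to have shape (batch_size, decoder_length); the batch
    # whose size differs from the others is the one the error message points at.
    print(f"batch {i}: target shape {tuple(target.shape)}")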

Expected behavior

I expect the Baseline() object to make predictions about the data, and calculate the SMAPE.

Additional context

This code was taken directly from the DeepAR tutorial; however, I changed the generate_ar_data parameters and set min_encoder_length=1 and min_prediction_length=1 when initializing the TimeSeriesDataSet.

Versions

I do not have sktime installed, but `pytorch-forecasting` is v1.2.0 and `pytorch-lightning` is v2.4.0
@gbilleyPeco gbilleyPeco added the bug Something isn't working label Jan 13, 2025
@github-project-automation github-project-automation bot moved this to Needs triage & validation in Bugfixing - pytorch-forecasting Jan 13, 2025
@fnhirwa
Member

fnhirwa commented Jan 14, 2025

Reproduced the bug on main branch.

@fnhirwa fnhirwa moved this from Needs triage & validation to Reproduced/confirmed in Bugfixing - pytorch-forecasting Jan 14, 2025
@fkiraly
Collaborator

fkiraly commented Jan 28, 2025

I cannot reproduce the bug, but I do get another exception (current main, Windows, Python 3.11, minimal depset of ptf):

I get RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 44 for tensor number 54 in the list.

@gbilleyPeco
Contributor Author

Just wanted to see if any progress has been made on this yet. If there is any way I can help, I'd be happy to; please let me know if you would like further information on my use case. @fnhirwa @fkiraly

@fkiraly
Collaborator

fkiraly commented Feb 20, 2025

@gbilleyPeco, could you kindly confirm what error you get with the code that is currently the MRE, i.e., the code at the top of this issue? Above, we were unable to verify the initially reported exception.

@gbilleyPeco
Contributor Author

@fkiraly The code that causes the error is this: baseline_predictions = Baseline().predict(val_dataloader, trainer_kwargs=dict(accelerator="cpu"), return_y=True). Does that answer your question?

@gbilleyPeco
Contributor Author

Looking into this more, there are many posts on Stack Overflow and other sources where people hit the same error. Posting a few examples below for reference.

For example:
https://stackoverflow.com/questions/77723713/runtimeerror-sizes-of-tensors-must-match-except-in-dimension-1-expected-size-1
https://discuss.pytorch.org/t/runtimeerror-sizes-of-tensors-must-match-except-in-dimension-1/140651
https://www.reddit.com/r/StableDiffusion/comments/y6izxb/automatic_1111_runtimeerror_sizes_of_tensors_must/
http://github.com/CompVis/stable-diffusion/issues/301

The responses on these posts suggest this happens when there is a mismatch between the shape of the dataset and the model architecture.

@RUPESH-KUMAR01

Summary of the debugging

First, I took the tutorial code and printed the size of validation (the TimeSeriesDataSet); it turned out to be 100. Since the tutorial uses batch_size=128, the error does not show up there, but when I changed batch_size to 64, it did. Changing concat_sequences to _torch_cat_na, which concatenates along the batch dimension, solved the issue, but when I opened a PR I got too many test failures. In the error being shown, the y that is concatenated is the target from the TimeSeriesDataSet; it has shape (batch_size, time_steps), where time_steps equals the prediction length. The concat function concatenates y along time_steps, but it should concatenate along batch_size, because the output of the model is concatenated along the batch dimension.
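
To illustrate the above, a minimal sketch (with made-up tensor shapes, not the actual dataloader output) of why concatenating per-batch targets along the time dimension fails when batch sizes differ, while concatenating along the batch dimension works:

import torch

# Assumed illustrative shapes: two batches of targets shaped
# (batch_size, prediction_length), where the second batch is smaller.
y_batch_1 = torch.randn(64, 12)
y_batch_2 = torch.randn(44, 12)

# Concatenating along dim=1 (time) requires equal batch sizes and raises
# "Sizes of tensors must match except in dimension 1".
# torch.cat([y_batch_1, y_batch_2], dim=1)  # RuntimeError

# Concatenating along dim=0 (batch), as the model outputs are, works.
y_all = torch.cat([y_batch_1, y_batch_2], dim=0)
print(y_all.shape)  # torch.Size([108, 12])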

@fkiraly
Collaborator

fkiraly commented Feb 24, 2025

Interesting, and thanks for helping debug - does this suggest any fixes, @RUPESH-KUMAR01?

@RUPESH-KUMAR01

RUPESH-KUMAR01 commented Feb 24, 2025

From my understanding, the PR #1783 that I recently closed solves this issue for this specific case, but it causes problems with other models and fails the tests.
