
[BUG] TimeSeriesDataSet.to_dataloader batch_size, RuntimeError #1752

Opened by @gbilleyPeco

Description

Describe the bug

When executing Baseline().predict(val_dataloader, ...), I get the following error:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 44 for tensor number 54 in the list.

I do not know whether this is a bug, but I am posting it here at the direction of Franz Kiraly.

In the example below, setting batch_size=4 makes the error disappear, but that number was found by trial and error. It would be helpful to know in advance which batch sizes are valid; see the diagnostic sketch after the reproduction code for one way to investigate this.

To Reproduce

import warnings

warnings.filterwarnings("ignore")

import pandas as pd

from pytorch_forecasting import Baseline, TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder
from pytorch_forecasting.data.examples import generate_ar_data
from pytorch_forecasting.metrics import SMAPE

data = generate_ar_data(seasonality=12.0, timesteps=48, n_series=100, seed=42)
data["static"] = 2
data["date"] = pd.Timestamp("2020-01-01") + pd.to_timedelta(data.time_idx, "D")
data = data.astype(dict(series=str))

# create dataset and dataloaders
max_encoder_length = 24
max_prediction_length = 12

training_cutoff = data["time_idx"].max() - max_prediction_length

context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    categorical_encoders={"series": NaNLabelEncoder().fit(data.series)},
    group_ids=["series"],
    static_categoricals=[
        "series"
    ],  # as we plan to forecast correlations, it is important to use series characteristics (e.g. a series identifier)
    time_varying_unknown_reals=["value"],
    max_encoder_length=context_length,
    min_encoder_length=1,
    max_prediction_length=prediction_length,
    min_prediction_length=1,
)

validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
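# NOTE: batch_size=64 triggers the RuntimeError at the predict() call below;
# batch_size=4 happens to work, but that value was found by trial and error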
batch_size = 64
# synchronize samples in each batch over time - only necessary for DeepVAR, not for DeepAR
train_dataloader = training.to_dataloader(
    train=True, batch_size=batch_size, num_workers=0, batch_sampler="synchronized"
)
val_dataloader = validation.to_dataloader(
    train=False, batch_size=batch_size, num_workers=0, batch_sampler="synchronized"
)

baseline_predictions = Baseline().predict(val_dataloader, trainer_kwargs=dict(accelerator="cpu"), return_y=True)
SMAPE()(baseline_predictions.output, baseline_predictions.y)
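
Below is a small diagnostic sketch of my own (not from the tutorial) that prints the size of each batch the synchronized sampler actually emits; it assumes the val_dataloader and validation objects defined above and relies on the "encoder_cont" entry that TimeSeriesDataSet dataloaders include in every batch. My unconfirmed suspicion is that the failing "size 44" tensor corresponds to a remainder batch: the synchronized sampler groups samples by prediction time, so a batch_size that does not evenly divide some per-timestamp group leaves a short batch.

# Diagnostic sketch: inspect the batch sizes the synchronized sampler produces.
# If some batches are smaller than batch_size, those remainders are the likely
# source of the size mismatch in the error above.
observed_sizes = [x["encoder_cont"].shape[0] for x, _ in val_dataloader]
print(observed_sizes)   # e.g. mostly 64, with smaller remainder batches mixed in
print(len(validation))  # total number of validation samples

If the printed sizes are uneven, that would also explain why batch_size=4 happens to work: presumably it divides every per-timestamp group evenly, while 64 does not.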

Expected behavior

I expect the Baseline() object to make predictions on the data and compute the SMAPE.

Additional context

This code was taken directly from the DeepAR tutorial in the pytorch-forecasting documentation; however, I changed the generate_ar_data parameters and set min_encoder_length=1 and min_prediction_length=1 when initializing the TimeSeriesDataSet.

Versions

I do not have sktime installed; `pytorch-forecasting` is v1.2.0 and `pytorch-lightning` is v2.4.0.

Metadata

Labels: bug (Something isn't working)
Status: Reproduced/confirmed