
RMSE Calculation Discrepancy Between Scikit-Learn and MLForecast Cross Validation Method #471

Closed

capriceZ opened this issue Jan 23, 2025 · 2 comments

@capriceZ commented Jan 23, 2025

What happened + What you expected to happen

Summary:

While performing single-step forecasts for a single time series and evaluating model performance with cross-validation, I observed a mismatch in RMSE between scikit-learn and MLForecast. Other metrics, such as MSE and MAE, matched.

Steps to Reproduce:

  1. Scikit-learn approach:

    • Used TimeSeriesSplit and cross_val_score with 12 splits (n_splits=12) and a test size of 1.
    • RMSE was calculated per fold, then averaged across folds.
  2. MLForecast approach:

    • Used MLForecast.cross_validation and utilsforecast.evaluation.evaluate.
    • RMSE was calculated once over all 12 one-step predictions pooled into a single dataset.

Issue:

Scikit-learn calculates RMSE for each fold individually, then averages the results. In contrast, evaluate() (from utilsforecast) pools the predictions from all folds into a single dataset and calculates RMSE once. Because the square root is nonlinear, these two procedures generally give different values.
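Note: with a test size of 1, each fold contains a single prediction, so the per-fold RMSE reduces to the absolute error, and the mean of the fold RMSEs equals the pooled MAE. That would also explain why MSE and MAE agreed while RMSE did not. A small numeric illustration (made-up errors, not values from this report):

import numpy as np

errors = np.array([1.0, 3.0])  # one prediction error per fold (test_size=1)

# scikit-learn style: RMSE per fold, then averaged -> mean(|error|) = 2.0
mean_fold_rmse = np.mean(np.sqrt(errors ** 2))

# evaluate() style: all folds pooled, one RMSE -> sqrt(mean(error^2)) ≈ 2.236
pooled_rmse = np.sqrt(np.mean(errors ** 2))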

Potential Impact:

This behavior might not be a bug, but it can cause confusion if users expect fold-wise RMSE calculations. Users of cross_validation() + evaluate() could be misled if this distinction isn’t explicitly documented.

Recommendation:

Explicitly document how utilsforecast.evaluation.evaluate calculates RMSE, especially when used with cross_validation(). This will help users interpret results correctly and avoid potential misalignment with other frameworks.

Versions / Dependencies

Python 3.10.15
scikit-learn 1.5.2
mlforecast 1.0.1
utilsforecast 0.2.11

Reproduction script

from sklearn.metrics import root_mean_squared_error, make_scorer
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from mlforecast import MLForecast
from utilsforecast.losses import rmse
from utilsforecast.evaluation import evaluate

Scikit-learn Code:

# prepare cv split: 12 folds, each with a single test observation
ts_cv = TimeSeriesSplit(
    n_splits=12,
    test_size=1,
)

# preprocessing
scale_features = make_column_transformer(
    (OneHotEncoder(drop='first'), ['month']), # month dummy coding
    remainder=MinMaxScaler() # feature scaling
)

# pipeline
ridge_pipe = Pipeline(
    [
        ('preprocess', scale_features),
        ('ridge_model', Ridge(random_state=2025))
    ]
)

# score each fold with RMSE; positive values are fine here since the scores
# are only averaged, not used for model selection
my_rmse = make_scorer(root_mean_squared_error)
# X_train and y_train are assumed to be defined earlier (omitted from the report)
scores_by_fold = cross_val_score(ridge_pipe, X_train, y_train, cv=ts_cv, scoring=my_rmse)
scores_by_fold.mean()  # mean of per-fold RMSEs

MLForecast Code:

# preprocessing
scale_features = make_column_transformer(
    (OneHotEncoder(drop='first'), ['month']), # month dummy coding
    remainder=MinMaxScaler() # feature scaling
)

# pipeline
ridge_pipe = Pipeline(
    [
        ('preprocess', scale_features),
        ('ridge_model', Ridge(random_state=2025))
    ]
)

mlf = MLForecast(
    models={'ridge': ridge_pipe},
    freq='MS'
)

# train_df is assumed to be defined earlier (long format with unique_id, ds, y and features)
cv_df = mlf.cross_validation(
    df=train_df,
    h=1,
    n_windows=12,
    step_size=1,
)

# pooled evaluation: dropping 'cutoff' means RMSE is computed once over all folds
evaluate(
    cv_df.drop(columns='cutoff'),
    metrics=[rmse],
)
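For reference, with a single series and the cutoff column dropped, the evaluate call above amounts to the following pooled computation (a sketch over the cross_validation output; 'ridge' is the model key from the script and 'y' the target column):

import numpy as np

# pooled RMSE over all 12 one-step predictions at once
pooled_rmse = np.sqrt(np.mean((cv_df['ridge'] - cv_df['y']) ** 2))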

Issue Severity

Medium: It is a significant difficulty but I can work around it.

capriceZ added the bug label Jan 23, 2025
@jmoralez (Member) commented

Hey.

I don't think this is an issue with mlforecast: the cross validation method produces the predictions by fold, and you then decide how you want to compute the metrics. For example, here we compute the RMSE by fold; you can also compute it by fold and series, or just by series (as you're doing in your example).
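
A minimal pandas sketch of that fold-wise computation, using the cross_validation output from the reproduction script ('cutoff' identifies the fold, 'ridge' is the model column):

# RMSE within each fold (cutoff), then averaged across folds,
# mirroring the scikit-learn procedure in the reproduction script
fold_rmse = (
    (cv_df['ridge'] - cv_df['y']).pow(2)
    .groupby(cv_df['cutoff'])
    .mean()
    .pow(0.5)
)
fold_rmse.mean()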

@capriceZ (Author) commented

Thank you for the helpful explanation! I was following the tutorial here. It would be nice if you could make this difference explicit in the tutorial. Thanks again for maintaining this amazing library!

capriceZ reopened this Jan 23, 2025