
RMSE Calculation Discrepancy Between Scikit-Learn and MLForecast Cross Validation Method #471

Closed

capriceZ opened this issue Jan 23, 2025 · 2 comments

@capriceZ commented Jan 23, 2025

What happened + What you expected to happen

Summary:

While performing single-step forecasts for a single time series and evaluating model performance with cross-validation, I observed a mismatch in RMSE between scikit-learn and MLForecast. Other metrics, such as MSE and MAE, matched.

Steps to Reproduce:

  1. Scikit-learn approach:

    • Used TimeSeriesSplit and cross_val_score with 12 splits (n_splits=12) and a test size of 1.
    • RMSE was calculated per fold, then averaged across folds.
  2. MLForecast approach:

    • Used MLForecast.cross_validation and utilsforecast.evaluation.evaluate.
    • RMSE was calculated once over all 12 one-step predictions pooled into a single dataset.

Issue:

Scikit-learn calculates RMSE for each fold individually, then averages the results. In contrast, evaluate() (from utilsforecast) pools the predictions from all folds into a single dataset and calculates RMSE once. Because the square root is nonlinear, these two procedures generally give different values.
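Note: with a test size of 1, each fold contains a single prediction, so the per-fold RMSE reduces to the absolute error, and the mean of the fold RMSEs equals the pooled MAE. That would also explain why MSE and MAE agreed while RMSE did not. A small numeric illustration (made-up errors, not values from this report):

import numpy as np

errors = np.array([1.0, 3.0])  # one prediction error per fold (test_size=1)

# scikit-learn style: RMSE per fold, then averaged -> mean(|error|) = 2.0
mean_fold_rmse = np.mean(np.sqrt(errors ** 2))

# evaluate() style: all folds pooled, one RMSE -> sqrt(mean(error^2)) ≈ 2.236
pooled_rmse = np.sqrt(np.mean(errors ** 2))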

Potential Impact:

This behavior might not be a bug, but it can cause confusion if users expect fold-wise RMSE calculations. Users of cross_validation() + evaluate() could be misled if this distinction isn’t explicitly documented.

Recommendation:

Explicitly document how utilsforecast.evaluation.evaluate calculates RMSE, especially when used with cross_validation(). This will help users interpret results correctly and avoid potential misalignment with other frameworks.

Versions / Dependencies

Python 3.10.15
scikit-learn 1.5.2
mlforecast 1.0.1
utilsforecast 0.2.11

Reproduction script

from sklearn.metrics import root_mean_squared_error, make_scorer
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from mlforecast import MLForecast
from utilsforecast.losses import rmse
from utilsforecast.evaluation import evaluate

Scikit-learn Code:

# prepare cv split: 12 folds, each with a single test observation
ts_cv = TimeSeriesSplit(
    n_splits=12,
    test_size=1,
)

# preprocessing
scale_features = make_column_transformer(
    (OneHotEncoder(drop='first'), ['month']), # month dummy coding
    remainder=MinMaxScaler() # feature scaling
)

# pipeline
ridge_pipe = Pipeline(
    [
        ('preprocess', scale_features),
        ('ridge_model', Ridge(random_state=2025))
    ]
)

# score each fold with RMSE; positive values are fine here since the scores
# are only averaged, not used for model selection
my_rmse = make_scorer(root_mean_squared_error)
# X_train and y_train are assumed to be defined earlier (omitted from the report)
scores_by_fold = cross_val_score(ridge_pipe, X_train, y_train, cv=ts_cv, scoring=my_rmse)
scores_by_fold.mean()  # mean of per-fold RMSEs

MLForecast Code:

# preprocessing
scale_features = make_column_transformer(
    (OneHotEncoder(drop='first'), ['month']), # month dummy coding
    remainder=MinMaxScaler() # feature scaling
)

# pipeline
ridge_pipe = Pipeline(
    [
        ('preprocess', scale_features),
        ('ridge_model', Ridge(random_state=2025))
    ]
)

mlf = MLForecast(
    models={'ridge': ridge_pipe},
    freq='MS'
)

# train_df is assumed to be defined earlier (long format with unique_id, ds, y and features)
cv_df = mlf.cross_validation(
    df=train_df,
    h=1,
    n_windows=12,
    step_size=1,
)

# pooled evaluation: dropping 'cutoff' means RMSE is computed once over all folds
evaluate(
    cv_df.drop(columns='cutoff'),
    metrics=[rmse],
)
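For reference, with a single series and the cutoff column dropped, the evaluate call above amounts to the following pooled computation (a sketch over the cross_validation output; 'ridge' is the model key from the script and 'y' the target column):

import numpy as np

# pooled RMSE over all 12 one-step predictions at once
pooled_rmse = np.sqrt(np.mean((cv_df['ridge'] - cv_df['y']) ** 2))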

Issue Severity

Medium: It is a significant difficulty but I can work around it.

capriceZ added the bug label Jan 23, 2025
@jmoralez (Member) commented

Hey.

I don't think this is an issue with mlforecast: the cross validation method produces the predictions by fold, and you then decide how you want to compute the metrics. For example, here we compute the RMSE by fold; you can also compute it by fold and series, or just by series (as you're doing in your example).
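
A minimal pandas sketch of that fold-wise computation, using the cross_validation output from the reproduction script ('cutoff' identifies the fold, 'ridge' is the model column):

# RMSE within each fold (cutoff), then averaged across folds,
# mirroring the scikit-learn procedure in the reproduction script
fold_rmse = (
    (cv_df['ridge'] - cv_df['y']).pow(2)
    .groupby(cv_df['cutoff'])
    .mean()
    .pow(0.5)
)
fold_rmse.mean()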

@capriceZ (Author) commented

Thank you for the helpful explanation! I was following the tutorial here. It would be nice if you could make this difference explicit in the tutorial. Thanks again for maintaining this amazing library!

capriceZ reopened this Jan 23, 2025