What happened + What you expected to happen
Summary:
While performing single-step forecasting on a single time series and evaluating model performance with cross-validation, I observed a mismatch in the RMSE reported by Scikit-Learn and MLForecast. Other metrics such as MSE and MAE matched fine.
Steps to Reproduce:
Scikit-Learn Approach:
Used TimeSeriesSplit and cross_val_score for cross-validation with 12 one-step windows (test size of 1).
RMSE was calculated per fold, and the mean RMSE was derived manually.
MLForecast Approach:
Used MLForecast.cross_validation and utilsforecast.evaluation.evaluate.
RMSE was calculated using all 12 months of predictions as a single dataset.
Issue:
Scikit-Learn calculates RMSE for each fold individually, then aggregates the results. In contrast, evaluate() in MLForecast combines predictions from all folds into a single dataset to calculate RMSE. This discrepancy leads to different RMSE values between the two frameworks.
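To illustrate why only RMSE disagrees: with a single prediction per fold, each fold's RMSE is the absolute error of that prediction, so the two aggregations differ unless all errors are equal. A toy sketch with made-up numbers:

import numpy as np

# Hypothetical one-step errors from three folds (illustrative numbers only)
errors = np.array([1.0, 2.0, 4.0])

# Each fold's RMSE is |error|, so averaging fold RMSEs yields the mean absolute error of the errors...
mean_of_fold_rmse = np.mean(np.abs(errors))   # ~2.33
# ...while pooling all predictions first takes the square root after averaging the squares
pooled_rmse = np.sqrt(np.mean(errors ** 2))   # ~2.65

# MAE and MSE match under both aggregations because their per-fold averages
# already commute with the mean across folds; the square root does not.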
Potential Impact:
This behavior might not be a bug, but it can cause confusion if users expect fold-wise RMSE calculations. Users of cross_validation() + evaluate() could be misled if this distinction isn’t explicitly documented.
Recommendation:
Explicitly document how utilsforecast.evaluation.evaluate calculates RMSE, especially when used with cross_validation(). This will help users interpret results correctly and avoid potential misalignment with other frameworks.
# Imports from the reproduction scripts (see the "Reproduction script" section below)
from sklearn.metrics import root_mean_squared_error, make_scorer
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from mlforecast import MLForecast
from utilsforecast.losses import rmse
from utilsforecast.evaluation import evaluate
I don't think this is an issue with mlforecast. The cross-validation method produces the predictions by fold, and you then decide how you want to compute the metrics. For example, here we compute the RMSE by fold; you can also compute it by fold and series, or just by series (as you're doing in your example).
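For instance, a minimal sketch of the two aggregations, assuming cv_df is the cross-validation output (as in the MLForecast sketch under "Reproduction script" below) and that the model column is named Ridge (both names are illustrative):

from utilsforecast.losses import rmse

# cv_df columns: ['unique_id', 'ds', 'cutoff', 'y', 'Ridge']

# RMSE per fold: group by the cutoff instead of the series id,
# then average across folds -> matches scikit-learn's mean of per-fold RMSEs
fold_rmse = rmse(cv_df, models=['Ridge'], id_col='cutoff')
print(fold_rmse['Ridge'].mean())

# RMSE per series with all folds pooled together -> what evaluate() reports
pooled_rmse = rmse(cv_df, models=['Ridge'])
print(pooled_rmse)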
Thank you for your helpful explanation! I was following the tutorial here. It would be nice if you could make this difference explicit in the tutorial. Thanks again for maintaining this amazing library!
Versions / Dependencies
Python 3.10.15
scikit-learn 1.5.2
mlforecast 1.0.1
utilsforecast 0.2.11
Reproduction script
Scikit-learn Code:
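A minimal sketch of the approach described above (the original snippet is not shown here); it assumes a synthetic monthly series and simple lag features, all illustrative:

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import root_mean_squared_error, make_scorer
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
y = pd.Series(rng.normal(100.0, 10.0, size=60))  # 60 monthly observations
X = pd.DataFrame({'lag1': y.shift(1), 'lag12': y.shift(12)}).dropna()
y = y.loc[X.index]

# 12 one-step folds, matching the description above
tscv = TimeSeriesSplit(n_splits=12, test_size=1)
scorer = make_scorer(root_mean_squared_error, greater_is_better=False)
scores = cross_val_score(Ridge(), X, y, cv=tscv, scoring=scorer)

# scikit-learn yields one (negated) RMSE per fold; take the mean across folds
mean_fold_rmse = -scores.mean()
print(mean_fold_rmse)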
MLForecast Code:
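Likewise, a minimal sketch of the MLForecast side under the same assumptions (synthetic monthly series in long format; the Ridge column name comes from the model class):

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from mlforecast import MLForecast
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import rmse

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'unique_id': 'series_1',
    'ds': pd.date_range('2018-01-01', periods=60, freq='MS'),
    'y': rng.normal(100.0, 10.0, size=60),
})

fcst = MLForecast(models=[Ridge()], freq='MS', lags=[1, 12])

# 12 windows of horizon 1, mirroring TimeSeriesSplit(n_splits=12, test_size=1)
cv_df = fcst.cross_validation(df, n_windows=12, h=1)

# evaluate() pools the predictions from all folds into a single RMSE per series,
# which is why it can differ from the mean of per-fold RMSEs
print(evaluate(cv_df.drop(columns='cutoff'), metrics=[rmse]))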
Issue Severity
Medium: It is a significant difficulty but I can work around it.