
Feature/shap #35

Open
dimaattt wants to merge 24 commits into main from feature/shap

Conversation


@dimaattt dimaattt commented Feb 18, 2026

New datasets added:

  • tsururu/datasets/global/AirPassengers.csv – dataset with real airline passenger data
  • tsururu/datasets/global/simulated_season_data.csv – dataset with synthetic seasonal time series

Updated tutorial tsururu/examples/Tutorial_1_Quick_start.ipynb:

  1. Replaced the synthetic time series with the new AirPassengers real data.
  2. Added helper functions for plotting the time series graph, ACF, and PACF.
  3. Added cells demonstrating feature importance calculation functionality.

Modified tsururu/tsururu/model_training/trainer.py. MLTrainer can now calculate feature importance:

  1. In the fit method, helper functions _calculate_feature_imports and aggregate_feature_importance are called, and the result is saved in the self.shap_values field.
  2. It's possible to compute SHAP on the test set by setting "compute_test_shap": True in feature_explainer_params.
  3. Feature importance plots can be generated by calling get_feature_importance_plots:
    • boxplot — non-aggregated SHAP values
    • barplot — aggregated feature importance
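The aggregation step can be sketched as follows. This is a minimal, self-contained illustration of reducing per-model SHAP arrays to one importance score per feature; the function name matches the PR's aggregate_feature_importance, but the exact signature and the mean-absolute-value reduction are assumptions, not the actual trainer code:

```python
import numpy as np

def aggregate_feature_importance(shap_values_per_model):
    """Reduce raw per-model SHAP values to one importance score per feature.

    Assumption: each element is an array of shape (n_samples, n_features),
    and importance is the mean absolute SHAP value over samples and models.
    """
    stacked = np.stack([np.abs(arr) for arr in shap_values_per_model], axis=0)
    # Average over models (axis 0) and samples (axis 1): one value per feature
    return stacked.mean(axis=(0, 1))

rng = np.random.default_rng(0)
per_model = [rng.normal(size=(4, 3)) for _ in range(2)]  # 2 models, 4 samples, 3 features
importance = aggregate_feature_importance(per_model)
print(importance.shape)  # (3,)
```

Taking the absolute value before averaging matters: raw SHAP values are signed, so averaging them directly would let positive and negative contributions cancel out.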

Changes After Fixes

Updated tsururu/examples/Tutorial_1_Quick_start.ipynb

  • Demonstrated the new strategy methods
  • Added time series plots with explanations

Added tsururu/examples/Tutorial_5_Feature_analysis.ipynb

  • Explained SHAP computation logic with formulas
  • Showed how to compute SHAP values in tsururu
  • Demonstrated feature_importance extraction from base CatBoost model
  • Added interactive plots using shap library

Modified tsururu/tsururu/strategies/direct.py, tsururu/tsururu/strategies/flat_wide_mimo.py, tsururu/tsururu/strategies/mimo.py, tsururu/tsururu/strategies/recursive.py

  • Added get_feature_importance method (logic migrated from old tsururu/tsururu/model_training/trainer.py)
  • Added get_train_shap and get_test_shap methods for easy SHAP value extraction
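A minimal sketch of the accessor pattern described above, using toy stand-in classes (the real MLTrainer and strategy classes live in tsururu; the shap_values layout shown here is an assumption, and only the method names get_train_shap / get_test_shap come from the PR):

```python
class ToyTrainer:
    """Stand-in for MLTrainer: fit() would populate shap_values per split."""
    def __init__(self):
        self.shap_values = {
            "train": {"fold_0": [0.12, 0.05]},  # hypothetical layout
            "test": {"values": [0.08, 0.03]},
        }

class ToyStrategy:
    """Sketch of the shared accessors added to each strategy."""
    def __init__(self, trainer):
        self.trainer = trainer

    def get_train_shap(self) -> dict:
        # Delegate to the trainer so callers never touch its internals
        return self.trainer.shap_values.get("train", {})

    def get_test_shap(self) -> dict:
        return self.trainer.shap_values.get("test", {})

strategy = ToyStrategy(ToyTrainer())
print(sorted(strategy.get_train_shap()))  # ['fold_0']
```

Thin accessors like these keep the trainer's storage format a private detail, so notebooks and tests depend only on the strategy interface.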

Modified tsururu/tsururu/model_training/trainer.py

  • Removed get_feature_importance logic (moved to strategy level)
  • Refactored code to use a single feature importance method

Added tsururu/tests/test_feature_analysis/test_all_strategy.py

  • Tests new methods (get_feature_importance, get_train_shap, get_test_shap)
  • Coverage for all strategy types
  • Run with:
pytest tests/test_feature_analysis/test_all_strategy.py -v

Changes After Fixes 2

Updated tsururu/tests/test_feature_analysis/test_all_strategy.py

  • Removed tests for fit and predict methods
  • Kept only the test for the pipeline and SHAP value computation

Updated tsururu/tsururu/strategies/direct.py, tsururu/tsururu/strategies/flat_wide_mimo.py, tsururu/tsururu/strategies/mimo.py, tsururu/tsururu/strategies/recursive.py

  • Split the get_feature_importance method into three methods: one for aggregation and two for plotting. These were not moved to the trainer, since the logic differs across strategies
  • Implemented proper inheritance between strategies to reduce code duplication
  • Reworked boxplot generation logic: now plots are linked to their corresponding trainer, and no more than three plots are shown per row
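The "no more than three plots per row" rule reduces to a small grid calculation. This is a hedged sketch: boxplot_grid is a hypothetical helper name, and the actual figure code lives in the strategy classes:

```python
import math

def boxplot_grid(n_plots: int, max_cols: int = 3) -> tuple:
    """Return (rows, cols) for a subplot grid capped at max_cols per row."""
    cols = min(n_plots, max_cols)          # never wider than max_cols
    rows = math.ceil(n_plots / max_cols)   # enough rows for all plots
    return rows, cols

print(boxplot_grid(5))  # (2, 3): five trainer boxplots fill two rows
```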

Updated examples/Tutorial_5_Feature_analysis.ipynb

  • Updated cells related to boxplot visualization according to the new logic

@dimaattt dimaattt requested a review from elineii March 16, 2026 07:07
Comment thread tsururu/model_training/trainer.py Outdated
from typing import Dict, List, Optional, Tuple, Union
from typing import Dict, List, Optional, Tuple, Union, Any

# temp code
Collaborator

Delete this.

Comment thread tsururu/model_training/trainer.py Outdated
def _calculate_feature_imports(
self,
model: Estimator,
X_val: Union[pd.DataFrame, np.ndarray],
Collaborator

Are these definitely the correct types? In what case would it be pd.DataFrame?

Comment thread tsururu/model_training/trainer.py Outdated
"""Computes feature importance for a single model using the specified method.

Args:
method: Importance calculation method ('shap' or 'Catboost_feature_importance').
Collaborator

There is no such parameter.

Comment thread tsururu/model_training/trainer.py Outdated
np.abs(arr[:max_rows]) for arr in feature_importances_per_model
]
stacked = np.stack(padded_arrays, axis=0)
mean_values = stacked.mean(axis=(0, 1)).round(4)
Collaborator

Why the rounding here?

Comment thread tsururu/model_training/trainer.py Outdated
Comment on lines +194 to +195
self.feature_importances_per_model = []
self.feature_name = pipeline.output_features[::-1]
Collaborator

All attribute initializations should be in __init__.

Comment thread tsururu/model_training/trainer.py Outdated
self.features_argsort = np.argsort(pipeline.output_features)
X = X[:, self.features_argsort]

return_importance = self.return_importance
Collaborator

Why do you need this line? You can use self.return_importance directly.

y_pred = np.mean(models_preds, axis=0)

if "test" not in self.shap_values:
self.shap_values["test"] = {}
Collaborator

There seems to be inconsistency in the attribute names:

self.feature_importances_per_model
self.shap_values

Perhaps a better naming and storage logic could be considered?

For example:
return_importance -> return_shaps (we have only one type of importance now)

self.shap_values = {
    "feature_names": [...],
    "model_0": {
        "train_fold_0": {...},
        ...,
        "train_fold_k": {...},
        "test": {...}
    },
    ...
    "model_n": {...}
}

Comment thread tsururu/strategies/direct.py Outdated
if return_explainer:
return arr_explainers

def get_train_shap(self) -> dict:
Collaborator

Why do you need this function?


return dataset

def get_feature_importance(
Collaborator

Can we move all of the aggregation logic to the trainer level? Keep this method here, but give it less functionality.

Also, it is better to separate visualization from the calculation of numerical values.

Collaborator

Also, there is too much repeated code between the strategies. Why?

Can it be inherited?
