
Feature/shap #35

Open
dimaattt wants to merge 24 commits into main from feature/shap

Conversation


@dimaattt dimaattt commented Feb 18, 2026

New datasets added:

  • tsururu/datasets/global/AirPassengers.csv – dataset with real airline passenger data
  • tsururu/datasets/global/simulated_season_data.csv – dataset with synthetic seasonal time series

Updated tutorial tsururu/examples/Tutorial_1_Quick_start.ipynb:

  1. Replaced the synthetic time series with the new AirPassengers real data.
  2. Added helper functions for plotting the time series graph, ACF, and PACF.
  3. Added cells demonstrating feature importance calculation functionality.

Modified tsururu/tsururu/model_training/trainer.py. MLTrainer can now calculate feature importance:

  1. In the fit method, helper functions _calculate_feature_imports and aggregate_feature_importance are called, and the result is saved in the self.shap_values field.
  2. It's possible to compute SHAP on the test set by setting "compute_test_shap": True in feature_explainer_params.
  3. Feature importance plots can be generated by calling get_feature_importance_plots:
    • boxplot — non-aggregated SHAP values
    • barplot — aggregated feature importance
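The aggregation step can be sketched as follows. This is a minimal, self-contained illustration of reducing per-model SHAP arrays to one importance score per feature; the function name matches the PR's aggregate_feature_importance, but the exact signature and the mean-absolute-value reduction are assumptions, not the actual trainer code:

```python
import numpy as np

def aggregate_feature_importance(shap_values_per_model):
    """Reduce raw per-model SHAP values to one importance score per feature.

    Assumption: each element is an array of shape (n_samples, n_features),
    and importance is the mean absolute SHAP value over samples and models.
    """
    stacked = np.stack([np.abs(arr) for arr in shap_values_per_model], axis=0)
    # Average over models (axis 0) and samples (axis 1): one value per feature
    return stacked.mean(axis=(0, 1))

rng = np.random.default_rng(0)
per_model = [rng.normal(size=(4, 3)) for _ in range(2)]  # 2 models, 4 samples, 3 features
importance = aggregate_feature_importance(per_model)
print(importance.shape)  # (3,)
```

Taking the absolute value before averaging matters: raw SHAP values are signed, so averaging them directly would let positive and negative contributions cancel out.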

Changes After Fixes

Updated tsururu/examples/Tutorial_1_Quick_start.ipynb

  • Demonstrated the new strategy methods
  • Added time series plots with explanations

Added tsururu/examples/Tutorial_5_Feature_analysis.ipynb

  • Explained SHAP computation logic with formulas
  • Showed how to compute SHAP values in tsururu
  • Demonstrated feature_importance extraction from base CatBoost model
  • Added interactive plots using shap library

Modified tsururu/tsururu/strategies/direct.py, tsururu/tsururu/strategies/flat_wide_mimo.py, tsururu/tsururu/strategies/mimo.py, tsururu/tsururu/strategies/recursive.py

  • Added get_feature_importance method (logic migrated from old tsururu/tsururu/model_training/trainer.py)
  • Added get_train_shap and get_test_shap methods for easy SHAP value extraction
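A minimal sketch of the accessor pattern described above, using toy stand-in classes (the real MLTrainer and strategy classes live in tsururu; the shap_values layout shown here is an assumption, and only the method names get_train_shap / get_test_shap come from the PR):

```python
class ToyTrainer:
    """Stand-in for MLTrainer: fit() would populate shap_values per split."""
    def __init__(self):
        self.shap_values = {
            "train": {"fold_0": [0.12, 0.05]},  # hypothetical layout
            "test": {"values": [0.08, 0.03]},
        }

class ToyStrategy:
    """Sketch of the shared accessors added to each strategy."""
    def __init__(self, trainer):
        self.trainer = trainer

    def get_train_shap(self) -> dict:
        # Delegate to the trainer so callers never touch its internals
        return self.trainer.shap_values.get("train", {})

    def get_test_shap(self) -> dict:
        return self.trainer.shap_values.get("test", {})

strategy = ToyStrategy(ToyTrainer())
print(sorted(strategy.get_train_shap()))  # ['fold_0']
```

Thin accessors like these keep the trainer's storage format a private detail, so notebooks and tests depend only on the strategy interface.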

Modified tsururu/tsururu/model_training/trainer.py

  • Removed get_feature_importance logic (moved to strategy level)
  • Refactored code to use a single feature importance method

Added tsururu/tests/test_feature_analysis/test_all_strategy.py

  • Tests new methods (get_feature_importance, get_train_shap, get_test_shap)
  • Coverage for all strategy types
  • Run with:
pytest tests/test_feature_analysis/test_all_strategy.py -v

Changes After Fixes 2

Updated tsururu/tests/test_feature_analysis/test_all_strategy.py

  • Removed tests for fit and predict methods
  • Kept only the test for the pipeline and SHAP value computation

Updated tsururu/tsururu/strategies/direct.py, tsururu/tsururu/strategies/flat_wide_mimo.py, tsururu/tsururu/strategies/mimo.py, tsururu/tsururu/strategies/recursive.py

  • Split the get_feature_importance method into three methods: one for aggregation and two for plotting. These were not moved to the trainer, since the logic differs across strategies
  • Implemented proper inheritance between strategies to reduce code duplication
  • Reworked boxplot generation logic: now plots are linked to their corresponding trainer, and no more than three plots are shown per row
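The "no more than three plots per row" rule reduces to a small grid calculation. This is a hedged sketch: boxplot_grid is a hypothetical helper name, and the actual figure code lives in the strategy classes:

```python
import math

def boxplot_grid(n_plots: int, max_cols: int = 3) -> tuple:
    """Return (rows, cols) for a subplot grid capped at max_cols per row."""
    cols = min(n_plots, max_cols)          # never wider than max_cols
    rows = math.ceil(n_plots / max_cols)   # enough rows for all plots
    return rows, cols

print(boxplot_grid(5))  # (2, 3): five trainer boxplots fill two rows
```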

Updated examples/Tutorial_5_Feature_analysis.ipynb

  • Updated cells related to boxplot visualization according to the new logic

@dimaattt dimaattt requested a review from elineii March 16, 2026 07:07
Comment thread tsururu/model_training/trainer.py Outdated
from typing import Dict, List, Optional, Tuple, Union
from typing import Dict, List, Optional, Tuple, Union, Any

# temp code
Collaborator

Delete this.

Comment thread tsururu/model_training/trainer.py Outdated
def _calculate_feature_imports(
self,
model: Estimator,
X_val: Union[pd.DataFrame, np.ndarray],
Collaborator

Are these definitely the correct types? In what case would it be pd.DataFrame?

Comment thread tsururu/model_training/trainer.py Outdated
"""Computes feature importance for a single model using the specified method.

Args:
method: Importance calculation method ('shap' or 'Catboost_feature_importance').
Collaborator

There is no such parameter.

Comment thread tsururu/model_training/trainer.py Outdated
np.abs(arr[:max_rows]) for arr in feature_importances_per_model
]
stacked = np.stack(padded_arrays, axis=0)
mean_values = stacked.mean(axis=(0, 1)).round(4)
Collaborator

Why the rounding here?

Comment thread tsururu/model_training/trainer.py Outdated
Comment on lines +194 to +195
self.feature_importances_per_model = []
self.feature_name = pipeline.output_features[::-1]
Collaborator

All attribute initializations should be in __init__.

Comment thread tsururu/model_training/trainer.py Outdated
self.features_argsort = np.argsort(pipeline.output_features)
X = X[:, self.features_argsort]

return_importance = self.return_importance
Collaborator

Why do you need this line? You can use self.return_importance directly.

y_pred = np.mean(models_preds, axis=0)

if "test" not in self.shap_values:
self.shap_values["test"] = {}
Collaborator

There seems to be inconsistency in the attribute names:

self.feature_importances_per_model
self.shap_values

Perhaps a better naming and storage logic could be considered?

For example:
return_importance -> return_shaps (we have only one type of importance now)

self.shap_values = {
    "feature_names": [...],
    "model_0": {
        "train_fold_0": {...},
        ...,
        "train_fold_k": {...},
        "test": {...}
    },
    ...
    "model_n": {...}
}

Comment thread tsururu/strategies/direct.py Outdated
if return_explainer:
return arr_explainers

def get_train_shap(self) -> dict:
Collaborator

Why do you need this function?


return dataset

def get_feature_importance(
Collaborator

Can we move all of the aggregation logic to the trainer level? Keep this method here, but give it less functionality.

Also, it is better to separate visualization from the calculation of numerical values.

Collaborator

Also, there is too much repeated code between the strategies. Why?

Can it be inherited?
