Conversation
```diff
-from typing import Dict, List, Optional, Tuple, Union
+from typing import Dict, List, Optional, Tuple, Union, Any
```

```python
# temp code
def _calculate_feature_imports(
    self,
    model: Estimator,
    X_val: Union[pd.DataFrame, np.ndarray],
```
Are these definitely the correct types? In what case would it be pd.DataFrame?
| """Computes feature importance for a single model using the specified method. | ||
|
|
||
| Args: | ||
| method: Importance calculation method ('shap' or 'Catboost_feature_importance'). |
There is no such parameter.
```python
        np.abs(arr[:max_rows]) for arr in feature_importances_per_model
    ]
    stacked = np.stack(padded_arrays, axis=0)
    mean_values = stacked.mean(axis=(0, 1)).round(4)
```
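Read in isolation, this hunk truncates every per-model matrix to a common row count, stacks them, and averages absolute values over both models and rows. A self-contained sketch with made-up data (two models, differing row counts):

```python
import numpy as np

# Hypothetical per-model SHAP matrices: (n_rows, n_features), rows may differ.
feature_importances_per_model = [
    np.array([[0.1, -0.5], [0.2, 0.3], [0.4, -0.1]]),
    np.array([[0.3, 0.2], [-0.2, 0.6]]),
]

# Truncate to the shortest row count so the arrays can be stacked.
max_rows = min(arr.shape[0] for arr in feature_importances_per_model)
padded_arrays = [np.abs(arr[:max_rows]) for arr in feature_importances_per_model]

stacked = np.stack(padded_arrays, axis=0)          # (n_models, max_rows, n_features)
mean_values = stacked.mean(axis=(0, 1)).round(4)   # one score per feature

print(mean_values)  # [0.2 0.4]
```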
```python
self.feature_importances_per_model = []
self.feature_name = pipeline.output_features[::-1]
```
All attribute initializations should be in `__init__`.
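What the reviewer is asking for, sketched with a hypothetical class (attribute names taken from the diff): declare every attribute once in `__init__`, and let `fit` only fill in values.

```python
class FeatureExplainer:
    """Sketch: the full object state is visible in one place."""

    def __init__(self):
        self.feature_importances_per_model = []
        self.feature_name = None
        self.features_argsort = None
        self.shap_values = {}

    def fit(self, output_features):
        # fit only assigns values; it never introduces new attributes.
        self.feature_name = output_features[::-1]

explainer = FeatureExplainer()
explainer.fit(["lag_0", "lag_1"])
print(explainer.feature_name)  # ['lag_1', 'lag_0']
```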
```python
self.features_argsort = np.argsort(pipeline.output_features)
X = X[:, self.features_argsort]
```
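For reference, `np.argsort` over the feature names yields the column permutation that sorts the names, which the slice then applies to the data matrix. A standalone illustration with made-up feature names:

```python
import numpy as np

output_features = ["lag_2", "lag_0", "lag_1"]   # hypothetical pipeline output
features_argsort = np.argsort(output_features)  # indices that sort the names -> [1, 2, 0]

X = np.array([[2.0, 0.0, 1.0],
              [5.0, 3.0, 4.0]])
X_sorted = X[:, features_argsort]  # columns now follow lag_0, lag_1, lag_2
print(X_sorted)  # [[0. 1. 2.]
                 #  [3. 4. 5.]]
```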
```python
return_importance = self.return_importance
```
Why do you need this line? You can use `self.return_importance` directly...
```python
y_pred = np.mean(models_preds, axis=0)
```

```python
if "test" not in self.shap_values:
    self.shap_values["test"] = {}
```
There seems to be an inconsistency in the attribute names:
`self.feature_importances_per_model`
`self.shap_values`
Perhaps better naming and storage logic could be considered? For example, `return_importance` -> `return_shaps` (we have only one type of importance now), and a single nested store:

```python
self.shap_values = {
    "feature_names": [...],
    "model_0": {
        "train_fold_0": {...},
        ...,
        "train_fold_k": {...},
        "test": {...},
    },
    ...
    "model_n": {...},
}
```
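A sketch of how the suggested layout could be populated and queried (keys hypothetical, following the reviewer's example); collecting all train-fold SHAP matrices for one model becomes a simple key filter instead of a separate attribute:

```python
# One top-level entry per model, plus a shared feature-name list.
shap_values = {
    "feature_names": ["lag_0", "lag_1"],
    "model_0": {
        "train_fold_0": {"shap": [[0.1, -0.2]]},
        "train_fold_1": {"shap": [[0.3, 0.1]]},
        "test": {"shap": [[0.2, 0.0]]},
    },
}

# Filter the per-model dict by key prefix to get only the train folds.
train_shaps = {
    fold: payload["shap"]
    for fold, payload in shap_values["model_0"].items()
    if fold.startswith("train_fold")
}
print(sorted(train_shaps))  # ['train_fold_0', 'train_fold_1']
```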
```python
if return_explainer:
    return arr_explainers
```

```python
def get_train_shap(self) -> dict:
```
Why do you need this function?
```python
return dataset
```

```python
def get_feature_importance(
```
Can we move all the aggregation logic to the trainer level? Keep this feature here, but give it less functionality. It is also better to separate visualization from the calculation of numerical values.
Also, there is a lot of repeated code between the strategies. Why? Can it be inherited from a common base class?
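One way to factor out the repetition, sketched with hypothetical class and method names rather than the PR's actual code: a base strategy owns the shared aggregation, and each concrete strategy overrides only a small hook.

```python
import numpy as np

class BaseStrategy:
    """Shared feature-importance plumbing; subclasses override only hooks."""

    def aggregate_importance(self, per_model):
        # Common step: mean of |importance| across models and rows.
        rows = min(arr.shape[0] for arr in per_model)
        return np.stack([np.abs(a[:rows]) for a in per_model]).mean(axis=(0, 1))

    def get_feature_importance(self, per_model):
        return self._postprocess(self.aggregate_importance(per_model))

    def _postprocess(self, agg):
        # Hook for strategy-specific behavior; identity by default.
        return agg

class RecursiveLikeStrategy(BaseStrategy):
    def _postprocess(self, agg):
        return agg.round(4)

strategy = RecursiveLikeStrategy()
scores = strategy.get_feature_importance([np.array([[0.11111, -0.2]])])
print(scores)
```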
New datasets added:

- `tsururu/datasets/global/AirPassengers.csv` – dataset with real airline passenger data
- `tsururu/datasets/global/simulated_season_data.csv` – dataset with synthetic seasonal time series

Updated tutorial `tsururu/examples/Tutorial_1_Quick_start.ipynb`.

Modified `tsururu/tsururu/model_training/trainer.py`. Now, in `MLTrainer`, we can calculate feature importance:

- In the `fit` method, the helper functions `_calculate_feature_imports` and `aggregate_feature_importance` are called, and the result is saved in the `self.shap_values` field.
- `"compute_test_shap": True` in `feature_explainer_params`
- `get_feature_importance_plots`

**Changes After Fixes**

Updated `tsururu/examples/Tutorial_1_Quick_start.ipynb`:

- `strategy`

Added `tsururu/examples/Tutorial_5_Feature_analysis.ipynb` (feature analysis in `tsururu`):

- `feature_importance` extraction from the base CatBoost model

Modified `tsururu/tsururu/strategies/direct.py`, `tsururu/tsururu/strategies/flat_wide_mimo.py`, `tsururu/tsururu/strategies/mimo.py`, `tsururu/tsururu/strategies/recursive.py`:

- `get_feature_importance` method (logic migrated from the old `tsururu/tsururu/model_training/trainer.py`)
- `get_train_shap` and `get_test_shap` methods for easy SHAP value extraction

Modified `tsururu/tsururu/model_training/trainer.py`:

- `get_feature_importance` logic (moved to the `strategy` level)

Added `tsururu/tests/test_feature_analysis/test_all_strategy.py`:

- tests for `get_feature_importance`, `get_train_shap`, `get_test_shap`

**Changes After Fixes 2**

Update `tsururu/tests/test_feature_analysis/test_all_strategy.py`:

- `fit` and `predict` methods

Update `tsururu/tsururu/strategies/direct.py`, `tsururu/tsururu/strategies/flat_wide_mimo.py`, `tsururu/tsururu/strategies/mimo.py`, `tsururu/tsururu/strategies/recursive.py`:

- split the `get_feature_importance` method into three methods: one for aggregation and two for plotting. These were not moved to `trainer`, since the logic differs across strategies.

Update `examples/Tutorial_5_Feature_analysis.ipynb`.