Model feature influence #73

Merged
itellaetxe merged 21 commits into main from model-feature-influence
Apr 28, 2025

Contributor

@JGarciaCondado JGarciaCondado commented Apr 2, 2025

Summary

We have added two new methods to the ageml processing pipeline. The new commands are:

  • model_feature_influence
  • age_model_vs_logistic_regression

Description of commands

model_feature_influence:

  • Takes as input a features file, a clinical file, and two groups.
  1. Orders the N given features in two ways: by their Mutual Information (MI) with age, and by their MI-based discriminative power between the two clinical groups.
  2. Creates N models trained on incrementally larger feature sets, ordered by MI with age. The first model is trained with only the top-ranked feature, the second adds the second-ranked feature, and so on.
  3. Creates N models trained on incrementally larger feature sets, ordered by MI-based discriminative power. The first model is trained with only the most discriminative feature, the second adds the second most discriminative feature, and so on.
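
The three steps above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the ageml implementation: the helper `incremental_mae`, the choice of `LinearRegression`, the use of the group label as the target for discriminative MI, and the data itself are all assumptions made for the example. Only the MAE curves are computed here.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_predict


def incremental_mae(X, age, order, cv=5):
    """MAE of age models trained on the top-1, top-2, ..., top-N features (hypothetical helper)."""
    maes = []
    for k in range(1, len(order) + 1):
        cols = order[:k]  # the k best-ranked features so far
        pred = cross_val_predict(LinearRegression(), X[:, cols], age, cv=cv)
        maes.append(mean_absolute_error(age, pred))
    return maes


# Synthetic stand-in data (assumption): 200 subjects, 4 features, 2 groups.
rng = np.random.default_rng(0)
age = rng.uniform(50, 90, size=200)
group = rng.integers(0, 2, size=200)  # 0 = controls, 1 = patients
X = rng.normal(size=(200, 4))
X[:, 0] += age          # feature 0 tracks age
X[:, 1] += 2.0 * group  # feature 1 separates the groups

# Step 1: rank features by MI with age and by MI with the group label.
mi_age = mutual_info_regression(X, age, random_state=0)
mi_discr = mutual_info_classif(X, group, random_state=0)
order_age = np.argsort(mi_age)[::-1]      # best feature first
order_discr = np.argsort(mi_discr)[::-1]  # best feature first

# Steps 2 and 3: one model per nested feature set, for each ordering.
mae_by_age_order = incremental_mae(X, age, order_age)
mae_by_discr_order = incremental_mae(X, age, order_discr)
```

Plotting `mae_by_age_order` and `mae_by_discr_order` against the number of features gives curves of the kind listed in the output below.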

Output:

  • Order of features according to MI with age, and MI with discriminative power
  • Graph of the features' MI with age on the x-axis and MI with discriminative power on the y-axis
  • Graph of MAE according to the number of features used.
  • Graph of AUC to classify the two groups according to the number of features used.
  • Graph of MAE and AUC for both the age-MI and discriminative-MI models.

Figures: metrics_vs_num_features_age_cn_ad, metrics_vs_num_features_cn_ad, metrics_vs_num_features_discrimination_cn_ad, ordering_cn_ad

age_model_vs_logistic_regression

  • Takes as input a features file, a clinical file, and two groups.
  1. Repeats the process above, but trains three different BrainAge models: linear regression, Ridge, and SVM.
  2. Here only the AUC for classifying the two groups is of interest.
  3. Trains a logistic regression model directly on the features, then computes its AUC.
  4. Also trains 4 logistic regression models using: only the features, [features + age], only the delta, [features + delta]

This pipeline gives you an idea of the benefit of computing the deltas to classify clinical groups, compared to just using the features.
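
A rough sketch of the four-way logistic regression comparison, again on synthetic data. It is only meant to show the idea: the helper `cv_auc`, the use of cross-validated AUC, and fitting the age model on controls only are assumptions of this example, not a description of the code in this PR.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def cv_auc(inputs, labels, cv=5):
    """Cross-validated AUC of a logistic regression on the given inputs (hypothetical helper)."""
    proba = cross_val_predict(
        LogisticRegression(max_iter=1000), inputs, labels, cv=cv, method="predict_proba"
    )
    return roc_auc_score(labels, proba[:, 1])


# Synthetic stand-in data (assumption): 200 subjects, 3 features, 2 groups.
rng = np.random.default_rng(0)
age = rng.uniform(50, 90, size=200)
group = rng.integers(0, 2, size=200)  # 0 = controls, 1 = patients
X = rng.normal(size=(200, 3))
X[:, 0] += 0.1 * age + 1.0 * group  # changes with age, shifted in patients

# BrainAge delta: predicted age minus chronological age. The age model is
# fitted on controls only; control predictions are cross-validated.
controls = group == 0
predicted_age = np.empty_like(age)
predicted_age[controls] = cross_val_predict(
    LinearRegression(), X[controls], age[controls], cv=5
)
age_model = LinearRegression().fit(X[controls], age[controls])
predicted_age[~controls] = age_model.predict(X[~controls])
delta = predicted_age - age

# The four logistic regression inputs listed above.
inputs = {
    "features": X,
    "features + age": np.column_stack([X, age]),
    "delta": delta.reshape(-1, 1),
    "features + delta": np.column_stack([X, delta]),
}
aucs = {name: cv_auc(data, group) for name, data in inputs.items()}
```

Comparing `aucs["delta"]` with `aucs["features"]` is the kind of check that shows whether the delta adds anything over the raw features.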

Output:

  • AUC graph for each of the 4 models (3 BrainAge models and 1 using the features directly)
  • AUC values for the 4 logistic regression models.

Figures: auc_vs_num_features_age_cn_ad, auc_vs_num_features_discrimination_cn_ad

@JGarciaCondado JGarciaCondado linked an issue Apr 2, 2025 that may be closed by this pull request
Contributor

@itellaetxe itellaetxe left a comment

LGTM. Minor stuff:

  • Argparser nargs comments
  • Inheritance for ...FoldMetrics
  • Docstrings should all follow the same format; they are currently inconsistent.

Comment thread src/ageml/commands.py
self.parser.add_argument(
"-m",
"--model",
nargs="*",
Contributor

Why nargs="*"? (nargs doc)
Can more than one model be specified? Or do you just want the specified model string wrapped in a list, e.g. specifying "ridge" automatically gives ["ridge"]?

If you only want zero or one argument, change this to nargs="?" instead.

Contributor Author

This is inherited from earlier code; see model_age above. When building a model, the user can choose one (and only one) of the available model types, but can also pass several extra arguments to configure it. Example: -m linear_reg fit_intercept=False normalize=True. The user can therefore give no arguments (the default is used), one argument (a model type, which uses the default settings for that type), or multiple arguments, which requires *. If ? is more appropriate, we should change it above too.

Contributor

Okay seems reasonable! nargs="*" is good for our use case, then. 👍
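
For reference, the behaviour discussed in this thread can be reproduced with a small standalone parser. The default value `["linear_reg"]` and the helper `split_model_args` are made up for this illustration and are not taken from src/ageml/commands.py; settings are left as strings.

```python
import argparse

parser = argparse.ArgumentParser()
# nargs="*" collects zero or more tokens into a list.
parser.add_argument("-m", "--model", nargs="*", default=["linear_reg"])


def split_model_args(tokens):
    """Split ["type", "key=value", ...] into a model type and a dict of string settings."""
    model_type, *settings = tokens or ["linear_reg"]  # bare "-m" falls back to the default
    params = dict(item.split("=", 1) for item in settings)
    return model_type, params


args = parser.parse_args(["-m", "linear_reg", "fit_intercept=False", "normalize=True"])
model_type, params = split_model_args(args.model)
```

With nargs="?" the same command line would fail, because argparse would take only `linear_reg` and report the key=value tokens as unrecognized arguments.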

Comment thread src/ageml/commands.py
self.parser.add_argument(
"-s",
"--scaler",
nargs="*",
Contributor

See comment above about nargs. Same applies to --scaler argument here.

Contributor Author

I have replied in the comment above. It is the same case here: it can take 0, 1, or multiple arguments.

Comment thread src/ageml/processing.py Outdated
raise ValueError('task_type must be either "regression" or "classification"')
else:
self.task_type = task_type
self.train_metrics: List[Union[RegressionFoldMetrics, ClassificationFoldMetrics]] = []
Contributor

To avoid this List and Union thing, use class inheritance.

Making RegressionFoldMetrics and ClassificationFoldMetrics children of the same abstract parent class, e.g. FoldMetrics, would be easier. That way you only have to check whether the input is an instance of FoldMetrics.

Contributor Author

I agree; this has been changed.
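
A minimal sketch of the inheritance suggested in this thread. Only the three class names come from the discussion; the fields, the `as_dict` method, and the `mean_per_metric` helper are hypothetical and may differ from what was actually merged.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


class FoldMetrics(ABC):
    """Common parent for the per-fold metric containers."""

    @abstractmethod
    def as_dict(self) -> Dict[str, float]:
        """Return the metrics of this fold keyed by name."""


@dataclass
class RegressionFoldMetrics(FoldMetrics):
    mae: float  # hypothetical field
    r2: float   # hypothetical field

    def as_dict(self) -> Dict[str, float]:
        return {"mae": self.mae, "r2": self.r2}


@dataclass
class ClassificationFoldMetrics(FoldMetrics):
    auc: float       # hypothetical field
    accuracy: float  # hypothetical field

    def as_dict(self) -> Dict[str, float]:
        return {"auc": self.auc, "accuracy": self.accuracy}


def mean_per_metric(metrics_list: List[FoldMetrics]) -> Dict[str, float]:
    """Average each metric over folds without caring about the concrete subclass."""
    rows = [m.as_dict() for m in metrics_list]
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}
```

Annotations such as `List[FoldMetrics]` then replace `List[Union[RegressionFoldMetrics, ClassificationFoldMetrics]]`, and a single `isinstance(x, FoldMetrics)` check covers both task types.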

Comment thread src/ageml/processing.py Outdated
self.test_metrics.append(fold_test)

def _calculate_summary(self, metrics_list: List[Union[RegressionFoldMetrics, ClassificationFoldMetrics]]) -> Dict[str, Dict[str, float]]:
# TODO - Automatically infer instead of using Union to have both types
Contributor

See comment above about inheritance with FoldMetrics

Contributor Author

Changed

Comment thread src/ageml/visualizer.py Outdated
plt.close()

def ordering(self, mi_age, mi_discr, feature_names, system_dict, title):
"""Plot in the ssame figure the mutual information for age and discrimination."""
Contributor

Incomplete docstring

Contributor Author

I have now updated the docstring.

Comment thread src/ageml/visualizer.py
plt.close()

def multiple_metrics_vs_num_features(self, metrics_age, metrics_discrimination, title):

Contributor

Missing docstring

Contributor Author

This has now been added

Comment thread tests/test_ageml/test_modelling.py
Comment thread tests/test_ageml/test_processing.py
Comment thread src/ageml/visualizer.py
@JGarciaCondado JGarciaCondado force-pushed the model-feature-influence branch 3 times, most recently from 33c2235 to 5b53958 Compare April 24, 2025 18:55
Contributor

@itellaetxe itellaetxe left a comment

LGTM. Merging. Ty for the work @JGarciaCondado 🤝

@itellaetxe itellaetxe merged commit 047cf21 into main Apr 28, 2025
3 checks passed
Development

Successfully merging this pull request may close these issues.

Model Feature Influence

2 participants