[Hotfix] CV unbalanced subsampling #78

Merged
itellaetxe merged 4 commits into main from hotfix_cv_unbalanced_subsampling
Sep 10, 2025

Conversation

@itellaetxe
Contributor

Fixes #76

When training a logistic regression classifier on two unbalanced classes (using age deltas, features, or whatever), we undersample the majority class. Previously this was done before entering the cross-validation procedure, which could introduce an optimistic bias and underuse the available data.

Now all available data goes into the CV, and the majority class is undersampled within each CV fold, avoiding the problem described above.
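A minimal sketch of the per-fold undersampling this PR describes. The names and structure are illustrative only, not the actual ageml implementation; it assumes scikit-learn and a fixed seed for reproducible subsampling:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold

# Fixed seed so the majority-class subsample is reproducible across runs.
rng = np.random.default_rng(0)

# Synthetic unbalanced two-class problem (80/20 split) for illustration.
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

scores = []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Undersample the majority class *inside* the fold: keep every minority
    # sample and draw an equal-sized random subset of the majority class.
    classes, counts = np.unique(y_tr, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    maj_idx = np.flatnonzero(y_tr == majority)
    keep = np.concatenate([
        np.flatnonzero(y_tr == minority),
        rng.choice(maj_idx, size=counts.min(), replace=False),
    ])

    # Fit on the balanced training subset; evaluate on the untouched test fold.
    clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
    scores.append(balanced_accuracy_score(y[test_idx], clf.predict(X[test_idx])))
```

Because the test fold is never subsampled, each fold's score reflects performance on the original class distribution, while the balanced training subset keeps the classifier from simply favoring the majority class.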

Contributor

@JGarciaCondado JGarciaCondado left a comment


Please change to always subsampling, not based on a ratio. Open an issue noting that the user should be allowed to change this if they want, and explain why the specific seed was set.

Comment thread src/ageml/modelling.py
Comment thread src/ageml/modelling.py
Comment thread src/ageml/modelling.py
Comment thread src/ageml/ui.py Outdated
Comment thread src/ageml/ui.py Outdated
@itellaetxe itellaetxe merged commit 32d4ec4 into main Sep 10, 2025
3 checks passed

Development

Successfully merging this pull request may close these issues.

Nesting of sampling in classifier

2 participants