Overhaul of Adeft validation #83

Closed
steppi wants to merge 28 commits into gyorilab:v1.0.0-dev from steppi:validation

steppi (Collaborator) commented Nov 6, 2025

This PR overhauls validation in Adeft. It:

  • Uses a proper nested cross-validation scheme to isolate model selection from validation.
  • Uses pooled F1 for model selection. This way of computing F1 across cross-validation folds is recommended in the classic "Apples-to-Apples" paper by Forman and Scholz (https://dl.acm.org/doi/10.1145/1882471.1882479).
  • Reports classwise sensitivity and specificity and the full confusion matrix as validation results, rather than per-class F1 and global F1. F1 wasn't a very good metric to report because the class balance in the training data doesn't reflect the class balance in unseen data. Sensitivity and specificity are independent of the unknown class balance, so they remain readily interpretable. Future work will go into better understanding the class balance in unseen data.
  • Makes the corresponding updates to model serialization, info reporting in the AdeftDisambiguator, and tests.
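
To make the metrics above concrete, here is a minimal, standard-library-only sketch of pooled F1 and of classwise sensitivity and specificity derived from a confusion matrix. It is an illustration under assumptions, not the code in this PR: the function names (`pooled_f1`, `confusion_matrix`, `classwise_sens_spec`), the two-label toy data, and the two-fold setup are all hypothetical and do not reflect Adeft's actual API.

```python
from collections import Counter


def pooled_f1(folds, positive):
    """Pool TP/FP/FN counts over all folds, then compute one F1 score.

    `folds` is an iterable of (y_true, y_pred) pairs, one per CV fold.
    (Hypothetical helper, not part of Adeft.)
    """
    tp = fp = fn = 0
    for y_true, y_pred in folds:
        for t, p in zip(y_true, y_pred):
            if p == positive and t == positive:
                tp += 1
            elif p == positive:
                fp += 1
            elif t == positive:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def classwise_sens_spec(matrix, labels):
    """One-vs-rest (sensitivity, specificity) for each label."""
    total = sum(sum(row) for row in matrix)
    result = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        result[label] = (sens, spec)
    return result


# Toy held-out predictions from two cross-validation folds (invented data).
folds = [
    (["A", "A", "B", "B"], ["A", "B", "B", "B"]),
    (["A", "B", "B", "B"], ["A", "A", "B", "B"]),
]
labels = ["A", "B"]

f1 = pooled_f1(folds, positive="A")

# Concatenate the held-out predictions before building the matrix.
all_true = [t for y_true, _ in folds for t in y_true]
all_pred = [p for _, y_pred in folds for p in y_pred]
matrix = confusion_matrix(all_true, all_pred, labels)  # [[2, 1], [1, 4]]
rates = classwise_sens_spec(matrix, labels)
```

Sensitivity is computed only from the row of samples that truly belong to a class, and specificity only from the samples that do not, so neither changes if the proportion of each class in the evaluation data shifts. F1 mixes both groups through its false-positive count, which is why it depends on class balance.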

This PR is a work in progress. The tests will not pass until I populate the 1.0.0-dev folder on S3 with testing resources; I'm posting this now for the sake of visibility. Note also that this is a PR into the v1.0.0-dev branch. I may back down from committing to a stable 1.0.0 release, but it seems like a worthy goal to shoot for.

steppi (Collaborator, Author) commented Dec 15, 2025

Shelving this for now in favor of a more incremental approach that gets us more quickly to where we need to be for the OPAQUE paper.

steppi closed this Dec 15, 2025