Overhaul of Adeft validation by steppi · Pull Request #83 · gyorilab/adeft

steppi · 2025-11-06T20:56:20Z

This PR overhauls validation in adeft. It:

- Uses a proper nested cross-validation scheme to isolate model selection from validation.
- Uses pooled F1 for model selection. This way of computing F1 across cross-validation folds is recommended in the classic Apples to Apples paper by Forman and Scholz (https://dl.acm.org/doi/10.1145/1882471.1882479).
- For validation results, reports classwise sensitivity and specificity and the full confusion matrix, rather than per-class F1 and global F1. F1 wasn't actually a very good metric to report because the class balance in the training data doesn't reflect the class balance in real-world data. Sensitivity and specificity are independent of the unknown class balance, so they remain readily interpretable. Future work will go into better understanding the class balance in unseen data.
- Makes the corresponding updates to model serialization, info reporting in the AdeftDisambiguator, and tests.
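To make the metric choices above concrete, here is a minimal pure-Python sketch. It is not the code in this PR: the function names (`confusion_counts`, `pooled_f1`, `mean_fold_f1`, `sensitivity_specificity`) and the toy folds are hypothetical, chosen only for illustration. It contrasts pooled F1 (sum TP/FP/FN over all folds, then compute F1 once) with the mean of per-fold F1 scores, and shows sensitivity and specificity computed from the same counts.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pooled_f1(folds, positive):
    """Pool TP, FP, FN across folds, then compute a single F1."""
    tp = fp = fn = 0
    for y_true, y_pred in folds:
        f_tp, f_fp, f_fn, _ = confusion_counts(y_true, y_pred, positive)
        tp, fp, fn = tp + f_tp, fp + f_fp, fn + f_fn
    return f1_from_counts(tp, fp, fn)


def mean_fold_f1(folds, positive):
    """Average of per-fold F1 scores, shown only for comparison."""
    scores = []
    for y_true, y_pred in folds:
        tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
        scores.append(f1_from_counts(tp, fp, fn))
    return sum(scores) / len(scores)


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy held-out predictions from two folds (made-up data).
# The second fold contains no true "A" examples, so its per-fold F1 is 0.
folds = [
    (["A", "A", "B", "B"], ["A", "B", "B", "B"]),
    (["B", "B", "B", "B"], ["A", "B", "B", "B"]),
]

pooled = pooled_f1(folds, "A")       # 0.5
averaged = mean_fold_f1(folds, "A")  # about 0.333

# Sensitivity and specificity over all held-out predictions combined.
all_true = [t for y_true, _ in folds for t in y_true]
all_pred = [p for _, y_pred in folds for p in y_pred]
sens, spec = sensitivity_specificity(all_true, all_pred, "A")  # 0.5, about 0.833
```

In this toy case the fold with no positive examples drags the averaged score down to about 0.33, while the pooled score is 0.5. Sensitivity and specificity each condition on the true class, which is why they do not shift when the proportion of each class changes.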

This PR is a work in progress. The tests are not going to pass until I populate the 1.0.0-dev folder on S3 with testing resources. I'm just posting this for the sake of visibility. Note also that this is a PR into a v1.0.0-dev branch. I may back down from committing to a stable 1.0.0 release, but it seems like a worthy goal to shoot for.


steppi · 2025-12-15T17:45:30Z

Shelving this for now in favor of a more incremental approach to more quickly get us where we need to be for the OPAQUE paper.

steppi added 28 commits November 6, 2025 15:40

Add pooled fbeta grid search cv

a16d37f

Check in work in progress validation

ee40133

Add todo comment

f294ae7

Add refit option to validate

92e8c38

Add utility functions to serialize and deserialize arrays

7103e7d

Use more principled validation pipeline

3cd2ff8

Use npz for array serialization instead of jsonified lists

cfc6e0d

Revamp model serialization/deserialization

3bf753f

Ensure fbeta scores stored as floats for easy serialization

48bff70

Remove unnecessary warning filter

e4d2fca

Remove unused imports

52dddfe

Fix bugs in validation pipeline

eb33d35

Clean up model serialization

8e086fc

Update test_classify for new changes

31c8808

Use new load_model class method for AdeftClassifier

40bdbaa

Bump development version

62cb1bd

Update documentation for updated AdeftClassifier

a9b65e4

Get rid of update_pos_labels

703e00b

- this should always require retraining

Update to disambiguator info

62f3d3b

Update import

f943e42

Bump minimal python version

567b7d1

Remove parts dealing with stats from modify_groundings

82b77a4

Improve disambiguator report

78ab40f

Stop talking about F1 score in a place it isn't relevant

a1faf90

Fix malloc and free issues in scorer

4bb8e07

Fix another memory leak

263b2bf

Fix another memory leak

37f0d27

Check in current work in progress

f06635c

steppi closed this Dec 15, 2025

steppi wants to merge 28 commits into gyorilab:v1.0.0-dev from steppi:validation
