As discussed in a recent meeting (and reported at https://github.com/clamsproject/aapb-collaboration/issues/27#issuecomment-2501578284), we might need to split the 27-e batch, which currently contains two sets of GUIDs from the "challenging images" annotation (namely the bm set and the pbd set), so that we can fully reserve the pbd set as the "test" set moving forward, one that can be fed to a scripted evaluation process without additional manual intervention.
However, on the training side, I currently use the pbd set as a validation set while training SWT models for the 7.1 release, and keep the 27-d batch for testing both timepoint-wise and timeframe-wise, since the 27-d batch (unlike the pbd set) is annotated "sequentially", which enables natural conversion from point annotations to timeframe annotations. Re-purposing the pbd set as the test set going forward might imply that we need¹ either
- a new validation set (a.k.a. a "dev" set, used for evaluation-during-training and hyperparameter tuning), or
- to go back to k-fold validation.
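For reference, the mechanics of going back to k-fold validation are simple: partition the pool of GUIDs into k folds, then train k times, each time holding out one fold for validation. A minimal sketch in plain Python (the GUIDs and k value below are hypothetical placeholders; in practice one would likely use an existing utility such as scikit-learn's `KFold`):

```python
# Minimal k-fold split sketch over a list of annotation GUIDs.
# The GUIDs and k=5 here are placeholders, not the real batch contents.

def kfold_splits(items, k):
    """Yield (train, val) lists for k roughly equal, disjoint folds."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val

guids = [f"guid-{n:03d}" for n in range(10)]
for train, val in kfold_splits(guids, k=5):
    # Every fold's validation items are disjoint from its training items,
    # and together they cover the whole pool.
    assert set(train).isdisjoint(val)
    assert sorted(train + val) == sorted(guids)
```

The upside is that no dedicated validation set needs to be carved out; the downside is k training runs per hyperparameter configuration.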
I want to hear any preferences, other suggestions, or ideas about this, especially from @owencking and @marcverhagen.
On second thought, maybe I misunderstood our discussion during the meeting yesterday, and it was actually about keeping the validation set (pbd) from being re-used for training after hyperparameter tuning. If that's the case, I don't think we need to re-configure the data split (though we should expect some loss in model performance due to having fewer training examples).
Footnotes
1. https://datascience.stackexchange.com/questions/18339/why-use-both-validation-set-and-test-set