Do you by any chance still have the dataset split (train/val/test set) that was used to pretrain ProtT5 UniRef50? I am trying to investigate data leakage for downstream tasks.
speydril changed the title from "What was the training split for the ProtT5-UniRef50 model?" to "What was the pretraining split for the ProtT5-UniRef50 model?" on Jun 12, 2024
Hi, no, unfortunately we no longer have the data splits for this, as we considered downstream prediction performance the acid test. Looking back, this was obviously a mistake.
To still move forward on your end, you could apply a time cutoff to UniRef, i.e., extract all sequences published after ProtT5 training, and redundancy-reduce the newly added sequences against our training set (which will be a pain, sorry, as we also trained on BFD ... ).
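As a rough illustration of that workflow, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the record layout `(id, sequence, date_added)`, the function name `held_out_candidates`, the cutoff date, and the similarity threshold are all hypothetical, and the k-mer Jaccard check is only a toy stand-in for real redundancy reduction, which at UniRef/BFD scale would need a dedicated sequence-search or clustering tool.

```python
from datetime import date


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def held_out_candidates(records, training_seqs, cutoff, max_similarity=0.5, k=3):
    """Keep IDs of sequences added after `cutoff` that are not similar
    to any training sequence.

    records: iterable of (id, sequence, date_added) tuples (hypothetical layout).
    The k-mer Jaccard score is a toy proxy for sequence identity.
    """
    train_kmers = [kmers(s, k) for s in training_seqs]
    kept = []
    for rec_id, seq, added in records:
        # Step 1: time cutoff, keep only sequences newer than the training data.
        if added <= cutoff:
            continue
        # Step 2: redundancy reduction against the training set.
        query = kmers(seq, k)
        if any(jaccard(query, t) > max_similarity for t in train_kmers):
            continue
        kept.append(rec_id)
    return kept


# Toy example; the sequences and the cutoff date are made up.
records = [
    ("old", "MKTAYIAKQR", date(2019, 1, 1)),
    ("new_dup", "MKTAYIAKQR", date(2023, 1, 1)),
    ("new_novel", "GGSWWPLLHHEE", date(2023, 1, 1)),
]
training = ["MKTAYIAKQR"]
result = held_out_candidates(records, training, cutoff=date(2020, 1, 1))
```

Only sequences that pass both steps remain as candidates for a leakage-free evaluation set. The pairwise comparison here is quadratic and will not scale to the real databases.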