Hi, I looked into the OOD results, and many examples in the test sets also seem to appear in the corresponding train sets. E.g. DengueFilipino has identical train and test sets, and KirundiNews has about 90% overlap...
To reproduce:
from data import *

# Map each dataset name to its loader function from data.py.
dataloaders = dict(DengueFilipino=load_filipino,
                   KirundiNews=load_kirnews,
                   KinyarwandaNews=load_kinnews,
                   SwahiliNews=load_swahili)

for data_name, loader in dataloaders.items():
    train, test = loader()
    # Share of unique test examples that also occur in the train set.
    overlap = 1 - len(set(test) - set(train)) / len(set(test))
    print(data_name, f"train<->test overlap: {overlap * 100:.1f}%")
DengueFilipino train<->test overlap: 100.0%
KirundiNews train<->test overlap: 90.4%
KinyarwandaNews train<->test overlap: 23.8%
SwahiliNews train<->test overlap: 0.5%
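For clarity on what the number measures, here is a minimal self-contained sketch of the same overlap computation on toy data. The helper name `overlap_fraction` and the `(label, text)` tuple format are assumptions for illustration only; the real loaders may return a different (but hashable) example type.

```python
def overlap_fraction(train, test):
    """Fraction of unique test examples that also occur in train.

    Equivalent to 1 - len(set(test) - set(train)) / len(set(test)).
    Examples must be hashable (here: hypothetical (label, text) tuples).
    """
    unique_test = set(test)
    if not unique_test:
        # Avoid division by zero on an empty test set.
        return 0.0
    return len(unique_test & set(train)) / len(unique_test)


# Toy data, not taken from any of the datasets above.
train = [(0, "a"), (1, "b"), (0, "c")]
test = [(1, "b"), (0, "c"), (1, "d"), (1, "d")]

# Duplicates inside the test set count once: 2 of 3 unique test examples are in train.
print(f"train<->test overlap: {overlap_fraction(train, test) * 100:.1f}%")
```

Note that this compares whole examples, so a text that appears in both splits with different labels would not count as overlapping under this definition.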