Runs | NDCG@10 | P@10 | RPrec | MRR |
---|---|---|---|---|
BM25, k1=0.82, b=0.68 | 0.3395 | 0.4520 | 0.1892 | 0.6942 |
ANCE | 0.1052 | 0.1280 | 0.0541 | 0.3017 |
Hybrid, BM25+ANCE, a=2 (grid search from a=1 to 10) | 0.3488 | 0.4667 | 0.1883 | 0.7208 |

Runs | NDCG@10 | P@10 | RPrec | MRR | Recall@1000 |
---|---|---|---|---|---|
TREC best (frocchio_monot5_e) | 0.6125 | 0.6780 | 0.3652 | 0.8519 | 0.6765 |
a. BM25, k1=0.82, b=0.68 | 0.3103 | 0.3760 | 0.1715 | 0.6481 | 0.3862 |
b. ANCE | 0.0909 | 0.0960 | 0.0360 | 0.2168 | 0.1018 |
c. DPR(BERT) + 10k chatGPT data | 0.2186 | 0.2420 | 0.1055 | 0.4629 | 0.2646 |
d. DPR(PubmedBERT) + 10k chatGPT data | 0.3481 | 0.4000 | 0.2409 | 0.5871 | 0.5391 |
e. DPR(PubmedBERT) + 20k chatGPT data | 0.3372 | 0.3820 | 0.2112 | 0.6018 | 0.4563 |
f. DPR(PubmedBERT) + 20k chatGPT data + 5k labelled data | 0.4037 | 0.4500 | 0.2409 | 0.6418 | 0.4982 |
g. Hybrid a + f, alpha=0.9 | 0.4937 | 0.5500 | 0.2895 | 0.7912 | 0.5901 |
h. DPR(PubmedBERT) + 20k chatGPT data + 5k labelled data + Hard negatives | 0.4096 | 0.4840 | 0.2711 | 0.6693 | 0.5932 |
i. Hybrid a + h, alpha=0.8 | 0.4819 | 0.5620 | 0.2954 | 0.7391 | 0.5930 |
j. SPLADE(PubmedBERT) + 20k chatGPT data + 5k labelled data | 0.3729 | 0.4120 | 0.2257 | 0.6180 | 0.5196 |
k. Hybrid h + j, alpha=0.8 | 0.4746 | 0.5460 | 0.3071 | 0.6949 | 0.6369 |
l. k + gpt-3.5-turbo setwise.n=3 rerank top100 | 0.4934 | 0.5760 | - | 0.7569 | - |
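
The hybrid rows combine two runs with a weight (`a` or `alpha`), but the tables do not state the fusion formula. The following is a minimal sketch of one common choice, assuming per-query min-max normalization followed by the convex interpolation `alpha * first + (1 - alpha) * second`. The function names `min_max` and `interpolate` are hypothetical and not taken from the original experiments. This form only fits the rows with `alpha` between 0 and 1, and the `a=2` BM25+ANCE row presumably used a different weighting.

```python
from typing import Dict, List, Tuple

Run = Dict[str, float]  # doc id -> retrieval score for a single query


def min_max(scores: Run) -> Run:
    """Rescale one query's scores to [0, 1] so two systems are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: no ranking signal, map everything to 0.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def interpolate(first: Run, second: Run, alpha: float) -> List[Tuple[str, float]]:
    """Fuse two runs as alpha * first + (1 - alpha) * second (assumed formula).

    A document missing from one run contributes 0 for that run.
    Results are sorted by fused score, ties broken by doc id.
    """
    a, b = min_max(first), min_max(second)
    fused = {
        doc: alpha * a.get(doc, 0.0) + (1.0 - alpha) * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy scores, not taken from the experiments above.
    dense = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
    sparse = {"d2": 1.0, "d3": 0.5, "d4": 0.0}
    for doc, score in interpolate(dense, sparse, alpha=0.5):
        print(doc, round(score, 4))
```

Under this assumption, tuning `alpha` means repeating the fusion over a grid of values and keeping the one that scores best on held-out topics.
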

Runs | NDCG@10 | P@10 | RPrec | MRR | Recall@1000 |
---|---|---|---|---|---|
TREC best (frocchio_monot5_e) | 0.6125 | 0.5080 | 0.3297 | 0.7262 | 0.7396 |
TREC second best (DoSSIER_5) | 0.5565 | 0.4560 | 0.2434 | 0.6191 | 0.6239 |
TREC third best (iiia-unipd, manual run) | 0.5051 | 0.3980 | 0.2790 | 0.6085 | - |
BM25, k1=0.82, b=0.68 | 0.3103 | 0.2120 | 0.1191 | 0.4126 | 0.3663 |
-------------------------------------------------------------------------------------- | ------------- | ------------ | ------------ | ------------ | -------------- |
a. DR(PubmedBERT) + 20k chatGPT data + 5k labelled data (ckpt6000) | 0.4037 | 0.3260 | 0.2158 | 0.5741 | 0.5551 |
b. DR(PubmedBERT + CT MLM) + 20k chatGPT data + 5k labelled data (ckpt3000) | 0.4072 | 0.3280 | 0.2194 | 0.6392 | 0.5992 |
c. DR(PubmedBERT + CT MLM) + 20k chatGPT data + 5k labelled data + HN (ckpt3000) | 0.4271 | 0.3240 | 0.2274 | 0.4826 | 0.5987 |
-------------------------------------------------------------------------------------- | ------------- | ------------ | ------------ | ------------ | -------------- |
a. SPLADE(PubmedBERT) + 20k chatGPT data + 5k labelled data (ckpt10000) | 0.3729 | 0.3020 | 0.1975 | 0.5339 | 0.5764 |
b. SPLADE(PubmedBERT + CT MLM) + 20k chatGPT data + 5k labelled data (ckpt12000) | 0.3512 | 0.2920 | 0.1854 | 0.4964 | 0.5576 |
c. SPLADE(PubmedBERT) + 20k chatGPT data + 5k labelled data + HN (ckpt16000) | 0.4235 | 0.3280 | 0.2341 | 0.5374 | 0.5968 |
-------------------------------------------------------------------------------------- | ------------- | ------------ | ------------ | ------------ | -------------- |
a. Hybrid, DR c + SPLADE c, alpha=0.5 | 0.5024 | 0.3800 | 0.2612 | 0.5884 | 0.6529 |
-------------------------------------------------------------------------------------- | ------------- | ------------ | ------------ | ------------ | -------------- |
a. Cross-encoder (PubmedBERT large), HN from hybrid a, ckpt2000, rerank hybrid a top1000 | 0.5614 | 0.4280 | 0.2812 | 0.7009 | 0.6529 |
b. Cross-encoder (PubmedBERT large), HN from hybrid a, ckpt3000, rerank hybrid a top1000 | 0.5804 | 0.4400 | 0.2915 | 0.7427 | 0.6529 |
c. Cross-encoder (PubmedBERT large), HN from hybrid a, ckpt4000, rerank hybrid a top1000 | 0.5977 | 0.4560 | 0.3069 | 0.7154 | 0.6529 |
d. Cross-encoder (PubmedBERT large), HN from hybrid a, ckpt5000, rerank hybrid a top1000 | 0.6055 | 0.4660 | 0.3121 | 0.7407 | 0.6529 |
e. Cross-encoder (PubmedBERT large), HN from hybrid a, ckpt8000, rerank hybrid a top1000 | 0.6064 | 0.4740 | 0.3069 | 0.7131 | 0.6529 |
f. Cross-encoder (PubmedBERT large), HN from hybrid a, ckpt9000, rerank hybrid a top1000 | 0.6090 | 0.4800 | 0.3107 | 0.7063 | 0.6529 |
g. Cross-encoder (PubmedBERT large), HN from hybrid a, ckpt10000, rerank hybrid a top1000 | 0.5982 | 0.4640 | 0.3018 | 0.7160 | 0.6529 |
h. Cross-encoder (PubmedBERT large), HN from hybrid a, ckpt11000, rerank hybrid a top1000 | 0.6006 | 0.4620 | 0.3012 | 0.7114 | 0.6529 |
-------------------------------------------------------------------------------------- | ------------- | ------------ | ------------ | ------------ | -------------- |
a. Hybrid DR c + SPLADE c + GPT-3.5-turbo judger rerank top20 | 0.5254 | 0.4120 | 0.2641 | 0.6225 | 0.6529 |
-------------------------------------------------------------------------------------- | ------------- | ------------ | ------------ | ------------ | -------------- |
a. Hybrid DR c + SPLADE c + Cross-encoder f, alpha=0.1 | 0.6209 | 0.4880 | 0.3109 | 0.7545 | 0.6529 |
b. Hybrid DR c + SPLADE c + Cross-encoder f, alpha=0.2 | 0.6162 | 0.4800 | 0.3171 | 0.7423 | 0.6529 |
-------------------------------------------------------------------------------------- | ------------- | ------------ | ------------ | ------------ | -------------- |
a. Hybrid DR c + SPLADE c + Cross-encoder f, alpha=0.1 + GPT-4 judger rerank top20 (avg num_api_calls: 20, avg prompt tokens: 28105.44, avg generate tokens: 28.86) | 0.6591 | 0.5680 | 0.3241 | 0.7795 | 0.6529 |

GPT-4 judger rerank top20: avg num_api_calls: 20, avg prompt tokens: 27125.3, avg generated tokens: 35.725, cost: $32.64
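
The cost figure can be sanity-checked from the token averages. The sketch below rests on three assumptions that the notes do not state: the averages are per-topic totals across the 20 API calls, the run covered 40 topics, and pricing was $0.03 per 1k prompt tokens and $0.06 per 1k generated tokens. The helper `judger_cost` is a hypothetical name. Under these assumptions the estimate rounds to the reported $32.64.

```python
# Assumed values, not stated in the notes above.
PROMPT_USD_PER_1K = 0.03      # assumed GPT-4 prompt price per 1k tokens
COMPLETION_USD_PER_1K = 0.06  # assumed GPT-4 completion price per 1k tokens
NUM_TOPICS = 40               # assumed number of evaluated topics


def judger_cost(
    avg_prompt_tokens: float,
    avg_generated_tokens: float,
    num_topics: int,
    prompt_usd_per_1k: float = PROMPT_USD_PER_1K,
    completion_usd_per_1k: float = COMPLETION_USD_PER_1K,
) -> float:
    """Estimate total API cost in USD from per-topic token averages."""
    per_topic = (
        avg_prompt_tokens / 1000.0 * prompt_usd_per_1k
        + avg_generated_tokens / 1000.0 * completion_usd_per_1k
    )
    return per_topic * num_topics


if __name__ == "__main__":
    # Token averages copied from the GPT-4 judger note above.
    total = judger_cost(27125.3, 35.725, NUM_TOPICS)
    print(f"estimated cost: ${total:.2f}")
```

Almost all of the estimate comes from prompt tokens, so shortening the judged documents or reranking fewer than 20 candidates would be the main levers for reducing cost.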