I noticed that the SmolInstruct overall score for Intern-S1-Pro on the leaderboard is 74.8.
After running the evaluation with the configuration file at https://github.com/open-compass/opencompass/blob/main/examples/eval_intern_s1_pro.py, the resulting SmolInstruct output contains only the scores for the individual subsets. Could you clarify how the overall score shown on the leaderboard is computed?

| dataset | version | metric | mode | qwen3 |
|---|---|---|---|---|
| SmolInstruct | - | - | - | - |
| NC-I2F-0shot-instruct | d2fb04 | score | gen | 86.33 |
| NC-I2S-0shot-instruct | ead200 | score | gen | 1.17 |
| NC-S2F-0shot-instruct | 989e6e | score | gen | 71.40 |
| NC-S2I-0shot-instruct | fb7430 | score | gen | 5.68 |
| PP-ESOL-0shot-instruct | 9cf92f | score | gen | 0.76 |
| PP-Lipo-0shot-instruct | 6af51f | score | gen | 0.81 |
| PP-BBBP-0shot-instruct | 5376b4 | accuracy | gen | 83.76 |
| PP-ClinTox-0shot-instruct | b94c41 | accuracy | gen | 40.97 |
| PP-HIV-0shot-instruct | fcbfe8 | accuracy | gen | 55.86 |
| PP-SIDER-0shot-instruct | b5e48e | accuracy | gen | 65.80 |
| MC-0shot-instruct | 35dbf4 | score | gen | 0.21 |
| MG-0shot-instruct | 30c630 | score | gen | 58.93 |
| FS-0shot-instruct | fe206a | score | gen | 72.48 |
| RS-0shot-instruct | bafb38 | score | gen | 44.76 |
| - | - | - | - | - |
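
For reference, here is a quick sanity check on the numbers above. It is only a sketch: it assumes the overall score could be a plain unweighted mean of the 14 subset rows, which is a guess and not a confirmed description of how the leaderboard aggregates them. The names `subset_scores` and `unweighted_mean` are made up for this illustration and are not part of OpenCompass.

```python
# Subset scores copied from the summary table above.
subset_scores = {
    "NC-I2F-0shot-instruct": 86.33,
    "NC-I2S-0shot-instruct": 1.17,
    "NC-S2F-0shot-instruct": 71.40,
    "NC-S2I-0shot-instruct": 5.68,
    "PP-ESOL-0shot-instruct": 0.76,
    "PP-Lipo-0shot-instruct": 0.81,
    "PP-BBBP-0shot-instruct": 83.76,
    "PP-ClinTox-0shot-instruct": 40.97,
    "PP-HIV-0shot-instruct": 55.86,
    "PP-SIDER-0shot-instruct": 65.80,
    "MC-0shot-instruct": 0.21,
    "MG-0shot-instruct": 58.93,
    "FS-0shot-instruct": 72.48,
    "RS-0shot-instruct": 44.76,
}


def unweighted_mean(scores):
    """Plain average over all subsets (an assumed aggregation, not the confirmed one)."""
    return sum(scores.values()) / len(scores)


naive_overall = unweighted_mean(subset_scores)
print(f"{len(subset_scores)} subsets, naive mean = {naive_overall:.2f}")
# prints "14 subsets, naive mean = 42.07"
```

The naive mean of this run's output is about 42.07, far from the 74.8 shown on the leaderboard, so the overall score is presumably not a simple average of these rows as printed.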