How is the SmolInstruct overall score calculated for Intern-S1-Pro? #36

@zhouyujin

I noticed that the SmolInstruct overall score for Intern-S1-Pro on the leaderboard is 74.8.
After I ran the evaluation with the configuration file at https://github.com/open-compass/opencompass/blob/main/examples/eval_intern_s1_pro.py, the SmolInstruct output contained scores only for the individual subsets, with no aggregate row filled in. Could you clarify how the overall score shown on the leaderboard is computed?

| dataset | version | metric | mode | qwen3 |
|---|---|---|---|---|
| SmolInstruct | - | - | - | - |
| NC-I2F-0shot-instruct | d2fb04 | score | gen | 86.33 |
| NC-I2S-0shot-instruct | ead200 | score | gen | 1.17 |
| NC-S2F-0shot-instruct | 989e6e | score | gen | 71.40 |
| NC-S2I-0shot-instruct | fb7430 | score | gen | 5.68 |
| PP-ESOL-0shot-instruct | 9cf92f | score | gen | 0.76 |
| PP-Lipo-0shot-instruct | 6af51f | score | gen | 0.81 |
| PP-BBBP-0shot-instruct | 5376b4 | accuracy | gen | 83.76 |
| PP-ClinTox-0shot-instruct | b94c41 | accuracy | gen | 40.97 |
| PP-HIV-0shot-instruct | fcbfe8 | accuracy | gen | 55.86 |
| PP-SIDER-0shot-instruct | b5e48e | accuracy | gen | 65.80 |
| MC-0shot-instruct | 35dbf4 | score | gen | 0.21 |
| MG-0shot-instruct | 30c630 | score | gen | 58.93 |
| FS-0shot-instruct | fe206a | score | gen | 72.48 |
| RS-0shot-instruct | bafb38 | score | gen | 44.76 |
| | - | - | - | - |
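
As a sanity check, here is a minimal sketch that averages the 14 subset scores listed above. It assumes the overall score might be a plain unweighted arithmetic mean, which is only a guess and not a documented OpenCompass method; the names `subset_scores` and `unweighted_mean` are made up for illustration. Several rows (0.76, 0.81, 0.21) also look like they are on a different scale from the rest, so a simple mean may not be meaningful in the first place.

```python
# Subset scores copied from the results listed above (column "qwen3").
subset_scores = {
    "NC-I2F-0shot-instruct": 86.33,
    "NC-I2S-0shot-instruct": 1.17,
    "NC-S2F-0shot-instruct": 71.40,
    "NC-S2I-0shot-instruct": 5.68,
    "PP-ESOL-0shot-instruct": 0.76,
    "PP-Lipo-0shot-instruct": 0.81,
    "PP-BBBP-0shot-instruct": 83.76,
    "PP-ClinTox-0shot-instruct": 40.97,
    "PP-HIV-0shot-instruct": 55.86,
    "PP-SIDER-0shot-instruct": 65.80,
    "MC-0shot-instruct": 0.21,
    "MG-0shot-instruct": 58.93,
    "FS-0shot-instruct": 72.48,
    "RS-0shot-instruct": 44.76,
}


def unweighted_mean(scores):
    """Arithmetic mean of all values (an assumed aggregation, not confirmed)."""
    return sum(scores.values()) / len(scores)


naive_overall = unweighted_mean(subset_scores)
print(f"naive unweighted mean: {naive_overall:.2f}")  # prints 42.07
```

For this run, the naive mean comes out at about 42.07, which does not match the 74.8 on the leaderboard, so the leaderboard presumably uses a different aggregation, different per-subset metrics, or a different run.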