Add VisRes Bench benchmark (CVPR 2026) #1245

Merged
Luodian merged 3 commits into EvolvingLMMs-Lab:main from dunghuynhandy:main
Mar 10, 2026

@dunghuynhandy
Contributor

Pull Request: Add VisRes-Bench (CVPR 2026)

VisRes Bench: On Evaluating the Visual Reasoning Capabilities of VLMs

Summary

  • Add VisRes-Bench (arXiv:2512.21194, CVPR 2026) as an lmms-eval task group using the Hugging Face dataset tiiuae/visres_bench.
  • Expose all 27 dataset configs as tasks with a shared default template, optional guided vs generic question column via lmms_eval_specific_kwargs, and subgroups for level 1 (8 tasks, no random_sampling), level 2 (12 tasks), and level 3 (5 tasks).
  • Add a README with run instructions and citation.
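The guided vs generic switch described above can be sketched roughly as follows. This is not the PR's `utils.py`: the function name `sketch_doc_to_text`, the `pre_prompt`/`post_prompt` keys, and the choice of `guided_question` as the default column are assumptions made for illustration. Only the `question_column` option and the two column names come from the PR description.

```python
from typing import Any, Dict, Optional

# Column names taken from the PR description.
ALLOWED_COLUMNS = ("guided_question", "generic_question")


def sketch_doc_to_text(
    doc: Dict[str, Any],
    lmms_eval_specific_kwargs: Optional[Dict[str, Any]] = None,
) -> str:
    """Build the prompt text from the selected question column (illustrative only)."""
    kwargs = lmms_eval_specific_kwargs or {}
    # Assumption: the guided question is used when no column is configured.
    column = kwargs.get("question_column", "guided_question")
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"Unsupported question_column: {column!r}")
    question = str(doc[column]).strip()
    # Assumption: optional text wrapped around the question.
    pre_prompt = kwargs.get("pre_prompt", "")
    post_prompt = kwargs.get("post_prompt", "")
    return f"{pre_prompt}{question}{post_prompt}"
```

Validating the column name up front means a typo in a task config fails with a clear error instead of a `KeyError` deep inside an evaluation run.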

In scope

  • New task folder lmms_eval/tasks/visres_bench/ with:
    • utils.py: visres_bench_doc_to_text (supports question_column: guided_question / generic_question), visres_bench_doc_to_visual, vp_process_results
    • _default_template_visres_bench_yaml: dataset path, doc_to_text/doc_to_visual/doc_to_target, metrics, lmms_eval_specific_kwargs for default and generic formats
    • Group YAMLs: _visres_bench.yaml (all 27), _visres_bench_level_1.yaml (8), _visres_bench_level_2.yaml (12), _visres_bench_level_3.yaml (5)
    • 27 task YAMLs named visres_bench_level_1_global_occlusion_50, etc., each including the default template (via include) and setting task and dataset_name
    • README.md: how to run (all tasks, level 1/2/3, single task), question type (guided vs generic), summary table, and citation
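Because each task YAML only sets `task` and `dataset_name` and includes the shared template, the per-task files are small enough to generate. The helper below is a hypothetical sketch, not part of the PR. It assumes that `dataset_name` equals the task name without the `visres_bench_` prefix, which the description does not confirm.

```python
def render_task_yaml(
    config_name: str,
    template: str = "_default_template_visres_bench_yaml",
) -> str:
    """Return the text of a minimal per-task YAML file (illustrative only)."""
    if not config_name:
        raise ValueError("config_name must be non-empty")
    lines = [
        # Assumption: the Hugging Face config name is passed through unchanged.
        f"dataset_name: {config_name}",
        # Task naming follows the example in the PR description.
        f"task: visres_bench_{config_name}",
        # Shared settings (dataset path, metrics, prompts) live in the template.
        f"include: {template}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_task_yaml("level_1_global_occlusion_50"), end="")
```

Keeping everything except the two identifying fields in one template means a change to metrics or prompt formatting is made once rather than in every task file.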

Out of scope

  • No changes to existing tasks, models, or eval harness outside lmms_eval/tasks/visres_bench/.
  • No new dependencies beyond existing HF datasets usage.

Validation

  • --tasks visres_bench_level_1_global_occlusion_50 | single config | dataset loads, doc_to_text/doc_to_visual/process_results run | pass
  • --tasks visres_bench_level_1 | 8 tasks | all configs load from tiiuae/visres_bench | pass
  • --tasks visres_bench | 27 tasks | full group runs | pass
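The three validation runs differ only in the value passed to `--tasks`, so they can be scripted with a small command builder. This is a sketch under assumptions: `build_eval_command` is a hypothetical helper, the model name is a caller-supplied placeholder, and the `--model`, `--batch_size`, and `--output_path` flags follow common lmms-eval usage rather than anything stated in this PR.

```python
import sys
from typing import List, Sequence


def build_eval_command(
    tasks: Sequence[str],
    model: str,
    output_path: str = "./logs/",
    batch_size: int = 1,
) -> List[str]:
    """Assemble an lmms_eval invocation as an argv list (does not run it)."""
    if not tasks:
        raise ValueError("at least one task or group name is required")
    return [
        sys.executable, "-m", "lmms_eval",
        "--model", model,  # placeholder: any model name your install supports
        "--tasks", ",".join(tasks),  # task or group names, comma-separated
        "--batch_size", str(batch_size),
        "--output_path", output_path,
    ]


if __name__ == "__main__":
    # Mirrors the three validation scopes: single config, level 1, full group.
    for scope in (
        ["visres_bench_level_1_global_occlusion_50"],
        ["visres_bench_level_1"],
        ["visres_bench"],
    ):
        print(" ".join(build_eval_command(scope, model="your_model_name")))
```

The list can be passed to `subprocess.run` as-is. Returning an argv list instead of a shell string avoids quoting problems in paths and model arguments.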

Risk / Compatibility

  • Additive only: new task group and files under lmms_eval/tasks/visres_bench/. No changes to existing task or model APIs.

Type of Change

  • New benchmark/task
  • Documentation update

@Luodian Luodian merged commit eb2cffe into EvolvingLMMs-Lab:main Mar 10, 2026
1 check passed