Add VisRes Bench benchmark (CVPR 2026) by dunghuynhandy · Pull Request #1245 · EvolvingLMMs-Lab/lmms-eval

dunghuynhandy · 2026-03-09T14:45:44Z

Pull Request: Add VisRes-Bench (CVPR 2026)

VisRes Bench: On Evaluating the Visual Reasoning Capabilities of VLMs

Summary

Add VisRes-Bench (arXiv:2512.21194, CVPR 2026) as an lmms-eval task group using the Hugging Face dataset tiiuae/visres_bench.
Expose all 27 dataset configs as tasks that share a default template, with an optional choice between the guided and generic question columns via lmms_eval_specific_kwargs, plus subgroups for level 1 (8 tasks, no random_sampling), level 2 (12 tasks), and level 3 (5 tasks).
Add a README with run instructions and citation.
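
For readers unfamiliar with the question-column switch, here is a minimal sketch of how a doc_to_text function could pick between guided_question and generic_question. It is not the code in utils.py: the function name doc_to_text_sketch, the default column, the validation step, and the toy record are all assumptions made for illustration. The (doc, lmms_eval_specific_kwargs) signature follows the usual lmms-eval convention.

```python
from typing import Any, Dict, Optional

# Assumption: "guided_question" is the default column. The PR names both
# columns but does not say which one applies when none is specified.
DEFAULT_QUESTION_COLUMN = "guided_question"
ALLOWED_COLUMNS = ("guided_question", "generic_question")


def doc_to_text_sketch(
    doc: Dict[str, Any],
    lmms_eval_specific_kwargs: Optional[Dict[str, Any]] = None,
) -> str:
    """Return the prompt text from the column named by `question_column`."""
    kwargs = lmms_eval_specific_kwargs or {}
    column = kwargs.get("question_column", DEFAULT_QUESTION_COLUMN)
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported question_column: {column!r}")
    return str(doc[column]).strip()


# Toy record with made-up question text, for illustration only.
doc = {
    "guided_question": "Which option completes the occluded region?",
    "generic_question": "Which option is correct?",
}
print(doc_to_text_sketch(doc))
print(doc_to_text_sketch(doc, {"question_column": "generic_question"}))
```

Passing the column name through lmms_eval_specific_kwargs keeps a single template usable for both prompt styles, which matches the "default and generic formats" described in this PR.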

In scope

New task folder lmms_eval/tasks/visres_bench/ with:
- utils.py: visres_bench_doc_to_text (supports question_column: guided_question / generic_question), visres_bench_doc_to_visual, vp_process_results
- _default_template_visres_bench_yaml: dataset path, doc_to_text/doc_to_visual/doc_to_target, metrics, lmms_eval_specific_kwargs for default and generic formats
- Group YAMLs: _visres_bench.yaml (all 27), _visres_bench_level_1.yaml (8), _visres_bench_level_2.yaml (12), _visres_bench_level_3.yaml (5)
- 27 task YAMLs named visres_bench_level_1_global_occlusion_50, etc., each including the default template and setting task and dataset_name
- README.md: how to run (all tasks, level 1/2/3, single task), question type (guided vs generic), summary table, and citation
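
To make the per-task YAML layout concrete, the sketch below renders the text of one task file from a dataset config name. It is an illustration under assumptions, not the generator used for this PR (the files may well have been written by hand): render_task_yaml is a hypothetical helper, and the mapping from config name to task name is inferred only from the example task visres_bench_level_1_global_occlusion_50.

```python
# File name taken from the PR description.
TEMPLATE_FILE = "_default_template_visres_bench_yaml"


def render_task_yaml(config_name: str) -> str:
    """Build the text of one per-task YAML file.

    Assumption: the task name is "visres_bench_" plus the dataset config
    name, as suggested by visres_bench_level_1_global_occlusion_50.
    """
    lines = [
        f"dataset_name: {config_name}",
        f"task: visres_bench_{config_name}",
        f"include: {TEMPLATE_FILE}",
    ]
    return "\n".join(lines) + "\n"


# "level_1_global_occlusion_50" is an assumed config name, used for illustration.
print(render_task_yaml("level_1_global_occlusion_50"))
```

Because each task file only sets task and dataset_name, any change to prompts or metrics stays in the shared template.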

Out of scope

No changes to existing tasks, models, or eval harness outside lmms_eval/tasks/visres_bench/.
No new dependencies beyond existing HF datasets usage.

Validation

| Command | Scope | What was checked | Result |
| --- | --- | --- | --- |
| --tasks visres_bench_level_1_global_occlusion_50 | single config | dataset loads, doc_to_text/doc_to_visual/process_results run | pass |
| --tasks visres_bench_level_1 | 8 tasks | all configs load from tiiuae/visres_bench | pass |
| --tasks visres_bench | 27 tasks | full group runs | pass |
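
The three validation runs above can be reproduced with commands along these lines. This is a sketch: only the --tasks values come from this PR. The python -m lmms_eval entry point and the --model flag are the usual lmms-eval invocation rather than something stated here, and MODEL_NAME is a placeholder to replace with a real model.

```python
import shlex
from typing import List

# Task and group names exactly as listed in the Validation section.
VALIDATED_TASKS = [
    "visres_bench_level_1_global_occlusion_50",
    "visres_bench_level_1",
    "visres_bench",
]


def build_command(task: str, model: str) -> List[str]:
    """Return the argument list for one evaluation run."""
    return ["python", "-m", "lmms_eval", "--model", model, "--tasks", task]


for task in VALIDATED_TASKS:
    # MODEL_NAME is a placeholder, not a real model identifier.
    print(shlex.join(build_command(task, model="MODEL_NAME")))
```

Running the single-config task first is the cheapest way to confirm that the dataset loads before launching the full 27-task group.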

Risk / Compatibility

Additive only: new task group and files under lmms_eval/tasks/visres_bench/. No changes to existing task or model APIs.

Type of Change

New benchmark/task
Documentation update

dunghuynhandy added 3 commits March 9, 2026 18:10

add visres tasks

3c43ef9

add citation

6f3ad30

fix pre-commit

decda08

Luodian approved these changes Mar 10, 2026

Luodian merged commit eb2cffe into EvolvingLMMs-Lab:main Mar 10, 2026
1 check passed

Luodian merged 3 commits into EvolvingLMMs-Lab:main from dunghuynhandy:main