
feat: add COVER and WM-aBench video understanding benchmarks#1273

Open
Luodian wants to merge 1 commit into `main` from `feat/cover-wm-abench`

Conversation


Luodian (Contributor) commented on Mar 26, 2026

Summary

  • Add COVER (Counterfactual Video Reasoning, ACL Findings 2025) benchmark for causal video understanding
  • Add WM-aBench (World Models aBench) with 36+ task variants for comprehensive world model evaluation

Benchmarks

COVER

Tests causal understanding in videos via counterfactual question generation. Includes `generate_qa.py` for automatic QA pair generation from video annotations.
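A counterfactual QA generation step of this kind might look like the following minimal sketch. The annotation schema and function name are assumptions for illustration, not the actual `generate_qa.py` interface:

```python
# Hypothetical sketch of counterfactual QA generation; the annotation
# fields and helper name are illustrative, not the real generate_qa.py API.

def make_counterfactual_qa(annotation: dict) -> dict:
    """Build a counterfactual question/answer pair from one video annotation."""
    event = annotation["event"]                          # observed cause
    outcome = annotation["outcome"]                      # observed effect
    counterfactual = annotation["counterfactual_outcome"]  # effect if cause removed
    question = f"What would happen if {event} did not occur?"
    return {"question": question, "answer": counterfactual, "factual": outcome}

pair = make_counterfactual_qa({
    "event": "the ball hits the vase",
    "outcome": "the vase falls",
    "counterfactual_outcome": "the vase stays upright",
})
print(pair["question"])  # What would happen if the ball hits the vase did not occur?
```

The key design point is that each QA pair keeps both the factual and counterfactual outcomes, so the evaluation can check that a model distinguishes observed causality from the hypothetical alternative.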

WM-aBench

Evaluates world model capabilities across multiple dimensions:

  • Spatial: relative position, occupancy, multiview
  • Motion: direction, speed, trajectory
  • Physical: mechanistic knowledge, compositionality
  • Temporal: extension, positioning, transitivity
  • Visual: attribute recognition (color, shape, material)
  • Counting: discrete counting, relative counting

Uses ManiSkill, TDW, Physion, Habitat, and CARLA simulation environments.
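The dimension/subtask breakdown above can be enumerated to see how the variant count grows; the exact registry task names below are assumptions for illustration, not the names this PR registers:

```python
# Illustrative enumeration of WM-aBench task variants. The naming scheme
# (wm_abench_<dimension>_<subtask>) is a hypothetical convention.
DIMENSIONS = {
    "spatial": ["relative_position", "occupancy", "multiview"],
    "motion": ["direction", "speed", "trajectory"],
    "physical": ["mechanistic_knowledge", "compositionality"],
    "temporal": ["extension", "positioning", "transitivity"],
    "visual": ["color", "shape", "material"],
    "counting": ["discrete_counting", "relative_counting"],
}

task_names = [
    f"wm_abench_{dim}_{sub}"
    for dim, subs in DIMENSIONS.items()
    for sub in subs
]
print(len(task_names))  # 16 base tasks; per-simulator variants push the total past 36
```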

Test plan

  • Verify task registration with `lmms-eval --tasks list | grep -E "cover|wm_abench"`
  • Run COVER with a video-capable model
  • Run a WM-aBench subset task
  • Confirm group yaml correctly aggregates all subtasks
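The group yaml the last check refers to would typically follow the lm-eval-harness-style group-config convention; this is a minimal sketch with assumed task names, not the file shipped in this PR:

```yaml
# Hypothetical group config; subtask names are illustrative.
group: wm_abench
task:
  - wm_abench_spatial_relative_position
  - wm_abench_motion_direction
  - wm_abench_visual_color
aggregate_metric_list:
  - metric: accuracy
    aggregation: mean
    weight_by_size: true
```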

