
Feat/ocrbench omnidocbench lang subset #1250

Merged
Luodian merged 2 commits into EvolvingLMMs-Lab:main from MaxwellJryao:feat/ocrbench-omnidocbench-lang-subset
Mar 11, 2026


@MaxwellJryao
Contributor

Summary

  • Add separate Chinese/English subset score metrics for OCRBench_v2 (ocrbench_v2_accuracy_en, ocrbench_v2_accuracy_cn)
  • Add separate Chinese/English/Mixed subset score metrics for OmniDocBench (omnidocbench_{exact_match,nld_score}_{en,cn,mixed})
  • Fix spotting_evaluation crash when result["method"] is missing or not a dict
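
The crash fix in the last bullet amounts to guarding the lookup before using it. A minimal sketch of such a guard, assuming `result` is a plain dict; the helper name `safe_method_dict` is hypothetical and is not taken from the PR diff:

```python
from typing import Any, Dict


def safe_method_dict(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return result["method"] when it is a dict, otherwise an empty dict.

    Hypothetical helper: it avoids a KeyError when the key is missing and
    an AttributeError/TypeError when the value is None, a string, etc.
    """
    method = result.get("method")
    return method if isinstance(method, dict) else {}
```

With a guard like this, downstream code can call `.get(...)` on the returned value without checking its type again.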

In scope

  • New per-language aggregation functions and YAML metric entries for both benchmarks
  • Refactor OCRBench_v2 bucket-filling logic into shared _fill_score_buckets helper
  • Change OCRBench_v2 overall score from equal-weight EN/CN average to sample-count-weighted average
  • Add _detect_document_language for OmniDocBench based on page_info.page_attribute.language (fallback: element-level majority vote)
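
The page-level lookup with an element-level fallback described in the last bullet could look roughly like the sketch below. Only the `page_info.page_attribute.language` path comes from this description. The `elements` key, the per-element `language` field, and the raw label strings in `_LANG_ALIASES` are assumptions made for illustration and may not match the real annotations or the merged `_detect_document_language`.

```python
from collections import Counter
from typing import Any, Dict, Optional

# Assumed mapping from raw annotation labels to subset suffixes.
_LANG_ALIASES = {
    "english": "en",
    "simplified_chinese": "cn",
    "en_ch_mixed": "mixed",
}


def detect_document_language(doc: Dict[str, Any]) -> Optional[str]:
    """Return "en", "cn", or "mixed" for a document, or None if unknown."""
    # Preferred source: the page-level attribute.
    page_attr = (doc.get("page_info") or {}).get("page_attribute") or {}
    lang = _LANG_ALIASES.get(page_attr.get("language"))
    if lang is not None:
        return lang

    # Fallback: majority vote over element-level labels (assumed layout).
    votes = Counter()
    for element in doc.get("elements") or []:
        label = _LANG_ALIASES.get(element.get("language"))
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Returning `None` when neither source yields a known label lets the caller decide whether to skip the sample in the per-language metrics or count it only in the overall score.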

Out of scope

  • No changes to scoring logic (how individual samples are evaluated)
  • No changes to dataset loading, prompts, or generation kwargs

Validation

Risk / Compatibility

  • The OCRBench_v2 overall score (ocrbench_v2_accuracy) now uses sample-count-weighted averaging instead of equal-weight EN/CN averaging, so results
    on the full dataset (7400 EN vs 2600 CN) will differ slightly from earlier runs
  • New metrics are additive; existing metric keys and values (except the weighting change above) are unchanged
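
To make the weighting change concrete, the sketch below compares the two averaging schemes. The sample counts (7400 EN, 2600 CN) come from the note above. The per-subset scores are made-up numbers for illustration only, and the function names are hypothetical rather than those used in the PR.

```python
from typing import Dict


def equal_weight_average(subset_scores: Dict[str, float]) -> float:
    """Old behaviour: every subset contributes equally, whatever its size."""
    return sum(subset_scores.values()) / len(subset_scores)


def sample_weighted_average(
    subset_scores: Dict[str, float], subset_counts: Dict[str, int]
) -> float:
    """New behaviour: each subset is weighted by its number of samples."""
    total = sum(subset_counts[name] for name in subset_scores)
    weighted_sum = sum(
        subset_scores[name] * subset_counts[name] for name in subset_scores
    )
    return weighted_sum / total


# Sample counts from the PR description; scores are illustrative only.
counts = {"en": 7400, "cn": 2600}
scores = {"en": 0.60, "cn": 0.40}

old_overall = equal_weight_average(scores)
new_overall = sample_weighted_average(scores, counts)
```

With these illustrative scores, the equal-weight result is 0.50 and the sample-weighted result is about 0.548, because the larger EN subset pulls the overall score toward its own value. The actual shift depends on how far apart the real EN and CN scores are.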

Type of Change

  • Bug fix (non-breaking change)
  • New feature
  • New benchmark/task
  • New model integration
  • Breaking change
  • Documentation update
  • Refactoring (no functional changes)

@Luodian Luodian merged commit 540724a into EvolvingLMMs-Lab:main Mar 11, 2026
3 checks passed