
feat(omnidocbench): add normalized Levenshtein distance metric #1246

Merged
Luodian merged 1 commit into EvolvingLMMs-Lab:main from MaxwellJryao:feat/omnidocbench-nld-metric on Mar 10, 2026

MaxwellJryao commented Mar 10, 2026

Add an omnidocbench_nld_score metric computed as (1 - NLD) * 100, where NLD is the normalized Levenshtein distance, following the scoring method of the Kimi K2.5 technical report. The existing exact_match metric is kept alongside the new one.

Summary

  • Add omnidocbench_nld_score metric: (1 - normalized_levenshtein_distance) * 100, using the Levenshtein library
  • When multiple reference answers exist, take the best (max) score across all answers
  • Register the new metric in omnidocbench.yaml with aggregation: mean
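The scoring rule in the Summary can be sketched in plain Python. This is a hedged, minimal sketch rather than the code merged in this PR: it computes the edit distance with a textbook dynamic-programming loop instead of calling the Levenshtein library, and it assumes NLD is the distance divided by the length of the longer string (so the score stays in [0, 100]) and that two empty strings score 100. All function names here are hypothetical.

```python
from typing import Sequence


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (ca != cb),  # substitution (free when chars match)
                )
            )
        previous = current
    return previous[-1]


def nld_score(prediction: str, reference: str) -> float:
    """(1 - NLD) * 100. Assumption: NLD = distance / max(len(prediction), len(reference))."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 100.0  # assumption: two empty strings count as a perfect match
    return (1.0 - levenshtein_distance(prediction, reference) / longest) * 100.0


def best_nld_score(prediction: str, references: Sequence[str]) -> float:
    """With several reference answers, keep the best (max) score."""
    return max(nld_score(prediction, ref) for ref in references)


def mean_score(per_sample: Sequence[float]) -> float:
    """What `aggregation: mean` does with the per-sample scores."""
    return sum(per_sample) / len(per_sample)
```

If the Levenshtein package is installed, Levenshtein.distance(a, b) returns the same integer as levenshtein_distance above and runs much faster. Levenshtein.ratio is not a drop-in replacement: it normalizes an insertion/deletion-only distance by the sum of both lengths, so it produces different numbers from 1 - NLD as defined here.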

In scope

  • lmms_eval/tasks/omnidocbench/utils.py: add a _normalized_levenshtein_score() helper and update omnidocbench_process_results to return both omnidocbench_exact_match and omnidocbench_nld_score
  • lmms_eval/tasks/omnidocbench/omnidocbench.yaml: register omnidocbench_nld_score in metric_list

Out of scope

  • No changes to other tasks (charxiv, ocrbench, ocrbench_v2, etc.)
  • No changes to the existing omnidocbench_exact_match scoring logic
  • No new dependencies added (Levenshtein is already declared in pyproject.toml under [project.optional-dependencies].metrics)

Validation

  • Command: python -m lmms_eval --model vllm --model_args model=Qwen/Qwen3-VL-8B-Instruct,tensor_parallel_size=2,data_parallel_size=4 --tasks omnidocbench --limit 4 --batch_size 4
    Sample size: N=4 | Key metrics: omnidocbench_exact_match and omnidocbench_nld_score both reported | Result: pass

Risk / Compatibility

  • Non-breaking: existing omnidocbench_exact_match metric is unchanged; the new metric is purely additive
  • Exact-match results from prior runs remain valid and comparable with new runs

Type of Change

  • Bug fix (non-breaking change)
  • New feature
  • New benchmark/task
  • New model integration
  • Breaking change
  • Documentation update
  • Refactoring (no functional changes)

MaxwellJryao force-pushed the feat/omnidocbench-nld-metric branch from f2b55c3 to b0130f7 on March 10, 2026 02:41
Luodian merged commit 4650095 into EvolvingLMMs-Lab:main on Mar 10, 2026
2 checks passed