fix distributed evaluation on empty task shards#1233

Merged
Luodian merged 2 commits into EvolvingLMMs-Lab:main from Luodian:codex/evaluator-empty-shards
Mar 7, 2026
Conversation

Contributor

@Luodian Luodian commented Mar 7, 2026

Summary

  • Fix distributed evaluation when a rank receives zero documents after task sharding.
  • Keep request construction and synchronization aligned by injecting a padding request for empty shards.
  • Scope the change to task request building and evaluator synchronization.

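The padding approach described above can be sketched as follows. This is a hypothetical illustration, not the actual code in lmms_eval/api/task.py; the function name, sharding scheme, and `is_padding` field are all assumptions made for clarity.

```python
def build_shard_requests(docs, rank, world_size):
    """Build this rank's request list after task sharding.

    When a rank's shard is empty, inject a single padding request so the
    rank still participates in the same collective operations (gathers,
    barriers) as its peers, instead of exiting early and leaving the
    other ranks hanging. Hypothetical sketch; names do not match the
    real lmms_eval implementation.
    """
    shard = docs[rank::world_size]  # round-robin sharding across ranks
    if not shard:
        # Padding request: processed like any other request, but flagged
        # so its output is dropped before metrics are computed.
        return [{"doc": None, "is_padding": True}]
    return [{"doc": d, "is_padding": False} for d in shard]
```

With 2 documents and 4 ranks, ranks 0 and 1 each receive one real request while ranks 2 and 3 each receive one padding request, so all four ranks issue the same number of synchronization calls.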
In scope

  • Update lmms_eval/api/task.py to preserve distributed synchronization for empty shards.
  • Update lmms_eval/evaluator.py to keep request/filter coordination consistent across ranks.
  • Include the formatter adjustments required by the existing lint rules.

Out of scope

  • vLLM backend dispatch changes.
  • Benchmark-specific scoring or prompt changes.

Validation

  • uv run --with pytest python -m pytest test/eval/test_construct_requests.py -q | sample size: N=23 tests | key metrics: 23 passed | result: pass
  • uv run python - <<'PY'
    import lmms_eval.evaluator as evaluator
    print(hasattr(evaluator, 'evaluate'))
    PY
    | sample size: N=1 smoke check | key metrics: evaluate=True | result: pass
  • uv run pre-commit run --all-files | sample size: N=all tracked files | key metrics: black, isort passed | result: pass

Risk / Compatibility

  • Moderate behavior change: distributed runs with empty shards now stay alive instead of diverging or hanging.
  • The new padding path only applies when a rank has no documents, so normal single-rank and balanced runs are unaffected.
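On the scoring side, the padding-only path implies that gathered outputs must have padding entries stripped before metrics are aggregated, so results are identical to a run without empty shards. A minimal sketch of that filtering step (hypothetical names, assuming the `is_padding` flag from the request side; the real lmms_eval/evaluator.py logic differs):

```python
def merge_rank_outputs(per_rank_outputs):
    """Concatenate per-rank output lists in rank order (mimicking an
    all-gather) and drop padding entries, so metrics only cover real
    documents. Hypothetical sketch, not the actual evaluator code.
    """
    merged = [out for outputs in per_rank_outputs for out in outputs]
    return [out for out in merged if not out.get("is_padding", False)]
```

Because filtering happens after the gather, every rank still performs the same collective calls, and single-rank runs (which never produce padding) pass through unchanged.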

Type of Change

  • [x] Bug fix (non-breaking change)
  • [ ] New feature
  • [ ] New benchmark/task
  • [ ] New model integration
  • [ ] Breaking change
  • [ ] Documentation update
  • [ ] Refactoring (no functional changes)

Luodian and others added 2 commits March 7, 2026 14:26
Inject a padding request when a rank receives zero docs, and align request/filter synchronization across ranks, so TP+DP jobs with limit <= world_size no longer crash or hang.
@Luodian Luodian marked this pull request as ready for review March 7, 2026 06:31
@Luodian Luodian merged commit e867638 into EvolvingLMMs-Lab:main Mar 7, 2026
2 checks passed