
feat: SGLang refactor, distributed eval fixes, and cache simplification#1253

Merged
Luodian merged 8 commits into main from brianli/sglang-distributed-eval
Mar 15, 2026

Conversation

Contributor

@Luodian Luodian commented Mar 15, 2026

Summary

Follow-up to #1247. Refactors the SGLang model wrapper, fixes distributed eval (TP+DP), simplifies the response cache, and hardens dataset loading for remote filesystems.

SGLang model wrapper refactor

  • Remove qwen_vl_utils dependency from generic wrapper — no longer needed for non-Qwen models
  • Pass per-request image_data to Engine.generate() instead of flattening across the batch
  • Initialize _config with AutoConfig instead of returning the processor object
  • Patch torchvision.io.read_video fallback when video_fps metadata is missing
  • Pass flat image list to Engine.generate() instead of nested lists
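Of the wrapper fixes above, the `read_video` fallback is the most self-contained. A minimal sketch of the pattern, written as a generic wrapper rather than the PR's actual patch (the function name and the `default_fps=24.0` value are illustrative assumptions):

```python
def with_fps_fallback(read_video, default_fps=24.0):
    """Wrap a torchvision.io.read_video-style callable so the metadata
    dict it returns always carries a 'video_fps' key, filling in
    default_fps when the container omits the field (the case this PR
    patches around). default_fps=24.0 is an assumed placeholder."""
    def wrapped(*args, **kwargs):
        frames, audio, info = read_video(*args, **kwargs)
        if "video_fps" not in info:
            info = {**info, "video_fps": default_fps}
        return frames, audio, info
    return wrapped
```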

Distributed eval (TP+DP)

  • Use global rank in model wrappers for correct TP+DP data dispatch
  • Add Slurm-aware progress reporting for batch jobs (lmms_eval/models/model_utils/progress.py)
  • Redirect HF datasets cache to local scratch directory on remote filesystems to avoid NFS file-lock contention
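The cache-redirect bullet can be sketched roughly as follows. `HF_DATASETS_CACHE` is the standard environment variable the `datasets` library reads at import time; the per-user scratch layout and function name here are assumptions, not the PR's actual code:

```python
import os
import getpass

def redirect_hf_datasets_cache(scratch_root="/tmp"):
    """Point the HF datasets cache at node-local scratch so its filelock
    traffic stays off NFS. Must run before `import datasets`, which reads
    HF_DATASETS_CACHE once at import. The per-user subdirectory naming is
    an illustrative assumption."""
    cache_dir = os.path.join(scratch_root, f"hf_datasets_{getpass.getuser()}")
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["HF_DATASETS_CACHE"] = cache_dir
    return cache_dir
```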

Response cache simplification

  • Simplify cache lifecycle to single create / finalize API (removes segment/seal complexity)
  • Context-length and batch-size tuning for long-context thinking models
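A minimal sketch of what a single create/finalize lifecycle could look like (class and method names are illustrative, not the PR's actual API): entries accumulate while the cache is open and are written exactly once, so there is no intermediate segment/seal state to track.

```python
import json

class ResponseCache:
    """Illustrative create/finalize cache: put() while open, one write
    on finalize(), no segments to seal in between."""

    def __init__(self, path):
        self.path = path
        self._entries = {}
        self._finalized = False

    @classmethod
    def create(cls, path):
        return cls(path)

    def put(self, key, response):
        if self._finalized:
            raise RuntimeError("cache already finalized")
        self._entries[key] = response

    def get(self, key, default=None):
        return self._entries.get(key, default)

    def finalize(self):
        with open(self.path, "w") as f:
            json.dump(self._entries, f)
        self._finalized = True
```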

Tests

  • Expanded response cache tests for simplified API
  • Filelock cross-class singleton regression test
  • Task dataset cache redirect test

Deps

  • Add torchcodec to pyproject.toml video extras

Test plan

  • SGLang wrapper tested on Qwen3.5-4B with TP=2 DP=16 (4-node FluidStack)
  • Cache simplification verified with 31-task eval run
  • Filelock patch regression test passes
  • Dataset cache redirect tested on remote FS (NFS/VAST)

Depends on #1247

Luodian and others added 8 commits March 15, 2026 14:49
Merge dummy_video_reader into a single dummy model that serves both use cases:
- Default mode: instant no-op responses for dataset hydration and task smoke tests
- Video-bench mode (read_bytes/decode_num_frames > 0): full IO/decode latency tracking

The old name dummy_video_reader is kept as a MODEL_ALIASES alias for backward compat.
… inputs

SGLang's Engine runs its own Qwen3-VL processor internally. When
lmms-eval pre-tokenized inputs with the HF processor and passed the
expanded input_ids to SGLang, pad tokens were expanded twice, causing
IndexError on image inputs and potential failures on video inputs.

- Image path: pass prompt text directly to Engine.generate() instead of
  pre-tokenized input_ids, letting SGLang handle tokenization end-to-end
- Video path: pass prompt text + video_data to Engine.generate() using
  SGLang's native video support instead of pre-tokenizing and swapping
  video tokens to image tokens
- Fix tools check: use truthy check instead of 'is not None' so empty
  list from disabled MCP does not trigger tool-handling code paths
- Fix tools param: pass tools=None instead of tools=[] to
  apply_chat_template to avoid unexpected preprocessing
- Lazy-import MCP deps: avoid ImportError at module load when mcp
  package is not installed
- Broaden optional metric imports: catch Exception instead of
  ImportError so numpy/spacy binary incompatibilities do not crash
  metric aggregation for unrelated tasks
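The two tools fixes above reduce to one pattern, sketched here with an assumed helper name: treat an empty list (what a disabled MCP setup yields) the same as no tools at all, and normalize it to None before it reaches apply_chat_template.

```python
def normalize_tools(tools):
    """Truthy check rather than `is not None`: an empty list from a
    disabled MCP setup must neither trigger tool-handling code paths nor
    be forwarded as tools=[] to apply_chat_template, so both [] and None
    collapse to None. The helper name is illustrative."""
    return tools if tools else None
```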
SGLang model wrapper:
- Remove qwen_vl_utils dependency from generic wrapper
- Pass per-request image_data instead of flattening across batch
- Initialize _config with AutoConfig instead of returning processor
- Patch torchvision read_video missing video_fps fallback
- Pass flat image list to Engine.generate instead of nested lists

Distributed eval:
- Use global rank in model wrappers for correct TP+DP dispatch
- Add Slurm-aware progress reporting for batch jobs
- Redirect HF datasets cache to local scratch on remote FS

Response cache:
- Simplify to single create/finalize API
- Context-length and batch-size tuning for thinking models

Tests:
- Expanded cache tests for simplified API
- Filelock cross-class singleton regression test
- Task dataset cache redirect test

Deps:
- Add torchcodec to pyproject.toml
@Luodian Luodian merged commit 9e69834 into main Mar 15, 2026
2 checks passed