feat: SGLang refactor, distributed eval fixes, and cache simplification#1253
Merged
Conversation
Merge dummy_video_reader into a single dummy model that serves both use cases:
- Default mode: instant no-op responses for dataset hydration and task smoke tests
- Video-bench mode (read_bytes/decode_num_frames > 0): full IO/decode latency tracking

The old name dummy_video_reader is kept as a MODEL_ALIASES alias for backward compat.
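The two modes described above could be sketched roughly as follows. This is an illustrative sketch only, not the actual lmms-eval implementation; the class name, constructor knobs, and return shape are hypothetical.

```python
import time


class DummyModel:
    """Hypothetical sketch of the merged dummy model (names are
    illustrative, not the actual lmms-eval code)."""

    def __init__(self, read_bytes=0, decode_num_frames=0):
        self.read_bytes = read_bytes
        self.decode_num_frames = decode_num_frames
        # Video-bench mode is enabled when either knob is positive.
        self.video_bench = read_bytes > 0 or decode_num_frames > 0

    def generate(self, request):
        if not self.video_bench:
            # Default mode: instant no-op response for dataset
            # hydration and task smoke tests.
            return {"text": "", "latency_s": 0.0}
        # Video-bench mode: time the IO/decode work and report it.
        start = time.perf_counter()
        _ = request["video_path"].read_bytes()[: self.read_bytes]
        # ... decode self.decode_num_frames frames here ...
        return {"text": "", "latency_s": time.perf_counter() - start}


# Backward-compat alias, mirroring the MODEL_ALIASES entry described above.
MODEL_ALIASES = {"dummy_video_reader": DummyModel}
```

The single-class design means one registration path and one alias entry instead of two near-identical models.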
… inputs

SGLang's Engine runs its own Qwen3-VL processor internally. When lmms-eval pre-tokenized inputs with the HF processor and passed the expanded input_ids to SGLang, pad tokens were expanded twice, causing IndexError on image inputs and potential failures on video inputs.

- Image path: pass prompt text directly to Engine.generate() instead of pre-tokenized input_ids, letting SGLang handle tokenization end-to-end
- Video path: pass prompt text + video_data to Engine.generate() using SGLang's native video support instead of pre-tokenizing and swapping video tokens to image tokens
- Fix tools check: use a truthy check instead of 'is not None' so an empty list from disabled MCP does not trigger tool-handling code paths
- Fix tools param: pass tools=None instead of tools=[] to apply_chat_template to avoid unexpected preprocessing
- Lazy-import MCP deps: avoid ImportError at module load when the mcp package is not installed
- Broaden optional metric imports: catch Exception instead of ImportError so numpy/spacy binary incompatibilities do not crash metric aggregation for unrelated tasks
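The tools-check fix above is a classic truthiness pitfall. A minimal sketch (the function name and return values are hypothetical, chosen only to make the two code paths visible):

```python
def maybe_handle_tools(messages, tools):
    """Illustrative sketch of the fix, not the actual lmms-eval code.

    A disabled MCP integration yields tools == [], which is not None
    but should NOT trigger the tool-handling code path. A truthy check
    covers both None and the empty list.
    """
    if tools:  # fixed: was `if tools is not None:`
        return "tool-handling path"
    # Downstream, pass tools=None (never tools=[]) to
    # apply_chat_template to avoid unexpected preprocessing.
    return "plain path"
```

With the old `is not None` check, `tools=[]` would have taken the tool-handling branch even though no tools exist.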
SGLang model wrapper:
- Remove qwen_vl_utils dependency from generic wrapper
- Pass per-request image_data instead of flattening across batch
- Initialize _config with AutoConfig instead of returning processor
- Patch torchvision read_video missing video_fps fallback
- Pass flat image list to Engine.generate instead of nested lists

Distributed eval:
- Use global rank in model wrappers for correct TP+DP dispatch
- Add Slurm-aware progress reporting for batch jobs
- Redirect HF datasets cache to local scratch on remote FS

Response cache:
- Simplify to single create/finalize API
- Context-length and batch-size tuning for thinking models

Tests:
- Expanded cache tests for simplified API
- Filelock cross-class singleton regression test
- Task dataset cache redirect test

Deps:
- Add torchcodec to pyproject.toml
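The create/finalize simplification could look roughly like this. A minimal sketch under stated assumptions: the class name, `put` method, and JSON backing file are all hypothetical, standing in for whatever the real cache stores; the point is the two-call lifecycle replacing the old segment/seal steps.

```python
import json
from pathlib import Path


class ResponseCache:
    """Hypothetical sketch of a single create/finalize cache API
    (not the actual lmms-eval class; segment/seal steps are gone)."""

    def __init__(self, path):
        self.path = Path(path)
        self._entries = {}
        self._finalized = False

    @classmethod
    def create(cls, path):
        # One call to open a cache for writing.
        return cls(path)

    def put(self, key, response):
        if self._finalized:
            raise RuntimeError("cache already finalized")
        self._entries[key] = response

    def finalize(self):
        # One call to persist everything, via an atomic rename so a
        # crash mid-write never leaves a partial cache file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self._entries))
        tmp.replace(self.path)
        self._finalized = True
```

Collapsing to two lifecycle calls removes the intermediate states the old segment/seal API had to keep consistent.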
This was referenced Mar 15, 2026
Summary
Follow-up to #1247. Refactors the SGLang model wrapper, fixes distributed eval (TP+DP), simplifies the response cache, and hardens dataset loading for remote filesystems.
SGLang model wrapper refactor
- Remove `qwen_vl_utils` dependency from generic wrapper — no longer needed for non-Qwen models
- Pass per-request `image_data` to `Engine.generate()` instead of flattening across the batch
- Initialize `_config` with `AutoConfig` instead of returning the processor object
- Patch `torchvision.io.read_video` fallback when `video_fps` metadata is missing
- Pass flat image list to `Engine.generate()` instead of nested lists

Distributed eval (TP+DP)
- Use global rank in model wrappers for correct TP+DP dispatch
- Add Slurm-aware progress reporting for batch jobs (`lmms_eval/models/model_utils/progress.py`)
- Redirect HF datasets cache to local scratch on remote filesystems

Response cache simplification
- Single `create`/`finalize` API (removes segment/seal complexity)
- Context-length and batch-size tuning for thinking models

Tests
- Expanded cache tests for the simplified API
- Filelock cross-class singleton regression test
- Task dataset cache redirect test

Deps
- Add `torchcodec` to `pyproject.toml` video extras

Test plan
Depends on #1247