Speed up macOS smoke test #28954
Conversation
Note: Gemini is unable to generate a review for this pull request because the file types involved are not currently supported.
💡 Codex Review
| "model": "Qwen/Qwen3-0.6B", |
The smoke test now launches the server with trl-internal-testing/tiny-random-LlamaForCausalLM, but the completion request still posts "model": "Qwen/Qwen3-0.6B". With OpenAI-compatible APIs, a request for an unloaded model returns an error, so this curl check will consistently fail even though the server is running, breaking the workflow on every run. The request should target the model that was actually started.
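For reference, a minimal sketch of a completion check that targets the model the server was actually started with. This is illustrative only: the host, port, prompt, and timeout are assumptions, not values taken from the workflow file.

```python
# Minimal sketch of a smoke check against an OpenAI-compatible vLLM server.
# Assumes the server was launched with the tiny Llama model and is listening
# on localhost:8000 (host/port are assumptions, not from the workflow).
import json
import urllib.request

MODEL = "trl-internal-testing/tiny-random-LlamaForCausalLM"

payload = {
    "model": MODEL,  # must match the model the server was launched with
    "prompt": "Hello, world",
    "max_tokens": 8,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req, timeout=60) as resp:
    body = json.load(resp)

# A successful response confirms the server is up and serving the right model.
print(body["choices"][0]["text"])
```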
Purpose
Use a smaller model to fix timeout issues
Test Plan
Test Result
Confirmed https://github.com/vllm-project/vllm/actions/runs/19479854049/job/55748976858 works fine when run manually.
The main issue is that distributed init seems to take 15 minutes to finish.
I think the likely bottleneck is `torch.distributed.new_group()` in `GroupCoordinator.__init__()`. For world_size=1, `initialize_model_parallel()` creates 5 GroupCoordinator instances (TP, DCP, PP, DP, EP), each creating 2 groups (device + CPU), for a total of 10 `new_group()` calls. Even for single-process groups, PyTorch may still perform slow initialization. The main optimization would be to skip or streamline group creation for the single-process case, but that's a larger change.
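To sanity-check that hypothesis, a standalone sketch along these lines could time repeated group creation in a single process. This is not code from vLLM; the gloo backend, env-style init, and the loop count of 10 are assumptions chosen to mirror the analysis above.

```python
# Standalone sketch to time torch.distributed group creation for world_size=1,
# mirroring the 10 new_group() calls described above. The gloo backend and the
# single-process env init are assumptions for reproduction, not vLLM code.
import os
import time

import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="gloo", rank=0, world_size=1)

start = time.perf_counter()
groups = []
for _ in range(10):  # 5 coordinators x (device + CPU) groups in the analysis
    groups.append(dist.new_group(ranks=[0]))
elapsed = time.perf_counter() - start

print(f"10 new_group() calls took {elapsed:.2f}s")
dist.destroy_process_group()
```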