-
-
Notifications
You must be signed in to change notification settings - Fork 11.4k
[ROCm][CI] Fixes tests for pytorch nightly and python only builds #28979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Fixed setup.py logging format (G001) - Added ROCm support for precompiled wheels - Improved Dockerfile.rocm test stage with hf_transfer and v1 package - Fixed pytorch_nightly_dependency.sh to detect ROCm version correctly - Added source code directory for python_only_compile.sh test Signed-off-by: Andreas Karatzas <[email protected]>
Signed-off-by: Andreas Karatzas <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces fixes for PyTorch nightly and Python-only build tests on the ROCm platform, and adds torchaudio to the ROCm Docker image. The changes in setup.py and the test scripts are well-implemented, improving the robustness of the build process. My review focuses on optimizing the Dockerfiles. I've suggested combining several RUN instructions to reduce image layers, which is a best practice that improves build times and reduces image size. I've also recommended adding apt-get update and cache cleanup to an apt-get install command to ensure correctness and efficiency.
| RUN mkdir src | ||
| RUN mv vllm src/vllm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| && python3 --version && python3 -m pip --version | ||
|
|
||
| RUN pip install -U packaging 'cmake<4' ninja wheel 'setuptools<80' pybind11 Cython | ||
| RUN apt-get install -y libjpeg-dev libsox-dev libsox-fmt-all sox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For correctness and to keep the Docker image size minimal, it's crucial to run apt-get update before apt-get install and to clean up the apt cache in the same RUN layer. This prevents issues with stale package lists and removes unnecessary files.
RUN apt-get update && apt-get install -y libjpeg-dev libsox-dev libsox-fmt-all sox && rm -rf /var/lib/apt/lists/*
| RUN git clone ${PYTORCH_AUDIO_REPO} audio | ||
| RUN cd audio && git checkout ${PYTORCH_AUDIO_BRANCH} \ | ||
| && git submodule update --init --recursive \ | ||
| && pip install -r requirements.txt \ | ||
| && python3 setup.py bdist_wheel --dist-dir=dist \ | ||
| && pip install dist/*.whl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the number of Docker image layers and improve build efficiency, it's recommended to combine the git clone and the subsequent build commands for torchaudio into a single RUN instruction.
RUN git clone ${PYTORCH_AUDIO_REPO} audio && cd audio \
&& git checkout ${PYTORCH_AUDIO_BRANCH} \
&& git submodule update --init --recursive \
&& pip install -r requirements.txt \
&& python3 setup.py bdist_wheel --dist-dir=dist \
&& pip install dist/*.whl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # If using precompiled, extract and patch package_data (in advance of setup) | ||
| if envs.VLLM_USE_PRECOMPILED: | ||
| assert _is_cuda(), "VLLM_USE_PRECOMPILED is only supported for CUDA builds" | ||
| assert _is_cuda() or _is_hip(), ( | ||
| "VLLM_USE_PRECOMPILED is only supported for CUDA or ROCm builds." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowing precompiled mode on ROCm installs CUDA wheel
The precompiled path now explicitly allows _is_hip() (ROCm) builds, but the precompiled wheel download below still always targets the CUDA binary (wheels.vllm.ai/...vllm-1.0.0.dev-cp38-abi3-<arch>.whl, no ROCm variant). On ROCm agents where VLLM_USE_PRECOMPILED=1 is used (e.g., python_only_compile.sh in the AMD pipeline), this will install CUDA .so files without the CUDA runtime present, causing import-time failures instead of a clean skip. Either keep the CUDA-only guard or fetch a ROCm-specific wheel before enabling this path on ROCm.
Useful? React with 👍 / 👎.
This PR fixes tests for labels:
Pytorch Nightly Dependency Override CheckPython-only Installation TestIt also includes
torchaudiopackage into Dockerfile.