
Commit f18f426

tijmen and NanoCode012 authored
Docs for AMD-based HPC systems (#1891)
* Add documentation for installing on AMD-based HPC systems.

* Accept suggestion to add note about deepspeed

  Co-authored-by: NanoCode012 <[email protected]>

* Update _quarto.yml with amd_hpc doc

---------

Co-authored-by: Tijmen de Haan <[email protected]>
Co-authored-by: NanoCode012 <[email protected]>
1 parent dca1fe4 commit f18f426

File tree

2 files changed: +109 -0 lines changed

_quarto.yml (+1)

@@ -37,6 +37,7 @@ website:
         - docs/mac.qmd
         - docs/multi-node.qmd
         - docs/unsloth.qmd
+        - docs/amd_hpc.qmd
       - section: "Dataset Formats"
         contents: docs/dataset-formats/*
       - section: "Reference"

docs/amd_hpc.qmd (+108)

@@ -0,0 +1,108 @@
---
title: Training with AMD GPUs on HPC Systems
description: A comprehensive guide for using Axolotl on distributed systems with AMD GPUs
---

This guide provides step-by-step instructions for installing and configuring Axolotl in a High-Performance Computing (HPC) environment equipped with AMD GPUs.

## Setup
### 1. Install Python

We recommend using Miniforge, a minimal conda-based Python distribution:

```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```
### 2. Configure Python Environment

Add Python to your PATH and ensure it's available at login:

```bash
echo 'export PATH=~/miniforge3/bin:$PATH' >> ~/.bashrc
echo 'if [ -f ~/.bashrc ]; then . ~/.bashrc; fi' >> ~/.bash_profile
```
### 3. Load AMD GPU Software

Load the ROCm module:

```bash
module load rocm/5.7.1
```

Note: The specific module name and version may vary depending on your HPC system. Consult your system documentation for the correct module name.
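If your cluster uses Environment Modules or Lmod, you can usually list the installed ROCm versions before picking one, for example:

```bash
# Show which ROCm modules this system provides (names and versions vary by site)
module avail rocm

# Verify what is currently loaded
module list
```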
### 4. Install PyTorch

Install PyTorch with ROCm support:

```bash
pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7 --force-reinstall
```
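A quick sanity check (run somewhere a GPU is visible, e.g. in an interactive job) confirms that the ROCm build is installed and that devices are detected; PyTorch exposes ROCm/HIP GPUs through the `torch.cuda` API:

```bash
python -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available(), torch.cuda.device_count())"
```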
### 5. Install Flash Attention

Clone and install the Flash Attention repository:

```bash
git clone --recursive https://github.com/ROCmSoftwarePlatform/flash-attention.git
export GPU_ARCHS="gfx90a"
cd flash-attention
export PYTHON_SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
patch "${PYTHON_SITE_PACKAGES}/torch/utils/hipify/hipify_python.py" hipify_patch.patch
pip install .
```
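Note that `gfx90a` is the architecture of AMD Instinct MI200-series GPUs (MI210/MI250/MI250X); if your system uses different GPUs, set `GPU_ARCHS` to match. With the ROCm module loaded, the architecture can be queried, for example:

```bash
# Print the GPU architecture string reported by ROCm (e.g. gfx90a)
rocminfo | grep -m1 -o 'gfx[0-9a-f]*'
```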
### 6. Install Axolotl

Clone and install Axolotl:

```bash
git clone https://github.com/axolotl-ai-cloud/axolotl
cd axolotl
pip install packaging ninja
pip install -e .
```
### 7. Apply xformers Workaround

xformers appears to be incompatible with ROCm. Apply the following workarounds (a sketch of the kind of edit is shown below); adjust the paths to match where you cloned Axolotl and your Python version:

- Edit `$HOME/packages/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py`, modifying the code to always return `False` for SwiGLU availability from xformers.
- Edit `$HOME/miniforge3/lib/python3.10/site-packages/xformers/ops/swiglu_op.py`, replacing the `SwiGLU` function with a pass statement.
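The exact names and structure in those two files vary between versions, so the following is only an illustrative sketch of what each edit amounts to, not code copied from Axolotl or xformers:

```python
# Illustrative sketch only -- the real function/class names differ; locate the
# corresponding definitions in the files mentioned above before editing.

# In axolotl/monkeypatch/llama_attn_hijack_flash.py: force the xformers SwiGLU
# availability check to report False so the ROCm build never calls into xformers.
def is_swiglu_available():  # hypothetical name for the availability check
    return False

# In xformers/ops/swiglu_op.py: replace the body of the SwiGLU entry point with
# a no-op so that importing the module does not fail on ROCm.
def SwiGLU(*args, **kwargs):  # stands in for the original implementation
    pass
```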
### 8. Prepare Job Submission Script

Create a job submission script for your HPC's particular scheduler (e.g. Slurm, PBS). Include the necessary environment setup and the command to run Axolotl training. If the compute nodes do not have internet access, it is recommended to include:

```bash
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1
```
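As a starting point, a minimal single-node Slurm batch script might look like the sketch below. The account, partition, GPU count, wall time, and config path are placeholders to adapt to your site; multi-node runs additionally need your scheduler's multi-node launch setup.

```bash
#!/bin/bash
# Placeholders throughout: adapt account, partition, GPU count, time, and paths.
#SBATCH --job-name=axolotl-train
#SBATCH --account=your_project
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8
#SBATCH --time=12:00:00

# Recreate the environment from this guide on the compute node
module load rocm/5.7.1
export PATH=~/miniforge3/bin:$PATH

# Compute nodes often have no internet access
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1

# Launch training (accelerate spawns one process per GPU)
accelerate launch -m axolotl.cli.train /path/to/your/config.yaml
```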
### 9. Download Base Model

Download a base model using the Hugging Face CLI:

```bash
huggingface-cli download meta-llama/Meta-Llama-3.1-8B --local-dir ~/hfdata/llama3.1-8B
```
### 10. Create Axolotl Configuration

Create an Axolotl configuration file (YAML format) tailored to your specific training requirements and dataset. Use FSDP for multi-node training.

Note: DeepSpeed did not work at the time of testing. If you manage to get it working, please let us know.
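For reference, the FSDP-related portion of an Axolotl config typically looks something like the sketch below; the base model, dataset, and wrap-class values are placeholders to adapt to your own run.

```yaml
# Sketch of an Axolotl config with FSDP enabled -- values are placeholders.
base_model: meta-llama/Meta-Llama-3.1-8B   # or the local directory from step 9
datasets:
  - path: /path/to/your/dataset
    type: alpaca                           # placeholder dataset format
sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 2e-5
bf16: true
flash_attention: true
output_dir: ./outputs/llama3.1-8b-fsdp

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: false
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
```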
### 11. Preprocess Data

Run preprocessing on the login node:

```bash
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess /path/to/your/config.yaml
```
### 12. Train

You are now ready to submit your previously prepared job script. 🚂
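For example, on a Slurm system, and assuming you saved the step 8 sketch as `train_axolotl.sbatch`:

```bash
sbatch train_axolotl.sbatch
# Check that the job is queued or running
squeue -u $USER
```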
