
Commit 89b2502 (parent a372835)

yanboliang and Chillee authored

Add Mixtral-8x7B in sub-folder (#105)

* Add Mixtral-8x7B in sub-folder
* Remove unused logics
* Update README.md

Co-authored-by: Horace He <[email protected]>

File tree: 7 files changed, +1799 −0 lines

mixtral-moe/README.md (47 additions, 0 deletions)

# Mixtral 8x7B
[Mixtral 8x7B](https://mistral.ai/news/mixtral-of-experts/) is a high-quality sparse mixture-of-experts (MoE) model that matches or beats GPT-3.5 on most benchmarks. This repo is a simple and efficient PyTorch-native implementation of Mixtral 8x7B.
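Conceptually, a sparse-MoE layer routes each token through only a few of its experts: Mixtral activates the top 2 of 8 expert feed-forward networks per token. The sketch below is illustrative only (shapes and names are assumptions, not the code in `model.py`):

```python
import numpy as np

def moe_layer(x, w_gate, experts, top_k=2):
    """Sparse-MoE forward for one token vector x (illustrative sketch).

    x:       (d,) token hidden state
    w_gate:  (d, n_experts) router weights
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ w_gate                       # router scores, (n_experts,)
    top = np.argsort(logits)[-top_k:]         # indices of the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over the selected experts only
    # Only the chosen experts run -- the other 6 FFNs are skipped entirely.
    return sum(p * experts[i](x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
w_gate = rng.normal(size=(d, n_experts))
# Each "expert" is a tiny stand-in feed-forward net with its own weights.
expert_ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_experts)]
experts = [lambda x, w=w: np.tanh(x @ w) for w in expert_ws]

out = moe_layer(rng.normal(size=d), w_gate, experts)
print(out.shape)  # (16,)
```

Because only 2 of 8 experts run per token, the active parameter count per token is far smaller than the total parameter count, which is where the inference-speed advantage comes from.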

## Downloading Weights

```bash
export MODEL_REPO=mistralai/Mixtral-8x7B-v0.1
python scripts/download.py --repo_id $MODEL_REPO
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/$MODEL_REPO
```

## Benchmarks

Benchmarks were run on an 8x A100-80GB node, power-limited to 330 W, with a hybrid cube-mesh topology. Note that all benchmarks are run at *batch size = 1*, making the reported tokens/s numbers equivalent to "tokens/s/user". In addition, they are run with a very small prompt length (just 5 tokens).

|                     | 1 GPU | 2 GPU | 4 GPU  | 8 GPU  |
|---------------------|-------|-------|--------|--------|
| baseline (bfloat16) | OOM   | 78.75 | 118.23 | 203.69 |
| int8                | 56.04 | 99.91 | 149.53 | 218.48 |

## Generate Text

The model definition lives in `model.py` and the generation code in `generate.py`.

```bash
python generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model.pth --prompt "Hello, my name is"
```

To squeeze out a little bit more performance, you can also compile the prefill with `--compile_prefill`, though this will increase compilation time.
## Quantization

### Int8 Weight-Only Quantization

To generate this version of the model:

```bash
# Spits out model at checkpoints/$MODEL_REPO/model_int8.pth
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8
```

To run with int8, just pass the int8 checkpoint to generate.py:

```bash
python generate.py --compile --compile_prefill --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
```
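Int8 weight-only quantization typically stores each linear layer's weights as int8 plus a per-output-channel float scale, and dequantizes on the fly during the matmul. A minimal numpy sketch of that general scheme (illustrative; not the actual `quantize.py` code):

```python
import numpy as np

def quantize_int8(w):
    """Per-output-channel symmetric int8 quantization of a weight matrix.

    w: (out_features, in_features) float weights
    Returns (int8 weights, per-row float scales).
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # one scale per output row
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale

def linear_int8(x, w_q, scale):
    # Dequantize then matmul; real kernels fuse this to save memory bandwidth,
    # which is what speeds up memory-bound batch-size-1 decoding.
    return x @ (w_q.astype(np.float32) * scale).T

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 64)).astype(np.float32)
x = rng.normal(size=(1, 64)).astype(np.float32)

w_q, scale = quantize_int8(w)
err = np.abs(linear_int8(x, w_q, scale) - x @ w.T).max()
print(err)  # small quantization error relative to the output magnitude
```

The win is bandwidth: at batch size 1 the GPU spends most of its time streaming weights, so halving the weight bytes (int8 vs bfloat16) roughly tracks the speedups in the table above.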

## Tensor Parallelism

```bash
ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=8 generate.py --compile --compile_prefill --checkpoint_path checkpoints/$MODEL_REPO/model.pth
```
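Tensor parallelism splits each large matmul across GPUs: in a column-parallel linear, each rank owns a slice of the weight's output dimension, computes its shard locally, and the shards are gathered (a row-parallel layer would instead sum partial results with an all-reduce). A single-process numpy simulation of the column-parallel case (illustrative; the actual implementation uses `torch.distributed`):

```python
import numpy as np

def column_parallel_linear(x, w, n_ranks):
    """Simulate a column-parallel linear layer on one process.

    x: (batch, d_in); w: (d_in, d_out). Each "rank" owns d_out // n_ranks
    columns of w and computes its output shard with no communication;
    the concatenation stands in for an all-gather in a real multi-GPU run.
    """
    shards = np.split(w, n_ranks, axis=1)        # each rank's weight slice
    partials = [x @ shard for shard in shards]   # independent local matmuls
    return np.concatenate(partials, axis=1)      # "all-gather" of the outputs

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16))
w = rng.normal(size=(16, 64))

out = column_parallel_linear(x, w, n_ranks=8)
# The sharded result matches the single-device matmul exactly.
assert np.allclose(out, x @ w)
```

Each rank holds only 1/8 of the weights, which is also what lets the bfloat16 model fit (and scale in tokens/s) on multiple GPUs when a single GPU OOMs.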
