Commit a206186

Update
1 parent a5e0672 commit a206186

File tree

1 file changed: +16 -1 lines changed

README.md

Lines changed: 16 additions & 1 deletion
@@ -14,7 +14,22 @@ This is *NOT* intended to be a "framework" or "library" - it is intended to show

For an in-depth walkthrough of what's in this codebase, see this [blog post](https://pytorch.org/blog/accelerating-generative-ai-2/).

-We supported [Mixtral 8x7B](https://mistral.ai/news/mixtral-of-experts/) which is a high-quality sparse mixture of experts (MoE) model, see [this page](./mixtral-moe) for more details.
+## Supported Models
+
+### LLaMA family
+Please see the rest of this page for benchmarks of the LLaMA family of models.
+
+### Mixtral 8x7B
+We also support [Mixtral 8x7B](https://mistral.ai/news/mixtral-of-experts/), a high-quality sparse mixture-of-experts (MoE) model. The average token generation rates (tokens/s) are:
+
+| Configuration       | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs |
+|---------------------|-------|--------|--------|--------|
+| baseline (bfloat16) | OOM   | 78.75  | 118.23 | 203.69 |
+| int8                | 56.04 | 99.91  | 149.53 | 218.48 |
+
+The benchmarks were run on 8x A100-80GB GPUs, power limited to 330W, with a hybrid cube mesh topology. All benchmarks are run at *batch size=1* with a very small prompt (just 5 tokens), so the reported tokens/s numbers are equivalent to "tokens/s/user".
+
+For more details about Mixtral 8x7B, please check [this page](./mixtral-moe).

## Community

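The "int8" row in the table above presumably refers to weight-only int8 quantization of the linear layers. The names below (`quantize_weight_int8`, `Int8WeightLinear`) are illustrative and not this repo's implementation; this is only a minimal sketch of symmetric per-channel weight-only int8 quantization:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def quantize_weight_int8(weight: torch.Tensor):
    """Symmetric per-output-channel int8 quantization of a 2D weight matrix."""
    scales = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(weight / scales), -127, 127).to(torch.int8)
    return q, scales.squeeze(1)

class Int8WeightLinear(nn.Module):
    """Linear layer that stores int8 weights and dequantizes them on the fly."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        q, scales = quantize_weight_int8(linear.weight)
        self.register_buffer("q_weight", q)     # int8 weights: 1 byte per parameter
        self.register_buffer("scales", scales)  # one float scale per output channel
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize into the activation dtype and apply the usual matmul.
        w = self.q_weight.to(x.dtype) * self.scales.to(x.dtype).unsqueeze(1)
        return F.linear(x, w, self.bias)

# Usage: swap an nn.Linear for its int8 counterpart and compare outputs.
layer = nn.Linear(4096, 4096, bias=False)
x = torch.randn(1, 4096)
print(torch.allclose(layer(x), Int8WeightLinear(layer)(x), atol=1e-1))
```

Halving the weight storage relative to bfloat16 is consistent with the table, where the bfloat16 baseline OOMs on a single 80GB GPU while the int8 configuration fits.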
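Since the benchmark note above uses batch size 1, tokens/s and "tokens/s/user" coincide. Below is a minimal, hypothetical sketch of how such a number can be measured; `generate_one_token` and the dummy decode function are stand-ins, not part of this repo's API.

```python
import time

def measure_tokens_per_sec(generate_one_token, prompt_tokens, num_new_tokens=200):
    """Time autoregressive decoding of num_new_tokens tokens for a single user."""
    tokens = list(prompt_tokens)
    start = time.perf_counter()
    for _ in range(num_new_tokens):
        # One decode step appends one token; batch size is fixed at 1.
        tokens.append(generate_one_token(tokens))
    elapsed = time.perf_counter() - start
    # With batch size 1, tokens/s is the same quantity as "tokens/s/user".
    return num_new_tokens / elapsed

# Dummy stand-in for a model's single-step decode, so the sketch runs as-is.
dummy_decode = lambda toks: (toks[-1] + 1) % 32000
prompt = [1, 2, 3, 4, 5]  # the benchmarks use a very short (~5 token) prompt
print(f"{measure_tokens_per_sec(dummy_decode, prompt):.2f} tokens/s")
```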