Commit

[skip ci] Update perf and latest features for llm models (Mar 10) (#1…
skhorasganiTT authored Mar 10, 2025
1 parent 0c05a13 commit 7c1bd85
Showing 2 changed files with 24 additions and 20 deletions.
33 changes: 16 additions & 17 deletions README.md
@@ -26,25 +26,24 @@

| Model | Batch | Hardware | ttft (ms) | t/s/u | Target<br>t/s/u | t/s | TT-Metalium Release | vLLM Tenstorrent Repo Release |
|---------------------------------------------------------------|-------|----------------------------------------------------------|-----------|-------|-----------------|--------|---------------------------------------------------|---------------------------------------------------------------------------------------------------|
| [QwQ 32B (TP=8)](./models/demos/llama3) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 133 | 25.2 | | 464.0 | [main](https://github.com/tenstorrent/tt-metal/) | [9ac3783](https://github.com/tenstorrent/vllm/tree/9ac3783d5e3a4547f879f2cdadaab8571047a0a8) |
| [DeepSeek R1 Distill Llama 3.3 70B (TP=8)](./models/demos/llama3) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 180 | 15.2 | 20 | 486.4 | [v0.56.0-rc33](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc33) | [9ac3783](https://github.com/tenstorrent/vllm/tree/9ac3783d5e3a4547f879f2cdadaab8571047a0a8) |
| [Llama 3.1 70B (TP=8)](./models/demos/t3000/llama3_70b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 190 | 15.1 | 20 | 483.2 | [v0.54.0-rc2](https://github.com/tenstorrent/tt-metal/tree/v0.54.0-rc2) | [9531611](https://github.com/tenstorrent/vllm/tree/953161188c50f10da95a88ab305e23977ebd3750) |
| [Llama 3.2 11B Vision (TP=2)](./models/demos/llama3) | 16 | [n300](https://tenstorrent.com/hardware/wormhole) | 2550 | 15.8 | 17 | 252.8 | [v0.56.0-rc6](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc6) | [b9564bf](https://github.com/tenstorrent/vllm/tree/b9564bf364e95a3850619fc7b2ed968cc71e30b7) |
| [Qwen 2.5 7B (TP=2)](./models/demos/llama3) | 32 | [n300](https://tenstorrent.com/hardware/wormhole) | 126 | 32.5 | 38 | 1040.0 | [v0.56.0-rc33](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc33) | [9ac3783](https://github.com/tenstorrent/vllm/tree/9ac3783d5e3a4547f879f2cdadaab8571047a0a8) |
| [Falcon 7B](./models/demos/wormhole/falcon7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 71 | 18.1 | 26 | 579.2 | [v0.56.0-rc33](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc33) | |
| [Mistral 7B](./models/demos/wormhole/mistral7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | | 9.9 | 25 | 316.8 | [v0.51.0-rc28](https://github.com/tenstorrent/tt-metal/tree/v0.51.0-rc28) | |
| [Mamba 2.8B](./models/demos/wormhole/mamba) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 48 | 12.3 | 41 | 393.6 | [v0.51.0-rc26](https://github.com/tenstorrent/tt-metal/tree/v0.51.0-rc26) | |
| [Llama 3.1 8B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 168 | 24.0 | 23 | 768.0 | [v0.56.0-rc6](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc6) | [b9564bf](https://github.com/tenstorrent/vllm/tree/b9564bf364e95a3850619fc7b2ed968cc71e30b7) |
| [Llama 3.2 1B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 56 | 59.4 | 160 | 1900.8 | [v0.56.0-rc6](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc6) | [b9564bf](https://github.com/tenstorrent/vllm/tree/b9564bf364e95a3850619fc7b2ed968cc71e30b7) |
| [Llama 3.2 3B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 97 | 36.5 | 60 | 1168.0 | [v0.56.0-rc6](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc6) | [b9564bf](https://github.com/tenstorrent/vllm/tree/b9564bf364e95a3850619fc7b2ed968cc71e30b7) |
| [Falcon 7B (DP=8)](./models/demos/t3000/falcon7b) | 256 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 88 | 15.5 | 26 | 3968.0 | [v0.56.0-rc33](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc33) | |
| [Falcon 40B (TP=8)](./models/demos/t3000/falcon40b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | | 5.3 | 36 | 169.6 | [v0.55.0-rc20](https://github.com/tenstorrent/tt-metal/tree/v0.55.0-rc20) | |
| [Mixtral 8x7B (TP=8)](./models/demos/t3000/mixtral8x7b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 227 | 14.9 | 33 | 476.8 | [v0.56.0-rc6](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc6) | |
| [Qwen 2.5 72B (TP=8)](./models/demos/llama3) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 333 | 14.5 | 20 | 464.0 | [v0.56.0-rc33](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc33) | [9ac3783](https://github.com/tenstorrent/vllm/tree/9ac3783d5e3a4547f879f2cdadaab8571047a0a8) |
| [QwQ 32B (TP=8)](./models/demos/llama3) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 133 | 25.2 | | 464.0 | [v0.56.0-rc51](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc51) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [DeepSeek R1 Distill Llama 3.3 70B (TP=8)](./models/demos/llama3) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 180 | 15.2 | 20 | 486.4 | [v0.56.0-rc47](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc47) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [Llama 3.1 70B (TP=8)](./models/demos/llama3) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 180 | 15.2 | 20 | 486.4 | [v0.56.0-rc47](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc47) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [Llama 3.2 11B Vision (TP=2)](./models/demos/llama3) | 16 | [n300](https://tenstorrent.com/hardware/wormhole) | 2550 | 15.8 | 17 | 252.8 | [v0.56.0-rc6](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc6) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [Qwen 2.5 7B (TP=2)](./models/demos/llama3) | 32 | [n300](https://tenstorrent.com/hardware/wormhole) | 126 | 32.5 | 38 | 1040.0 | [v0.56.0-rc33](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc33) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [Qwen 2.5 72B (TP=8)](./models/demos/llama3) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 333 | 14.5 | 20 | 464.0 | [v0.56.0-rc33](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc33) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [Falcon 7B](./models/demos/wormhole/falcon7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 70 | 18.3 | 26 | 585.6 | [v0.56.0-rc47](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc47) | |
| [Falcon 7B (DP=8)](./models/demos/t3000/falcon7b) | 256 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 88 | 15.5 | 26 | 3968.0 | [v0.56.0-rc47](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc47) | |
| [Falcon 7B (DP=32)](./models/demos/tg/falcon7b) | 1024 | [Galaxy](https://tenstorrent.com/hardware/galaxy) | 223 | 4.8 | 26 | 4915.2 | [v0.56.0-rc6](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc6) | |
| [Falcon 40B (TP=8)](./models/demos/t3000/falcon40b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | | 5.3 | 36 | 169.6 | [v0.56.0-rc45](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc45) | |
| [Llama 3.1 8B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 141 | 24.6 | 23 | 787.2 | [v0.56.0-rc47](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc47) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [Llama 3.2 1B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 50 | 67.6 | 160 | 2163.2 | [v0.56.0-rc47](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc47) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [Llama 3.2 3B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 78 | 43.5 | 60 | 1392.0 | [v0.56.0-rc47](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc47) | [e2e0002](https://github.com/tenstorrent/vllm/tree/e2e0002ac7dcbc5793983c0f967474d4dcab21f8) |
| [Mamba 2.8B](./models/demos/wormhole/mamba) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 48 | 12.3 | 41 | 393.6 | [v0.51.0-rc26](https://github.com/tenstorrent/tt-metal/tree/v0.51.0-rc26) | |
| [Mistral 7B](./models/demos/wormhole/mistral7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | | 9.9 | 25 | 316.8 | [v0.51.0-rc28](https://github.com/tenstorrent/tt-metal/tree/v0.51.0-rc28) | |
| [Mixtral 8x7B (TP=8)](./models/demos/t3000/mixtral8x7b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 227 | 15.2 | 33 | 486.4 | [v0.56.0-rc47](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc47) | |

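The `t/s` column in the table above appears to equal `Batch` × `t/s/u` for most rows (for example, 32 × 15.2 = 486.4). This relationship is inferred from the listed values rather than stated in the visible portion of the notes, so treat it as an assumption. A minimal Python sketch for checking rows against it follows; `aggregate_throughput` and `inconsistent_rows` are hypothetical helper names, and the sample rows are copied from the table.

```python
from typing import List, Tuple

# (model, batch, t/s/u, t/s) as listed in the table; only a few rows are shown.
Row = Tuple[str, int, float, float]


def aggregate_throughput(batch: int, tsu: float) -> float:
    """Assumed relationship: total tokens/s = batch size * tokens/s per user."""
    return round(batch * tsu, 1)


def inconsistent_rows(rows: List[Row], tol: float = 0.05) -> List[str]:
    """Return the models whose listed t/s differs from batch * t/s/u by more than tol."""
    return [
        name
        for name, batch, tsu, ts in rows
        if abs(aggregate_throughput(batch, tsu) - ts) > tol
    ]


rows: List[Row] = [
    ("QwQ 32B (TP=8)", 32, 25.2, 464.0),
    ("DeepSeek R1 Distill Llama 3.3 70B (TP=8)", 32, 15.2, 486.4),
    ("Llama 3.2 11B Vision (TP=2)", 16, 15.8, 252.8),
    ("Falcon 7B (DP=8)", 256, 15.5, 3968.0),
    ("Falcon 7B (DP=32)", 1024, 4.8, 4915.2),
]

print(inconsistent_rows(rows))
```

With these sample rows, only the QwQ 32B entry is flagged (32 × 25.2 = 806.4, while the table lists 464.0). The sketch does not determine which of the two values is the intended one.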

> **Last Update:** March 6, 2025
> **Last Update:** March 10, 2025
>
> **Notes:**
>
11 changes: 8 additions & 3 deletions models/MODEL_UPDATES.md
@@ -4,13 +4,18 @@
>
> Please refer to the front-page [README](../README.md) for the latest verified release for each model.
## March 10, 2025

### [QwQ-32B](demos/llama3)
- Added support for QwQ-32B on QuietBox.

## February 24, 2025

### [DeepSeek R1 Distill Llama 3.3 70B](demos/llama3)
- Added support for DeepSeek R1 Distill Llama 3.3 70B on T3000.
- Added support for DeepSeek R1 Distill Llama 3.3 70B on QuietBox.

### [Qwen 2.5](demos/llama3)
- Added support for Qwen2.5-7B on N300 and Qwen2.5-72B on T3000.
- Added support for Qwen2.5-7B on N300 and Qwen2.5-72B on QuietBox.

### [Llama 3.1/3.2](demos/llama3)
> **Note:** This feature is available as of release [v0.56.0-rc37](https://github.com/tenstorrent/tt-metal/tree/v0.56.0-rc37)
@@ -29,7 +34,7 @@
## January 13, 2025

### [Llama 3.1/3.2](demos/llama3)
- Integrated Llama3 models (1B/3B/8B/11B/70B) into [vLLM fork](https://github.com/tenstorrent/vllm/tree/dev/tt_metal) for all compatible Tenstorrent devices (N150/N300/T3000/Galaxy).
- Integrated Llama3 models (1B/3B/8B/11B/70B) into [vLLM fork](https://github.com/tenstorrent/vllm/tree/dev/tt_metal) for all compatible Tenstorrent devices (N150/N300/QuietBox/Galaxy).
- Enabled prefill with the maximum context length (131072) when running the Llama3 text models on smaller devices (N150/N300) via chunked prefill.

## December 16, 2024
