Llama 3.1 8B OOM on n150 when running vLLM #65

Open
tstescoTT opened this issue Mar 4, 2025 · 1 comment
@tstescoTT

The HF repo for Llama 3.1 8B has been renamed to: meta-llama/Llama-3.1-8B-Instruct

The old name: meta-llama/Meta-Llama-3.1-8B-Instruct redirects to the new repo name.

It looks like the old model name is hardcoded in this check:

if ("meta-llama/Meta-Llama-3.1-8B" in self.model_config.model and

Since the new repo name does not contain that substring, the check no longer matches, and this causes an OOM when running on a physical n150 or when using MESH_DEVICE=N150.

INFO 03-04 11:14:14 tt_executor.py:67] # TT blocks: 2068, # CPU blocks: 0
INFO 03-04 11:14:14 tt_worker.py:67] Allocating kv caches
Allocating TT kv caches for each layer:  56%|████████████████▋                   | 18/32 [00:10<00:04,  3.45it/s]
Always | FATAL    | Out of Memory: Not enough space to allocate 143998976 B DRAM buffer across 12 banks, where each bank needs to store 12000640 B
libc++abi: terminating due to uncaught exception of type std::runtime_error: TT_THROW @ /tt-metal/tt_metal/impl/allocator/bank_manager.cpp:131: tt::exception
info:
Out of Memory: Not enough space to allocate 143998976 B DRAM buffer across 12 banks, where each bank needs to store 12000640 B
backtrace:

I found that I can get it to work with:

python examples/server_example_tt.py --model meta-llama/Meta-Llama-3.1-8B-Instruct

but using the alternative (newer-format) HF model name reproduces the issue:

python examples/server_example_tt.py --model meta-llama/Llama-3.1-8B-Instruct

Recommended quick fix: drop the meta-llama/Meta- prefix from the model check so it matches both naming schemes as well as all Instruct variants.
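For illustration, here is a minimal sketch of the relaxed check (the helper name is_llama_31_8b is hypothetical; only the self.model_config.model attribute comes from the snippet quoted above):

# Hypothetical sketch: matching on the substring "Llama-3.1-8B" (with the
# "meta-llama/Meta-" prefix dropped) covers both the old and the new HF
# repo names, as well as their Instruct variants.

OLD_NAME = "meta-llama/Meta-Llama-3.1-8B-Instruct"
NEW_NAME = "meta-llama/Llama-3.1-8B-Instruct"

def is_llama_31_8b(model_name: str) -> bool:
    # "Llama-3.1-8B" is a substring of both naming schemes, so the check
    # is insensitive to the repo rename.
    return "Llama-3.1-8B" in model_name

assert is_llama_31_8b(OLD_NAME)  # old naming scheme still matches
assert is_llama_31_8b(NEW_NAME)  # new naming scheme now matches too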

@tstescoTT tstescoTT added the bug Something isn't working label Mar 4, 2025
@tstescoTT (Author)

Linked issue: tenstorrent/tt-inference-server#104

@tstescoTT tstescoTT self-assigned this Mar 5, 2025
tstescoTT added a commit that referenced this issue Mar 5, 2025
…q-len

handle new llama 3.1 8B HF repo name and old in parameter setting max seq len to avoid OOM on n150 in #65