Llama 3.1 8B OOM on n150 when running vLLM #65

Open
tstescoTT opened this issue Mar 4, 2025 · 1 comment
@tstescoTT

The HF repo for Llama 3.1 8B has been renamed to: meta-llama/Llama-3.1-8B-Instruct

The old name: meta-llama/Meta-Llama-3.1-8B-Instruct redirects to the new repo name.

It looks like the old model name is hardcoded in this check:

if ("meta-llama/Meta-Llama-3.1-8B" in self.model_config.model and

Since the new repo name does not contain that substring, the check no longer matches, and this causes an OOM when running on a physical n150 or when using MESH_DEVICE=N150.

INFO 03-04 11:14:14 tt_executor.py:67] # TT blocks: 2068, # CPU blocks: 0
INFO 03-04 11:14:14 tt_worker.py:67] Allocating kv caches
Allocating TT kv caches for each layer:  56%|████████████████▋                   | 18/32 [00:10<00:04,  3.45it/s]
Always | FATAL    | Out of Memory: Not enough space to allocate 143998976 B DRAM buffer across 12 banks, where each bank needs to store 12000640 B
libc++abi: terminating due to uncaught exception of type std::runtime_error: TT_THROW @ /tt-metal/tt_metal/impl/allocator/bank_manager.cpp:131: tt::exception
info:
Out of Memory: Not enough space to allocate 143998976 B DRAM buffer across 12 banks, where each bank needs to store 12000640 B
backtrace:

I found that I can get it to work with:

python examples/server_example_tt.py --model meta-llama/Meta-Llama-3.1-8B-Instruct

but using the alternative (newer-format) HF model name reproduces the issue:

python examples/server_example_tt.py --model meta-llama/Llama-3.1-8B-Instruct

Recommended quick fix: drop the meta-llama/Meta- prefix from the model check so it matches both naming schemes as well as all Instruct variants.
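For illustration, here is a minimal sketch of the relaxed check (the helper name is_llama_31_8b is hypothetical; only the self.model_config.model attribute comes from the snippet quoted above):

# Hypothetical sketch: matching on the substring "Llama-3.1-8B" (with the
# "meta-llama/Meta-" prefix dropped) covers both the old and the new HF
# repo names, as well as their Instruct variants.

OLD_NAME = "meta-llama/Meta-Llama-3.1-8B-Instruct"
NEW_NAME = "meta-llama/Llama-3.1-8B-Instruct"

def is_llama_31_8b(model_name: str) -> bool:
    # "Llama-3.1-8B" is a substring of both naming schemes, so the check
    # is insensitive to the repo rename.
    return "Llama-3.1-8B" in model_name

assert is_llama_31_8b(OLD_NAME)  # old naming scheme still matches
assert is_llama_31_8b(NEW_NAME)  # new naming scheme now matches too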

@tstescoTT tstescoTT added the bug Something isn't working label Mar 4, 2025
@tstescoTT (Author)

Linked issue: tenstorrent/tt-inference-server#104

@tstescoTT tstescoTT self-assigned this Mar 5, 2025
tstescoTT added a commit that referenced this issue Mar 5, 2025
…q-len

handle new llama 3.1 8B HF repo name and old in parameter setting max seq len to avoid OOM on n150 in #65