```python
if ("meta-llama/Meta-Llama-3.1-8B" in self.model_config.model and
```
```
INFO 03-04 11:14:14 tt_executor.py:67] # TT blocks: 2068, # CPU blocks: 0
INFO 03-04 11:14:14 tt_worker.py:67] Allocating kv caches
Allocating TT kv caches for each layer:  56%|████████▋ | 18/32 [00:10<00:04, 3.45it/s]
Always | FATAL | Out of Memory: Not enough space to allocate 143998976 B DRAM buffer across 12 banks, where each bank needs to store 12000640 B
libc++abi: terminating due to uncaught exception of type std::runtime_error: TT_THROW @ /tt-metal/tt_metal/impl/allocator/bank_manager.cpp:131: tt::exception
info:
Out of Memory: Not enough space to allocate 143998976 B DRAM buffer across 12 banks, where each bank needs to store 12000640 B
backtrace:
```
The HF repo for Llama 3.1 8B has been renamed to `meta-llama/Llama-3.1-8B-Instruct`.
The old name, `meta-llama/Meta-Llama-3.1-8B-Instruct`, redirects to the new repo name.
It looks like in vllm/vllm/worker/tt_worker.py (line 240 in 9ac3783) the old model name is hardcoded, and this causes an OOM when running on a physical n150 or using `MESH_DEVICE=N150`.
I found that I can get it to work with the old model name, but using the alt (newer-format) HF model name repros the issue:
Recommendation for a quick fix is to drop the `meta-llama/Meta-` prefix from the model check so it matches both naming versions as well as all Instruct variants.
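A minimal sketch of what the relaxed check could look like; the helper name `is_llama_3_1_8b` is hypothetical (the real check lives inline in the condition at tt_worker.py line 240), and only the matched substring matters:

```python
# Hypothetical helper sketching the relaxed model-name check.
# Matching on "Llama-3.1-8B" (without the "meta-llama/Meta-" prefix)
# covers both the old and new HF repo names, including Instruct variants.
def is_llama_3_1_8b(model_name: str) -> bool:
    return "Llama-3.1-8B" in model_name

# Old name, new name, and Instruct variants all match:
assert is_llama_3_1_8b("meta-llama/Meta-Llama-3.1-8B-Instruct")
assert is_llama_3_1_8b("meta-llama/Llama-3.1-8B-Instruct")
assert is_llama_3_1_8b("meta-llama/Llama-3.1-8B")
# Unrelated models still fall through:
assert not is_llama_3_1_8b("meta-llama/Llama-3.2-1B")
```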