Hi, I'm trying to deploy the 1.58-bit DeepSeek R1 with llama-server on four Titan Vs. Since they only have 48 GB of VRAM in total, I set … During inference, I noticed that although all four GPUs had their VRAM fully utilized, only the first GPU reached nearly 100% utilization, while the other three stayed at around 0-3%. After some time, even the first GPU's utilization dropped to 0%, CPU utilization spiked to nearly 100%, and the other three GPUs were still at around 0%. Is this normal? As I understand it, the load should be distributed more evenly across the four GPUs.
Only a vertical split is supported; a horizontal split is not supported.

Vertical split
Adding more devices only expands the available memory; it does not make things faster, because the devices process the layers one after another:
pc1 -> pc2 -> pc3 -> ...
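As a toy illustration (plain Python, not llama.cpp code; the devices, layer arithmetic, and layout are all invented), this is why a vertical split leaves only one GPU busy at any moment:

```python
# Toy model of a vertical (layer/pipeline) split: each "GPU" owns a slice
# of the layers, and a forward pass walks the devices strictly in order,
# so only one device works at a time -- matching the ~100% / ~0% pattern.
def forward(x, device_slices):
    for device, layers in device_slices:   # gpu0 -> gpu1 -> gpu2 -> gpu3
        for layer in layers:
            x = layer(x)                   # every other device idles here
    return x

# Four hypothetical devices, each holding two trivial stand-in "layers".
slices = [(f"gpu{i}", [lambda v: v + 1, lambda v: v * 2]) for i in range(4)]
print(forward(1.0, slices))                # 46.0
```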
Horizontal split
Adding more devices does make it faster, but it also increases the amount of data transferred between them, because every device works on every layer at the same time:
pc1 -> pc1 -> pc1 ...
pc2 -> pc2 -> pc2 ...
pc3 -> pc3 -> pc3 ...
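A matching toy sketch (again invented code, not any real API) of a horizontal / tensor-parallel split: every device computes a shard of each layer in parallel, but the shards must be merged after every layer, which is the extra data transfer mentioned above:

```python
# Toy model of a horizontal (tensor-parallel) split: all devices work on
# every layer at once, each computing one shard; merging the shards stands
# in for the per-layer communication (e.g. an all-reduce) whose volume
# grows with the number of devices.
def shard_compute(x, num_devices):
    # Stand-in for one device's share of a layer's math.
    return (x + 1) / num_devices

def forward_parallel(x, num_devices, num_layers):
    for _ in range(num_layers):
        partials = [shard_compute(x, num_devices) for _ in range(num_devices)]
        x = sum(partials)                  # communication step: merge shards
    return x

print(forward_parallel(1.0, num_devices=4, num_layers=8))  # 9.0
```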
There are projects that support horizontal splitting, for example:
https://github.com/b4rtaz/distributed-llama