Hi, I'm trying to deploy the 1.58-bit DeepSeek R1 with llama-server on four Titan Vs. Since they only have 48 GB of VRAM in total, I set … During inference, I noticed that although all four GPUs had their VRAM fully utilized, only the first GPU reached nearly 100% utilization, while the other three stayed at around 0-3%. After some time, even the first GPU's utilization dropped to 0%, CPU utilization spiked to nearly 100%, and the other three GPUs were still at around 0%. Is this normal? As I understand it, the load should be distributed more evenly across the four GPUs.
Only a vertical split is supported; a horizontal split is not supported.

Vertical split
Adding more devices only expands the available memory; it does not make things faster, because the devices process the layers one after another:
pc1 -> pc2 -> pc3 -> ...
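As a toy illustration (plain Python, not llama.cpp code; the devices, layer arithmetic, and layout are all invented), this is why a vertical split leaves only one GPU busy at any moment:

```python
# Toy model of a vertical (layer/pipeline) split: each "GPU" owns a slice
# of the layers, and a forward pass walks the devices strictly in order,
# so only one device works at a time -- matching the ~100% / ~0% pattern.
def forward(x, device_slices):
    for device, layers in device_slices:   # gpu0 -> gpu1 -> gpu2 -> gpu3
        for layer in layers:
            x = layer(x)                   # every other device idles here
    return x

# Four hypothetical devices, each holding two trivial stand-in "layers".
slices = [(f"gpu{i}", [lambda v: v + 1, lambda v: v * 2]) for i in range(4)]
print(forward(1.0, slices))                # 46.0
```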
Horizontal split
Adding more devices does make it faster, but it also increases the amount of data transferred between them, because every device works on every layer at the same time:
pc1 -> pc1 -> pc1 ...
pc2 -> pc2 -> pc2 ...
pc3 -> pc3 -> pc3 ...
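A matching toy sketch (again invented code, not any real API) of a horizontal / tensor-parallel split: every device computes a shard of each layer in parallel, but the shards must be merged after every layer, which is the extra data transfer mentioned above:

```python
# Toy model of a horizontal (tensor-parallel) split: all devices work on
# every layer at once, each computing one shard; merging the shards stands
# in for the per-layer communication (e.g. an all-reduce) whose volume
# grows with the number of devices.
def shard_compute(x, num_devices):
    # Stand-in for one device's share of a layer's math.
    return (x + 1) / num_devices

def forward_parallel(x, num_devices, num_layers):
    for _ in range(num_layers):
        partials = [shard_compute(x, num_devices) for _ in range(num_devices)]
        x = sum(partials)                  # communication step: merge shards
    return x

print(forward_parallel(1.0, num_devices=4, num_layers=8))  # 9.0
```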
There are projects that support horizontal splitting, for example:
https://github.com/b4rtaz/distributed-llama