Llama multi GPU #3804

PaulaScholz started this conversation in Show and tell

I have Llama2 running under LlamaSharp (latest drop, 10/26) and CUDA 12. I took a screen capture of Task Manager while the model was answering questions and thought I'd share it as feedback. The system has four A6000 GPUs and 128 GB of system RAM. It works, and it also loads and runs the 70b models (albeit a bit more slowly). Although it does use all the GPUs, it puts most of the burden on GPU0.

I wanted to upload a larger video file, but the limit is 10 MB.

GPUPerf_Llama2_13bModel.mp4
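The lopsided GPU0 load is common with llama.cpp-based backends: the main GPU often holds extra buffers in addition to its share of the layers. Most bindings expose a tensor-split knob to rebalance the weights. A minimal sketch of that knob, using the llama-cpp-python binding rather than LlamaSharp (the model path is hypothetical, and a CUDA-enabled build is assumed):

```python
# Sketch only: llama-cpp-python, a llama.cpp binding analogous to LlamaSharp.
# Assumes a CUDA build and a local GGUF model at the (hypothetical) path below.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b.Q4_K_M.gguf",   # hypothetical local model file
    n_gpu_layers=-1,                          # offload all layers to the GPUs
    main_gpu=0,                               # device that holds the extra buffers
    tensor_split=[0.25, 0.25, 0.25, 0.25],    # spread layer weights over 4 GPUs
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

Skewing the `tensor_split` ratios away from GPU0 (e.g. `[0.1, 0.3, 0.3, 0.3]`) is one way to compensate for the extra buffers it carries; LlamaSharp exposes a similar setting, though the exact property name varies by version.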
Replies: 2 comments
- Something changed within the last week. I see this too on my 3x P40 setup; it tries to utilize GPU0 almost by itself, and I eventually get an OOM on the first prompt.
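To confirm which device is filling up before the OOM hits, a per-GPU memory readout helps. A minimal sketch using the pynvml bindings (assumed installed; running `nvidia-smi` in a loop shows the same numbers):

```python
# Quick per-GPU memory check to confirm the GPU0 imbalance described above.
# Assumes the NVIDIA driver and the pynvml package are installed.
from pynvml import (
    nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo,
)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        mem = nvmlDeviceGetMemoryInfo(nvmlDeviceGetHandleByIndex(i))
        print(f"GPU{i}: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB used")
finally:
    nvmlShutdown()
```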