Llama multi GPU #3804

PaulaScholz started this conversation in Show and tell

I have Llama2 running under LlamaSharp (latest drop, 10/26) and CUDA 12. I took a screen capture of Task Manager while the model was answering questions and thought I'd share it as feedback. The system has four A6000 GPUs and 128 GB of system RAM. It works, and it also loads and runs the 70b models (albeit a bit more slowly). Although it does use all the GPUs, it puts most of the burden on GPU0.

I wanted to upload a larger video file, but the limit is 10 MB.

GPUPerf_Llama2_13bModel.mp4
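The lopsided GPU0 load is common with llama.cpp-based backends: the main GPU often holds extra buffers in addition to its share of the layers. Most bindings expose a tensor-split knob to rebalance the weights. A minimal sketch of that knob, using the llama-cpp-python binding rather than LlamaSharp (the model path is hypothetical, and a CUDA-enabled build is assumed):

```python
# Sketch only: llama-cpp-python, a llama.cpp binding analogous to LlamaSharp.
# Assumes a CUDA build and a local GGUF model at the (hypothetical) path below.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b.Q4_K_M.gguf",   # hypothetical local model file
    n_gpu_layers=-1,                          # offload all layers to the GPUs
    main_gpu=0,                               # device that holds the extra buffers
    tensor_split=[0.25, 0.25, 0.25, 0.25],    # spread layer weights over 4 GPUs
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

Skewing the `tensor_split` ratios away from GPU0 (e.g. `[0.1, 0.3, 0.3, 0.3]`) is one way to compensate for the extra buffers it carries; LlamaSharp exposes a similar setting, though the exact property name varies by version.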
Replies: 2 comments
- Something changed within the last week. I see this too on my 3x P40 setup; it tries to utilize GPU0 almost by itself, and I eventually get an OOM on the first prompt.
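To confirm which device is filling up before the OOM hits, a per-GPU memory readout helps. A minimal sketch using the pynvml bindings (assumed installed; running `nvidia-smi` in a loop shows the same numbers):

```python
# Quick per-GPU memory check to confirm the GPU0 imbalance described above.
# Assumes the NVIDIA driver and the pynvml package are installed.
from pynvml import (
    nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo,
)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        mem = nvmlDeviceGetMemoryInfo(nvmlDeviceGetHandleByIndex(i))
        print(f"GPU{i}: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB used")
finally:
    nvmlShutdown()
```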