I’ve tested multiple models (e.g., Devstral, Qwen Coder 30B, Qwen3 4B, Qwen3 14B) using ExLLaMAv3 on my hardware, and I consistently encounter the same issue:
- GPU underutilization: GPU utilization peaks at only about 80% during inference.
- Power consumption: The GPU draws 80–100 W out of its 160 W power limit.
- CPU bottleneck: During inference, one CPU core is consistently pegged at 100%, which suggests the host side is the limiting factor (a monitoring sketch for reproducing these readings follows this list).
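
For reference, here is a minimal sketch of how these readings can be captured while the model is generating, using pynvml and psutil polled once per second. Neither library nor the polling interval is part of ExLLaMAv3; they are just assumptions of this sketch:

```python
# Minimal monitoring loop: prints GPU utilization, GPU power draw, and the
# busiest CPU core once per second while inference runs in another process.
# Requires `pip install pynvml psutil`.
import time

import psutil
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first (only) GPU

try:
    while True:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)       # percent
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        per_core = psutil.cpu_percent(percpu=True)                 # per-core %
        print(f"GPU {util.gpu:3d}%  {power_w:5.1f} W  "
              f"busiest CPU core {max(per_core):5.1f}%")
        time.sleep(1.0)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```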
My hardware configuration:
- CPU: AMD Ryzen 9 5900X
- GPU: NVIDIA RTX 5060 Ti 16GB
- RAM: 32GB DDR4
- OS: Arch Linux with CUDA 13 installed
Typical use: Cline
I suspect the CPU might be the limiting factor here. Could there be a specific setting, driver issue, or resource allocation problem on my end causing this behavior?
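
One way to test the CPU-bound hypothesis is to compare GPU kernel time against wall-clock time per decode step: if the wall time is much larger than the GPU time, the time is going to the host side. Below is a minimal sketch, assuming a hypothetical `decode_step()` callable that performs one token-generation step with whatever generator is in use (not an ExLLaMAv3 API):

```python
# Rough check of how much of each decode step is spent on the host rather than
# in CUDA kernels. `decode_step` is a hypothetical stand-in for one
# token-generation step of the generator in use.
import time

import torch

def profile_step(decode_step, n_iters: int = 50) -> None:
    start_evt = torch.cuda.Event(enable_timing=True)
    end_evt = torch.cuda.Event(enable_timing=True)
    gpu_ms = 0.0

    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(n_iters):
        start_evt.record()
        decode_step()              # one forward/sampling step
        end_evt.record()
        torch.cuda.synchronize()
        gpu_ms += start_evt.elapsed_time(end_evt)
    wall_ms = (time.perf_counter() - t0) * 1000.0

    print(f"GPU time/step:  {gpu_ms / n_iters:6.2f} ms")
    print(f"Wall time/step: {wall_ms / n_iters:6.2f} ms")
    print(f"Host overhead:  {100.0 * (1.0 - gpu_ms / wall_ms):5.1f} %")
```

If the wall-clock time per step comes out well above the GPU time (e.g., roughly double), that would point at the single saturated CPU core rather than the GPU itself.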
Translated with Qwen3 14B