Thanks for sharing. I will test an EPYC 9755 with 768 GB of DDR5-6800 ECC RAM and post feedback later.
---
Could someone help figure out the best hardware configuration for LLM inference (CPU only)?
I have done 3 tests:
I have tested the same large model on different configurations and got the results above. This suggests that llama.cpp is not optimized for dual-CPU-socket motherboards, and I cannot use the full power of such configurations to speed up LLM inference. In fact, running a single instance of llama.cpp on one CPU (NUMA node) of a dual-socket setup turned out to be far better than running it across both.
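For anyone who wants to reproduce the comparison, here is a minimal sketch, assuming a Linux host with numactl installed; the model path and thread counts are placeholders for your own setup. llama.cpp's `--numa numactl` mode makes it respect an external numactl binding:

```sh
# Drop the OS page cache first so the model is faulted back into RAM on
# the bound node (the llama.cpp docs recommend this whenever NUMA
# settings change between runs).
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

# Single-socket run: pin both threads and memory allocations to node 0.
numactl --cpunodebind=0 --membind=0 \
  ./llama-cli -m ./DeepSeek-R1-Q5_K_S.gguf \
    --numa numactl -t 64 -n 128 -p "test prompt"

# Dual-socket run for comparison: spread threads across both nodes.
./llama-cli -m ./DeepSeek-R1-Q5_K_S.gguf \
  --numa distribute -t 128 -n 128 -p "test prompt"
```

The timing summary that llama-cli prints at the end of each run gives the eval tokens/s to compare.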
Many different optimizations gave no significant inference boost. So, based on the above, for the best t/s when inferencing a large LLM such as DeepSeek-R1-Q5_K_S.gguf (671B, 461.81 GB), I suggest the following hardware configuration:
With this setup I optimistically expect something around 10 t/s inference speed on the same large model, DeepSeek-R1-Q5_K_S.gguf (671B, 461.81 GB); a rough back-of-envelope check follows below. Could someone correct me if I'm wrong, or suggest your own ideas and thoughts?
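As a rough sanity check on the 10 t/s figure, assuming a 12-channel DDR5-6800 platform like the EPYC system mentioned in the other comment: token generation is memory-bandwidth bound, and DeepSeek-R1 activates about 37B of its 671B parameters per token. At Q5_K_S (~5.5 bits per weight, which matches the 461.81 GB file size), each token reads roughly 37e9 × 5.5 / 8 ≈ 25 GB of weights. Twelve channels of DDR5-6800 deliver a theoretical 12 × 6800 MT/s × 8 B ≈ 653 GB/s per socket, so the hard ceiling is about 653 / 25 ≈ 26 t/s; at a realistic 40-50% of peak bandwidth, ~10-13 t/s looks plausible, consistent with the estimate above.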