Thanks for sharing. I will test an EPYC 9755 with 768 GB of DDR5-6800 ECC RAM and post feedback later.
---
Could someone help figure out the best hardware configuration for LLM inference (CPU only)?
I have done 3 tests:
I have tested the same large model on different configurations and got the results above. This suggests that llama.cpp is not optimized for dual-CPU-socket motherboards, and I cannot use the full power of such configurations to speed up LLM inference. In fact, running a single instance of llama.cpp on one CPU (NUMA node) of a dual-socket setup turned out to be far better than running it across both.
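For anyone who wants to reproduce the comparison, here is a minimal sketch, assuming a Linux host with numactl installed; the model path and thread counts are placeholders for your own setup. llama.cpp's `--numa numactl` mode makes it respect an external numactl binding:

```sh
# Drop the OS page cache first so the model is faulted back into RAM on
# the bound node (the llama.cpp docs recommend this whenever NUMA
# settings change between runs).
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

# Single-socket run: pin both threads and memory allocations to node 0.
numactl --cpunodebind=0 --membind=0 \
  ./llama-cli -m ./DeepSeek-R1-Q5_K_S.gguf \
    --numa numactl -t 64 -n 128 -p "test prompt"

# Dual-socket run for comparison: spread threads across both nodes.
./llama-cli -m ./DeepSeek-R1-Q5_K_S.gguf \
  --numa distribute -t 128 -n 128 -p "test prompt"
```

The timing summary that llama-cli prints at the end of each run gives the eval tokens/s to compare.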
Many different optimizations gave no significant inference boost. So, based on the above, for the best t/s when inferencing a large LLM such as DeepSeek-R1-Q5_K_S.gguf (671B, 461.81 GB), I suggest the following hardware configuration:
With this setup I optimistically expect something around 10 t/s inference speed on the same large model, DeepSeek-R1-Q5_K_S.gguf (671B, 461.81 GB); a rough back-of-envelope check follows below. Could someone correct me if I'm wrong, or suggest your own ideas and thoughts?
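As a rough sanity check on the 10 t/s figure, assuming a 12-channel DDR5-6800 platform like the EPYC system mentioned in the other comment: token generation is memory-bandwidth bound, and DeepSeek-R1 activates about 37B of its 671B parameters per token. At Q5_K_S (~5.5 bits per weight, which matches the 461.81 GB file size), each token reads roughly 37e9 × 5.5 / 8 ≈ 25 GB of weights. Twelve channels of DDR5-6800 deliver a theoretical 12 × 6800 MT/s × 8 B ≈ 653 GB/s per socket, so the hard ceiling is about 653 / 25 ≈ 26 t/s; at a realistic 40-50% of peak bandwidth, ~10-13 t/s looks plausible, consistent with the estimate above.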