I Benchmarked 25 models at 16GB, 6.5GB, and 3.5GB sizes to find out whether a large model with smaller quant is better than a small model with bigger quant #11468
ZoontS started this conversation in Show and tell
My Takeaway
Conclusion
You should use higher-parameter-count models if you can fit anything better than the IQ3_XS quants. Q2 and Q1 quants are not worth it.
Personally, I would target IQ4_XS for GPU inference, and Q4_0 for CPU-only inference for the extra speed.
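To see why the trade-off lands in the same file-size classes tested here, a quick back-of-the-envelope sketch helps: file size is roughly parameter count times bits per weight. The bits-per-weight figures below are approximate (the actual value varies with the quant mix per tensor), and the parameter counts are illustrative, not specific models from the benchmark.

```python
def quant_size_gb(n_params_b: float, bits_per_weight: float) -> float:
    """Approximate weights-only file size in GB (ignores metadata overhead).

    Approximate bits-per-weight for common GGUF quants:
    IQ4_XS ~4.25, Q4_0 ~4.5, IQ3_XS ~3.3, Q8_0 ~8.5.
    """
    return n_params_b * 1e9 * bits_per_weight / 8 / 1e9

# A 14B model at IQ4_XS and a 7B model at Q8_0 land at roughly the
# same ~7.4 GB footprint, which is exactly the comparison the
# benchmark sets up: big model + small quant vs small model + big quant.
print(round(quant_size_gb(14, 4.25), 2))  # 14B at IQ4_XS -> ~7.44 GB
print(round(quant_size_gb(7, 8.5), 2))    # 7B at Q8_0    -> ~7.44 GB
```

Within a fixed memory budget like the 6.5 GB class, this arithmetic is what forces the choice between stepping up in parameters or stepping up in quant precision.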