Commit

Update files
leikareipa committed Dec 30, 2024
1 parent bbcba3d commit fd3cb5c
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion blog/comparing-quants-of-qwq-preview-in-ollama/content.md
@@ -164,6 +164,6 @@ For reference, the table includes the unquantized version of QwQ as hosted on Hu

These tests roughly confirm the previously-found bell curve, where Q4_K_M is the apex quant and categorically better than Q8_0. More broadly, the sweet spot appears to be between Q3_K_M and Q5_K_M, inclusive.

-Q8_0 was in fact so bad that it indicates a potential problem with Ollama. I don't see many other explanations for it being categorically worse than Q4_K_M <i>and</i> the unquantized version.
+Q8_0 was in fact so bad that it indicates a potential problem with Ollama. Outside of these tests, I've also seen preliminary indications that Ollama's FP16 struggles in a similar way. I opened an issue for this, but it was closed without resolution, so it's not something the Ollama authors are concerned about.

All of that said, my time with QwQ has also shown a fair bit of variance in output quality within any given quant. You'd ideally do very many runs of a test to find a realistic average.
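
As a rough illustration of that kind of repeated testing, here's a minimal sketch that queries a local Ollama server with the same prompt several times per quant and saves the raw outputs for later scoring. The model tags, prompt, and run count are placeholders rather than the exact ones used in these tests.

```python
# Sketch: run the same prompt several times per quant and store the outputs.
# Assumes a local Ollama server on the default port (11434); the model tags
# below are illustrative and may not match the tags used in the actual tests.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
QUANTS = ["qwq:32b-preview-q4_K_M", "qwq:32b-preview-q8_0"]  # hypothetical tags
PROMPT = "Write a JavaScript function that reverses a linked list."  # placeholder prompt
RUNS_PER_QUANT = 10

def generate(model: str, prompt: str) -> str:
    # Single non-streaming completion via Ollama's /api/generate endpoint.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# Collect all runs per quant so they can be scored and averaged afterwards.
results = {quant: [generate(quant, PROMPT) for _ in range(RUNS_PER_QUANT)] for quant in QUANTS}

with open("quant-outputs.json", "w") as f:
    json.dump(results, f, indent=2)
```

With the outputs saved per quant, you can score them however you prefer and average across the runs to smooth out the per-run variance.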
2 changes: 1 addition & 1 deletion blog/comparing-quants-of-qwq-preview-in-ollama/index.html
@@ -238,7 +238,7 @@
<p>For reference, the table includes the unquantized version of QwQ as hosted on Hugging Face Playground.</p>
</dokki-topic><dokki-topic title="Conclusions">
<p>These tests roughly confirm the previously-found bell curve, where Q4_K_M is the apex quant and categorically better than Q8_0. More broadly, the sweet spot appears to be between Q3_K_M and Q5_K_M, inclusive.</p>
-<p>Q8_0 was in fact so bad that it indicates a potential problem with Ollama. I don't see many other explanations for it being categorically worse than Q4_K_M <i>and</i> the unquantized version.</p>
+<p>Q8_0 was in fact so bad that it indicates a potential problem with Ollama. Outside of these tests, I've also seen preliminary indications that Ollama's FP16 struggles in a similar way. I opened an issue for this, but it was closed without resolution, so it's not something the Ollama authors are concerned about.</p>
<p>All of that said, my time with QwQ has also shown a fair bit of variance in output quality within any given quant. You'd ideally do very many runs of a test to find a realistic average.</p>
</dokki-topic>

