Replies: 2 comments
-
Ah, I searched for 1.58-bit on Hugging Face... there is a quantization method related to it. But after comparing many options, I selected a smaller model instead.
-
Is there a reason to use q4_0 specifically? It works, but it is an older quantization format.
-
After applying q4_0 quantization, I noticed that the quality of my generated results declined. I appreciate the speed and reduced VRAM usage that the quantized model offers, but I am looking for ways to recover output quality. Could you suggest any solutions or improvements, such as calibration, LoRA, fine-tuning, or other techniques? Thank you for your assistance!
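To illustrate where the quality loss comes from, here is a minimal sketch of block-wise symmetric 4-bit quantization, similar in spirit to q4_0 (one scale per block of weights, integer values clipped to a 4-bit range). This is a toy illustration with made-up weights, not llama.cpp's actual kernel; the round-trip error it prints is the kind of per-weight noise that calibration (e.g. an importance matrix) or post-quantization fine-tuning tries to compensate for.

```python
import numpy as np

def quantize_4bit_symmetric(block):
    """Round-trip a block of weights through symmetric 4-bit quantization.

    One scale per block; quantized values are clipped to [-8, 7],
    loosely mirroring how q4_0 stores a per-block scale plus 4-bit ints.
    """
    max_abs = np.abs(block).max()
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(block / scale), -8, 7)
    return q * scale  # dequantized weights, with rounding error baked in

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(4, 32))   # toy weight blocks of 32
deq = np.stack([quantize_4bit_symmetric(row) for row in weights])
err = np.abs(weights - deq).mean()
print(f"mean abs round-trip error: {err:.6f}")
```

The error scales with the per-block quantization step, which is why finer-grained formats (more blocks, more bits, or importance-weighted scales) typically degrade output quality less than plain q4_0.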