
Added quantization for OUTETTS #2662

Draft · wants to merge 24 commits into base: latest

Conversation

@nikita-malininn commented Jan 16, 2025

Ticket: 157133


@nikita-malininn nikita-malininn marked this pull request as ready for review January 16, 2025 13:04
@nikita-malininn nikita-malininn marked this pull request as draft January 16, 2025 13:12
@nikita-malininn nikita-malininn marked this pull request as ready for review January 27, 2025 11:38
@MaximProshin (Contributor)

@KodiaqQ , what results do you get with the quantized model vs original on your machine?

@nikita-malininn (Author) commented Jan 28, 2025

> @KodiaqQ , what results do you get with the quantized model vs original on your machine?

FP model generate time: 3.926095366012305
INT model generate time: 2.8104791679652408

Update: recalculated the times with an ignored scope applied.
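For reference, a minimal sketch (not from this PR) of how such timings can be taken and how an ignored scope is passed to `nncf.quantize`; the scope pattern and the `generate_speech` helper are hypothetical placeholders:

```python
import time

import nncf

# Hypothetical ignored scope: keep selected nodes in the original precision.
# The pattern below is only an example, not the scope actually used here.
ignored_scope = nncf.IgnoredScope(patterns=[".*lm_head.*"])

# `ov_model` is the OpenVINO model and `calibration_dataset` an nncf.Dataset
# built from LibriTTS samples, as in the notebook diff further down.
quantized_model = nncf.quantize(
    ov_model,
    calibration_dataset,
    ignored_scope=ignored_scope,
)

def timed_generate(model, text):
    """Wall-clock time of one generation call (generate_speech is hypothetical)."""
    start = time.perf_counter()
    generate_speech(model, text)
    return time.perf_counter() - start

print("FP model generate time:", timed_generate(ov_model, "Hello world"))
print("INT model generate time:", timed_generate(quantized_model, "Hello world"))
```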

@nikita-malininn nikita-malininn marked this pull request as draft January 28, 2025 10:30
@nikita-malininn nikita-malininn marked this pull request as ready for review February 3, 2025 14:12
@nikita-malininn nikita-malininn marked this pull request as draft February 6, 2025 13:10
"hf_model = OVHFModel(model_dir, device.value).model\n",
"dataset = nncf.Dataset(libritts, partial(transform_fn, interface=interface))\n",
"\n",
"quantized_model = nncf.quantize(\n",
Contributor


I would suggest using INT4 weight compression with dynamic quantization (A8W4). @KodiaqQ claims that the performance of such a model is equal to the performance of the quantized model, while the compression rate is higher for the A8W4 model.

cc' @MaximProshin
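For illustration, a minimal sketch of the suggested A8W4 setup, assuming NNCF INT4 weight compression plus runtime dynamic quantization of activations; the group sizes and the compile-time property are assumptions and may differ between OpenVINO releases:

```python
import nncf
import openvino as ov

# INT4 weight-only compression of the OpenVINO model (values are examples).
compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    group_size=64,
    ratio=1.0,
)

# Enable dynamic (8-bit) quantization of activations at inference time.
# The property name/value below are an assumption, not verified in this PR.
core = ov.Core()
compiled_model = core.compile_model(
    compressed_model,
    "CPU",
    {"DYNAMIC_QUANTIZATION_GROUP_SIZE": "32"},
)
```

Timing this model the same way as the fully quantized one would give the comparison requested below.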

Contributor


Please share numbers for both cases. If INT4 is better, I'm fine with using that method.

4 participants