Int8-quantized model performs worse than or similar to the non-quantized FP32/FP16 model #4180
Labels:
- Module:Quantization (Issues related to Quantization)
- triaged (Issue has been triaged by maintainers)
- waiting for feedback (Requires more information from user to make progress on the issue)
I am using a pretrained ConvNeXtV2 model from timm. It contains LayerNorm and GlobalResponseNorm (GRN) layers, and even after adding custom quant modules for LayerNorm, LayerNorm2d, and GRN, I still can't make the INT8 model run faster than the baseline FP16 engine. I am using the TensorRT Python API, with ModelOpt performing the quantization.
My code for creating the custom modules follows this pattern:
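(This is an illustrative sketch rather than my literal code; the `_setup` hook, `TensorQuantizer`, and `mtq.register` names reflect my reading of ModelOpt's custom quantized module docs, and the exact registration API may differ between versions.)

```python
# Sketch: quantized wrappers for the timm norm layers used by ConvNeXtV2.
# Only the activation inputs are (fake-)quantized; affine params stay in float.
import torch.nn as nn
import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.nn import TensorQuantizer
from timm.layers import LayerNorm2d, GlobalResponseNorm


class QuantLayerNorm(nn.LayerNorm):
    def _setup(self):
        # Called by ModelOpt when the module is converted to its quantized form.
        self.input_quantizer = TensorQuantizer()

    def forward(self, x):
        return super().forward(self.input_quantizer(x))


class QuantLayerNorm2d(LayerNorm2d):
    def _setup(self):
        self.input_quantizer = TensorQuantizer()

    def forward(self, x):
        return super().forward(self.input_quantizer(x))


class QuantGlobalResponseNorm(GlobalResponseNorm):
    def _setup(self):
        self.input_quantizer = TensorQuantizer()

    def forward(self, x):
        return super().forward(self.input_quantizer(x))


# Tell ModelOpt to swap these in when mtq.quantize() converts the model.
mtq.register(original_cls=nn.LayerNorm, quantized_cls=QuantLayerNorm)
mtq.register(original_cls=LayerNorm2d, quantized_cls=QuantLayerNorm2d)
mtq.register(original_cls=GlobalResponseNorm, quantized_cls=QuantGlobalResponseNorm)
```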
I am using the following quantization config and calibration flow:
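(Again a sketch: `calib_loader` is a placeholder for my calibration DataLoader, and the config is essentially ModelOpt's stock INT8 recipe, kept unchanged.)

```python
import copy
import modelopt.torch.quantization as mtq

# Stock INT8 recipe: per-channel int8 weights, per-tensor int8 activations,
# max calibration.
config = copy.deepcopy(mtq.INT8_DEFAULT_CFG)


def forward_loop(model):
    # Run a few hundred calibration batches through the model.
    for images, _ in calib_loader:  # placeholder DataLoader
        model(images.cuda())


model = mtq.quantize(model, config, forward_loop)
```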
I am following the documentation, but it is not clear to me whether I am doing something wrong. Help is much appreciated.
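For reference, the ONNX export and engine build go roughly like this with the TensorRT Python API (paths, input shape, and opset version are placeholders); the engine inspector dump at the end is how I check which layers actually end up running in INT8:

```python
import tensorrt as trt
import torch

# Export the fake-quantized model to ONNX with Q/DQ nodes.
model.eval().cuda()
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "convnextv2_int8.onnx", opset_version=17)

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("convnextv2_int8.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)   # explicit Q/DQ graph, no calibrator
config.set_flag(trt.BuilderFlag.FP16)   # let non-quantized layers use fp16
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
engine_bytes = builder.build_serialized_network(network, config)

# Inspect per-layer precisions to confirm INT8 kernels were actually chosen.
engine = trt.Runtime(logger).deserialize_cuda_engine(engine_bytes)
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```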