The IR seen in iree-org/iree#19416 produces numerical accuracy errors because the bias is not quantized with a scale equal to `input_scale * weight_scale`, which `FuseQuantizedOps` tacitly assumes. We are re-quantizing the bias with this standard product scale, which does not match the framework's implementation.
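To make the failure mode concrete, here is a small NumPy sketch of the rounding error introduced when a bias stored at its own scale is re-quantized to the assumed product scale. All numbers and names here are illustrative, not taken from the issue:

```python
import numpy as np

# Illustrative scales; a real model supplies these. The point is only
# that bias_scale != input_scale * weight_scale.
input_scale, weight_scale = 0.05, 0.02
product_scale = input_scale * weight_scale      # 0.001, what FuseQuantizedOps assumes
bias_scale = 0.0007                             # non-standard scale chosen by the framework

bias_q = np.array([1234], dtype=np.int32)       # bias as stored by the framework
bias_intended = bias_q.astype(np.float64) * bias_scale   # 0.8638, the value the model means

# Re-quantizing to the product scale, as the fusion currently does:
bias_requant = np.round(bias_intended / product_scale).astype(np.int32)  # 864
bias_fused = bias_requant.astype(np.float64) * product_scale             # 0.864

print(bias_intended, bias_fused)  # per-element error of up to product_scale / 2
```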
My best idea of how to handle this is to factor the bias addition out of the fused op when the bias is quantized with a non-standard scheme. This defaults to f32 addition, but we can always add support for mixed-scale quantized add operations later.
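As a rough sketch of that rewrite (in NumPy rather than MLIR, with hypothetical helper names, so this is only the shape of the idea, not the pass itself): leave the bias out of the fused quantized matmul, then dequantize the bias at its own scale and add it in f32.

```python
import numpy as np

def qlinear_no_bias(x_q, x_scale, x_zp, w_q, w_scale):
    """Fused quantized matmul with the bias factored out:
    integer accumulate, then dequantize with the product scale."""
    acc = (x_q.astype(np.int32) - x_zp) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * np.float32(x_scale * w_scale)

def add_nonstandard_bias(y_f32, bias_q, bias_scale):
    """Bias whose scale is not input_scale * weight_scale:
    dequantize at its own scale and add in f32 instead of fusing."""
    return y_f32 + bias_q.astype(np.float32) * np.float32(bias_scale)

# Example: the bias keeps its framework-assigned scale end to end.
x_q = np.array([[10, 20]], dtype=np.int8)
w_q = np.array([[1], [2]], dtype=np.int8)
y = qlinear_no_bias(x_q, 0.05, 0, w_q, 0.02)
y = add_nonstandard_bias(y, np.array([1234], dtype=np.int32), 0.0007)
```

A mixed-scale quantized add could later replace the f32 add without changing this overall pattern.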
Here is the IR from the linked issue for reference: