🐛 [Bug] quantized_resnet_test.py failed on no attribute 'EXPLICIT_PRECISION' #3362
Comments
Does this example not work for you? https://pytorch.org/TensorRT/tutorials/_rendered_examples/dynamo/vgg16_ptq.html
Thank you for pointing out the vgg16_ptq example! However, that example uses modelopt for post-training quantization, while our workflow specifically relies on torch.fx for quantization and lowering to TensorRT. The quantized_resnet_test.py script appears to be one of the few examples in the repository demonstrating this approach. Unfortunately, the script does not work as expected due to the deprecated EXPLICIT_PRECISION flag in TensorRT. This raises a couple of questions: if this workflow is no longer supported, are there plans to update it? Or could you provide an alternative example demonstrating FX-based quantization and integration with TensorRT? This would be immensely helpful for users exploring this specific workflow. Thank you for your support!
In theory you can continue to use this workflow through the dynamo frontend. As long as the operations that the fx converters use aren't getting lowered out, they will still work today. If they are getting lowered out, we can patch them for dynamo.
The fx frontend is no longer being actively maintained, as it has been superseded by dynamo.
We can explore making an example for the dynamo frontend that replicates the behavior, but it won't use the FX frontend.
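For illustration, compiling through the dynamo frontend looks roughly like the minimal sketch below. The model, input shape, and precisions are placeholders (a plain ResNet stands in for the FX-quantized module just to keep the snippet self-contained); whether the quantize/dequantize ops in an FX-quantized graph survive lowering depends on the converters available, as noted above.

```python
import torch
import torchvision.models as models
import torch_tensorrt

# In the real workflow this would be the torch.fx-quantized module; a plain
# ResNet is used here only to keep the sketch self-contained.
model = models.resnet18(weights=None).eval().cuda()

example_input = torch.randn(1, 3, 224, 224).cuda()

# Lower through the dynamo frontend rather than the deprecated fx frontend.
trt_module = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[example_input],
    enabled_precisions={torch.float16, torch.int8},  # illustrative choice
)

print(trt_module(example_input).shape)
```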
Got it. So if I understand you correctly, we can still quantize using torch.fx and then lower to TensorRT with the Dynamo frontend? That's good to know, because right now it feels like there are two main options for quantization: NVIDIA's Model Optimization Toolkit, which is still pretty early in development, and torch.fx, which a lot of people are already using. Would it be possible to share an example showing how to quantize a model using torch.fx and then lower it using the Dynamo frontend in Torch-TensorRT? It would really help clarify how to transition workflows like this without relying on outdated tools.
I'm currently stuck with this workflow. We're quantizing models using torch.fx, but I'm running into issues with both of Torch-TensorRT's frontends: neither TorchScript (TS) nor Dynamo seems to support torch.fx quantized graphs. Is there any supported way to lower torch.fx quantized graphs to TensorRT?
To solve your issue right now, you can either use ModelOpt, or quickly patch fx's TRTInterpreter to not need the explicit precision flag (it is explicit precision by default, so you don't need to replace it with anything); a rough sketch of that patch is below. We are investigating supporting PT2 quantization in Dynamo, but the opset is different, so we cannot directly use the same converters. It is also unclear from PyTorch's side whether PT2 quantization or torchao is their future direction, so we are trying to clarify this with them before committing to supporting it.
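For anyone hitting this before an official fix lands, the patch described above amounts to dropping the removed flag where torch_tensorrt/fx/fx2trt.py builds the network. The exact lines vary by version; the sketch below only illustrates the change, it is not the committed fix.

```python
import tensorrt as trt

# Roughly what TRTInterpreter does today (fails on TensorRT >= 10, where
# NetworkDefinitionCreationFlag.EXPLICIT_PRECISION no longer exists):
#
#   flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
#   if explicit_precision:
#       flag |= 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)
#   self.network = self.builder.create_network(flag)
#
# Patched version: explicit precision is the default behavior in recent
# TensorRT, so the flag can simply be dropped.
builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flag)
```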
Should int8 quantization work in torch_tensorrt 2.4.0 (we are restricted to this version by our Python environment)?
Bug Description
The example script fx/quantized_resnet_test.py in the Torch-TensorRT repository fails to execute due to its use of the deprecated EXPLICIT_PRECISION attribute in the TensorRT Python API. This attribute is no longer available in recent versions of TensorRT (e.g., TensorRT 10.1).
The failure is an AttributeError on 'EXPLICIT_PRECISION' (full traceback omitted here).
To Reproduce
Steps to reproduce the behavior:
1. Install Torch-TensorRT 2.4.0 alongside a recent TensorRT release (TensorRT 10.1.0 in this environment).
2. Run the fx/quantized_resnet_test.py example script.
3. The script aborts with an AttributeError on EXPLICIT_PRECISION while the TRTInterpreter builds the TensorRT network.
Expected behavior
The script should run successfully, converting the quantized ResNet model to TensorRT without encountering an error.
Environment
Torch-TensorRT Version: 2.4.0
PyTorch Version: 2.4.0
CPU Architecture: amd64
OS: Ubuntu 22.04
How you installed PyTorch: pip
Build command you used (if compiling from source): N/A
Are you using local sources or building from archives: Building from local sources
Python version: 3.10
CUDA version: 11.8
GPU models and configuration: NVIDIA A40
Any other relevant information: Running TensorRT 10.1.0
Additional context
The issue seems to stem from the use of the deprecated EXPLICIT_PRECISION flag in the TRTInterpreter class within torch_tensorrt/fx/fx2trt.py. TensorRT 10.1 does not support this attribute, and its usage needs to be updated to align with the latest TensorRT API.
This script is one of the very few examples that demonstrate how to quantize a model using FX and lower it to TensorRT. It is a valuable resource for users looking to implement this workflow.
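For reference, the FX graph-mode post-training quantization flow that this kind of script builds on looks roughly like the following. This is a minimal, generic sketch using PyTorch's prepare_fx/convert_fx API; the repository's script uses a TensorRT-specific backend config and its own interpreter for the lowering step, which are not shown here.

```python
import torch
import torchvision.models as models
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = models.resnet18(weights=None).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Insert observers into the FX graph.
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibrate with representative data (random data here, for the sketch only).
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(1, 3, 224, 224))

# Convert observers to quantize/dequantize ops.
quantized = convert_fx(prepared)

# The final step, lowering `quantized` to TensorRT, is what this issue is
# about: the fx frontend's TRTInterpreter fails on TensorRT 10 as described.
```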
If addressing this issue immediately is not feasible, it would be extremely helpful if an alternative example could be provided to demonstrate how to achieve model quantization and conversion to TensorRT using FX. This would ensure users can still proceed with their workflows while awaiting a permanent fix.
THANKS!