
Comparison of inference speed between TRT 8.5.3.1 and TRT 10.5.0.18 on GPU 3060-12G / 4060Ti-16G #4332

Open
gaoyu-cao opened this issue Jan 21, 2025 · 4 comments
Labels: Performance (General performance issues)

Comments

@gaoyu-cao

Description

I measured inference time for my segmentation model (BiSeNetV2) on my machine under TRT-8.5.3.1 vs TRT-10.5.0.18, and found a large difference in inference speed between the two versions: 5.8 ms with TRT8 vs 9.0 ms with TRT10 for the same model, using "trtexec --loadEngine". That doesn't look right; I need your help. Thanks!

Environment

TensorRT Version: TRT8.5.3.1/TRT-10.5.0.18

NVIDIA GPU: RTX 3060 - 12G

NVIDIA Driver Version: 536.23

CUDA Version: V11.6

CUDNN Version: V6.5.0

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link: https://github.com/CoinCheung/BiSeNet/releases/tag/0.0.0

Steps To Reproduce

Commands or scripts:

  1. For TRT8: trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --saveEngine=./besnet8.trt --fp16

Running trtexec.exe --loadEngine=./besnet8.trt gives:
"GPU Compute Time: min = 5.3894 ms, max = 7.0011 ms, mean = 5.82974 ms, median = 5.73438 ms, percentile(90%) = 6.32324 ms, percentile(95%) = 6.56079 ms, percentile(99%) = 7.0011 ms"

  2. For TRT10: trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --saveEngine=./besnet10.trt --fp16

Running trtexec.exe --loadEngine=./besnet10.trt gives:
"GPU Compute Time: min = 8.03729 ms, max = 10.4243 ms, mean = 9.02904 ms, median = 8.96878 ms, percentile(90%) = 9.50806 ms, percentile(95%) = 9.74951 ms, percentile(99%) = 10.4243 ms"

Have you tried the latest release?:
Not yet.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@gaoyu-cao
Author

By the way, the inference results are consistent.

@lix19937

For trt10.5.0.18, you should add the flag --builderOptimizationLevel=5.

@gaoyu-cao
Author

> For trt10.5.0.18, you should add the flag --builderOptimizationLevel=5.

I tried your suggestion but didn't get any performance improvement.
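One way to narrow down where the regression comes from is to export per-layer timings from both builds (trtexec supports --exportProfile=profile.json with --separateProfileRun) and diff them. A rough sketch, assuming the exported JSON is a list of records with "name" and "averageMs" keys (an assumption about the profile schema, not verified against the docs):

```python
import json

def load_profile(path: str) -> dict:
    """Load a trtexec --exportProfile JSON file into {layer_name: average_ms}.
    Assumes each record carries 'name' and 'averageMs' keys."""
    with open(path) as fh:
        records = json.load(fh)
    return {r["name"]: r["averageMs"] for r in records if "name" in r}

def top_regressions(old: dict, new: dict, n: int = 10):
    """Return the n layers whose average time grew the most between builds."""
    common = old.keys() & new.keys()
    worst = sorted(common, key=lambda k: new[k] - old[k], reverse=True)
    return [(k, old[k], new[k]) for k in worst[:n]]
```

Running top_regressions on the TRT8 and TRT10 profiles would show whether the extra ~3 ms is concentrated in a few layers (e.g. a tactic change) or spread across the whole network.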

@lix19937

Try the latest version of TRT.

@samurdhikaru samurdhikaru added the Performance General performance issues label Jan 24, 2025