
Comparison of inference speed between TRT 8.5.3.1 and TRT 10.5.0.18 on GPU 3060-12G / 4060Ti-16G #4332

Open
gaoyu-cao opened this issue Jan 21, 2025 · 4 comments
Labels: Performance (General performance issues)

Comments

@gaoyu-cao

Description

I measured inference time for my segmentation model (BiSeNetV2) on my machine under TRT-8.5.3.1 vs TRT-10.5.0.18, and found a large difference in inference speed between the two versions: 5.8 ms with TRT8 vs 9.0 ms with TRT10 for the same model, using "trtexec --loadEngine". That doesn't look right; I need your help. Thanks!

Environment

TensorRT Version: TRT8.5.3.1/TRT-10.5.0.18

NVIDIA GPU: RTX 3060 - 12G

NVIDIA Driver Version: 536.23

CUDA Version: V11.6

CUDNN Version: V6.5.0

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link: https://github.com/CoinCheung/BiSeNet/releases/tag/0.0.0

Steps To Reproduce

Commands or scripts:

  1. For TRT8: trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --saveEngine=./besnet8.trt --fp16

Running trtexec.exe --loadEngine=./besnet8.trt gives:
"GPU Compute Time: min = 5.3894 ms, max = 7.0011 ms, mean = 5.82974 ms, median = 5.73438 ms, percentile(90%) = 6.32324 ms, percentile(95%) = 6.56079 ms, percentile(99%) = 7.0011 ms"

  2. For TRT10: trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --saveEngine=./besnet10.trt --fp16

Running trtexec.exe --loadEngine=./besnet10.trt gives:
"GPU Compute Time: min = 8.03729 ms, max = 10.4243 ms, mean = 9.02904 ms, median = 8.96878 ms, percentile(90%) = 9.50806 ms, percentile(95%) = 9.74951 ms, percentile(99%) = 10.4243 ms"

Have you tried the latest release?:
Not yet.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@gaoyu-cao
Author

By the way, the inference results are consistent.

@lix19937

For trt10.5.0.18, you should add the flag --builderOptimizationLevel=5.

@gaoyu-cao
Author

> For trt10.5.0.18, you should add the flag --builderOptimizationLevel=5.

I tried your suggestion but didn't get any performance improvement.
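One way to narrow down where the regression comes from is to export per-layer timings from both builds (trtexec supports --exportProfile=profile.json with --separateProfileRun) and diff them. A rough sketch, assuming the exported JSON is a list of records with "name" and "averageMs" keys (an assumption about the profile schema, not verified against the docs):

```python
import json

def load_profile(path: str) -> dict:
    """Load a trtexec --exportProfile JSON file into {layer_name: average_ms}.
    Assumes each record carries 'name' and 'averageMs' keys."""
    with open(path) as fh:
        records = json.load(fh)
    return {r["name"]: r["averageMs"] for r in records if "name" in r}

def top_regressions(old: dict, new: dict, n: int = 10):
    """Return the n layers whose average time grew the most between builds."""
    common = old.keys() & new.keys()
    worst = sorted(common, key=lambda k: new[k] - old[k], reverse=True)
    return [(k, old[k], new[k]) for k in worst[:n]]
```

Running top_regressions on the TRT8 and TRT10 profiles would show whether the extra ~3 ms is concentrated in a few layers (e.g. a tactic change) or spread across the whole network.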

@lix19937

Try the latest version of TRT.

@samurdhikaru samurdhikaru added the Performance General performance issues label Jan 24, 2025