Description
I used TensorRT 8.6.1.6 to implement YOLOv8 inference and ran into a confusing problem. When I increase the batch size from 1 to 12, the inference time grows linearly with it: batch size 1 takes about 10 ms, batch size 2 about 20 ms, ..., and batch size 12 about 120 ms. It looks as if the model is running the images one by one rather than processing the batch as a whole. Is this normal? My expectation was that if batch size 2 costs 20 ms, then batch size 4 should cost roughly the same, since CUDA should process the images in parallel. I don't know how to solve this. Could someone share a demo that shows how to run a whole batch in a single inference call?
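For context, this is roughly the approach I am trying (a minimal sketch, assuming the ONNX export has a dynamic batch axis, the usual YOLOv8 tensor names "images" / "output0", a 640x640 input, and a YOLOv8n-style output of 84x8400 per image; the file name yolov8n.onnx is just a placeholder):

```cpp
// Sketch: build an engine with a dynamic batch dimension and run the whole
// batch with a single enqueueV3() call (TensorRT 8.6 C++ API).
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <vector>

using namespace nvinfer1;

class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("%s\n", msg);
    }
} gLogger;

ICudaEngine* buildEngine(const char* onnxPath, IRuntime* runtime, int maxBatch) {
    auto builder = createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile(onnxPath, static_cast<int>(ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    // Optimization profile: allow batch sizes 1..maxBatch, tuned for maxBatch.
    auto profile = builder->createOptimizationProfile();
    profile->setDimensions("images", OptProfileSelector::kMIN, Dims4{1, 3, 640, 640});
    profile->setDimensions("images", OptProfileSelector::kOPT, Dims4{maxBatch, 3, 640, 640});
    profile->setDimensions("images", OptProfileSelector::kMAX, Dims4{maxBatch, 3, 640, 640});
    config->addOptimizationProfile(profile);

    auto serialized = builder->buildSerializedNetwork(*network, *config);
    return runtime->deserializeCudaEngine(serialized->data(), serialized->size());
}

void inferBatch(ICudaEngine* engine, const std::vector<float>& hostInput, int batch) {
    auto context = engine->createExecutionContext();
    // Tell the context the actual batch size for this run.
    context->setInputShape("images", Dims4{batch, 3, 640, 640});

    size_t inBytes = hostInput.size() * sizeof(float);
    // Assumed YOLOv8n output layout: batch x 84 x 8400 floats.
    size_t outBytes = static_cast<size_t>(batch) * 84 * 8400 * sizeof(float);

    void *dIn, *dOut;
    cudaMalloc(&dIn, inBytes);
    cudaMalloc(&dOut, outBytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    // Copy ALL images as one contiguous buffer, then launch ONE inference.
    cudaMemcpyAsync(dIn, hostInput.data(), inBytes, cudaMemcpyHostToDevice, stream);
    context->setTensorAddress("images", dIn);
    context->setTensorAddress("output0", dOut);
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaStreamDestroy(stream);
}

int main() {
    auto runtime = createInferRuntime(gLogger);
    auto engine = buildEngine("yolov8n.onnx", runtime, 12);  // placeholder path
    int batch = 12;
    // Placeholder for 12 preprocessed images (NCHW, FP32).
    std::vector<float> input(static_cast<size_t>(batch) * 3 * 640 * 640, 0.f);
    inferBatch(engine, input, batch);
    return 0;
}
```

Is this the right way to do it, or am I missing something in how the engine or execution context should be set up for batched inference?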
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: RTX A4000
NVIDIA Driver Version:
CUDA Version: 11.6
CUDNN Version:
Operating System: Windows
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):