Description
I used TensorRT 8.6.1.6 to implement YOLOv8 inference and ran into a confusing problem. When I increase the batch size from 1 to 12, the inference time grows linearly with it: batch size 1 takes about 10 ms, batch size 2 about 20 ms, ..., and batch size 12 about 120 ms. It looks as if the model is running the images one by one rather than processing the batch as a whole. Is this normal? My expectation was that if batch size 2 costs 20 ms, then batch size 4 should cost roughly the same, since CUDA should process the images in parallel. I don't know how to solve this. Could someone share a demo that shows how to run a whole batch in a single inference call?
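For context, this is roughly the approach I am trying (a minimal sketch, assuming the ONNX export has a dynamic batch axis, the usual YOLOv8 tensor names "images" / "output0", a 640x640 input, and a YOLOv8n-style output of 84x8400 per image; the file name yolov8n.onnx is just a placeholder):

```cpp
// Sketch: build an engine with a dynamic batch dimension and run the whole
// batch with a single enqueueV3() call (TensorRT 8.6 C++ API).
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <vector>

using namespace nvinfer1;

class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("%s\n", msg);
    }
} gLogger;

ICudaEngine* buildEngine(const char* onnxPath, IRuntime* runtime, int maxBatch) {
    auto builder = createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile(onnxPath, static_cast<int>(ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    // Optimization profile: allow batch sizes 1..maxBatch, tuned for maxBatch.
    auto profile = builder->createOptimizationProfile();
    profile->setDimensions("images", OptProfileSelector::kMIN, Dims4{1, 3, 640, 640});
    profile->setDimensions("images", OptProfileSelector::kOPT, Dims4{maxBatch, 3, 640, 640});
    profile->setDimensions("images", OptProfileSelector::kMAX, Dims4{maxBatch, 3, 640, 640});
    config->addOptimizationProfile(profile);

    auto serialized = builder->buildSerializedNetwork(*network, *config);
    return runtime->deserializeCudaEngine(serialized->data(), serialized->size());
}

void inferBatch(ICudaEngine* engine, const std::vector<float>& hostInput, int batch) {
    auto context = engine->createExecutionContext();
    // Tell the context the actual batch size for this run.
    context->setInputShape("images", Dims4{batch, 3, 640, 640});

    size_t inBytes = hostInput.size() * sizeof(float);
    // Assumed YOLOv8n output layout: batch x 84 x 8400 floats.
    size_t outBytes = static_cast<size_t>(batch) * 84 * 8400 * sizeof(float);

    void *dIn, *dOut;
    cudaMalloc(&dIn, inBytes);
    cudaMalloc(&dOut, outBytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    // Copy ALL images as one contiguous buffer, then launch ONE inference.
    cudaMemcpyAsync(dIn, hostInput.data(), inBytes, cudaMemcpyHostToDevice, stream);
    context->setTensorAddress("images", dIn);
    context->setTensorAddress("output0", dOut);
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaStreamDestroy(stream);
}

int main() {
    auto runtime = createInferRuntime(gLogger);
    auto engine = buildEngine("yolov8n.onnx", runtime, 12);  // placeholder path
    int batch = 12;
    // Placeholder for 12 preprocessed images (NCHW, FP32).
    std::vector<float> input(static_cast<size_t>(batch) * 3 * 640 * 640, 0.f);
    inferBatch(engine, input, batch);
    return 0;
}
```

Is this the right way to do it, or am I missing something in how the engine or execution context should be set up for batched inference?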
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: RTX A4000
NVIDIA Driver Version:
CUDA Version: 11.6
CUDNN Version:
Operating System: Windows
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):