TensorRT 8.6.1.6 inference costs too much time #3993
Comments
Parallel processing only happens when there are surplus GPU resources; otherwise execution is effectively serial.
How do I know whether the GPU resources are not enough? Can I compute it?
GPU resources cover many things: registers, L1, L2, memory bandwidth, shared memory, CUDA cores/Tensor cores, etc. You usually need to run experiments. You can get a rough view through nvidia-smi by checking GPU utilization.
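For example, one simple way to watch utilization while a benchmark runs (the one-second polling interval is just an illustration):
# Poll GPU utilization and memory use once per second during the benchmark.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1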
I checked my model and GPU, and I think my GPU has enough resources. My GPU is an RTX A4000 and the model is YOLOv8s. The phenomenon still exists even if I use 224x224 as the input size.
What is your benchmark command or code?
I had the same problem. The inference time for a batch size of 32 is about 32x larger than that for a batch size of 1, but the same model using TensorFlow-TensorRT behaves as expected. The hardware and environment are the same, in an NVIDIA TensorFlow container (release 24.01). Here is the benchmark command.
@xxHn-pro How do you measure time?
In TensorRT, it is in the log output; I take "GPU Compute Time" as the inference time.
In TensorFlow-TensorRT, the code is run in Python and the inference time is measured as below.
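For reference, a quick way to see that figure from a trtexec run (the engine file name here is illustrative, not from this thread):
# trtexec prints a "GPU Compute Time" summary (min/max/mean/median) when it benchmarks an engine.
trtexec --loadEngine=./resnet50-v2-7_32.plan --noDataTransfers --useSpinWait | grep "GPU Compute Time"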
I reproduced the problem with an open model from here. Here is the result. The time scales by about 1.7x when the batch size is doubled. Is that normal? I believe the hardware (an A100) is strong enough to handle these batch sizes in parallel.
Here is the info about the container:
The test was done with
The log32.txt is provided here: log32.txt. Any advice or suggestion will be appreciated.
@lix19937 Can you tell me something to try? Or comment on the results, please.
@xxHn-pro A dynamic-shape model needs min/opt/max shapes set.
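For example, a sketch of how a shape range can be passed to trtexec (the shape values and file names here are illustrative, not the commands originally posted in this thread):
# Build one engine that accepts batch sizes 1 through 32, optimized for 16.
trtexec --onnx=./resnet50-v2-7.onnx \
  --saveEngine=./resnet50-v2-7_dyn.plan \
  --minShapes=data:1x3x224x224 \
  --optShapes=data:16x3x224x224 \
  --maxShapes=data:32x3x224x224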
I have tried these commands, but the results are the same as before.
Can you upload the resnet50-v2-7.onnx file?
The ONNX file can be obtained from https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx
Make sure that there are no other tasks on the machine during compilation.
#!/bin/bash
# Build a fixed-shape engine for each batch size and profile it.
for bz in 1 2 3 4 5; do
  trtexec --onnx=./resnet50-v2-7.onnx \
    --saveEngine=./resnet50-v2-7_${bz}.plan \
    --minShapes=data:${bz}x3x224x224 \
    --optShapes=data:${bz}x3x224x224 \
    --maxShapes=data:${bz}x3x224x224 \
    --verbose --dumpProfile --noDataTransfers --useCudaGraph --useSpinWait --separateProfileRun
done
Obviously, larger batches do increase speed.
Thanks for the testing.
@lix19937 [12/10/2024-16:19:51] [V] [TRT] Registering layer: /model/Add for ONNX node: /model/Add
Please check the output of your command carefully. It is clear and easy to read.
Thank you. There was indeed an error in the `data` field; it works normally after changing it to the model's actual input name.
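One way to double-check the actual input tensor name before filling in the shape flags (the file name is an assumption; polygraphy is the same tool mentioned in the issue template below):
# Print the model's input/output tensor names, shapes, and dtypes.
polygraphy inspect model ./yolov8s.onnx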
@kaixiangjin Closing this ticket - please re-open if this is still reproducible on TensorRT 10.8, thanks.
Description
I used TensorRT 8.6.1.6 to implement YOLOv8 inference and ran into a confusing problem. When I increased the batch size from 1 to 12, the inference time increased proportionally: batch size 1 takes 10 ms, batch size 2 takes 20 ms, ..., batch size 12 takes 120 ms. It seems as if the model infers the images one by one rather than as a whole batch. Is that normal? In my view, if batch size 2 costs 20 ms, then batch size 4 should also cost about 20 ms, because CUDA should process the batch in parallel. I do not know how to solve this problem. Could someone give me a demo to help me implement this idea?
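For what it's worth, a rough way to compare per-batch-size latency with trtexec on a YOLOv8 ONNX export (the file name, the input tensor name "images", and the 640x640 shape are assumptions, not taken from this report); the "GPU Compute Time" summary of each run shows how latency actually scales with batch size:
# Build and benchmark a dynamic-shape engine at several batch sizes.
for bz in 1 2 4 8 12; do
  trtexec --onnx=./yolov8s.onnx \
    --minShapes=images:1x3x640x640 \
    --optShapes=images:${bz}x3x640x640 \
    --maxShapes=images:${bz}x3x640x640 \
    --shapes=images:${bz}x3x640x640 \
    --noDataTransfers --useSpinWait --separateProfileRun
done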
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: RTX A4000
NVIDIA Driver Version:
CUDA Version: 11.6
CUDNN Version:
Operating System: windows
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt):