
Onnx GroupNorm Op21 failed in TensorRT 10.7 #4336

Open · toothache opened this issue Jan 24, 2025 · 2 comments
Assignees: kevinch-nv
Labels: internal-bug-tracked (Tracked internally, will be fixed in a future release.) · ONNX (Issues relating to ONNX usage and import) · triaged (Issue has been triaged by maintainers)


toothache commented Jan 24, 2025

Description

ONNX GroupNormalization was introduced in opset 18 and updated in opset 21. The update changes the expected shape of the scale and bias inputs from (num_groups) to (num_channels) to match the original paper and the PyTorch implementation.

https://onnx.ai/onnx/operators/text_diff_GroupNormalization_18_21.html

TensorRT can run GroupNormalization at opset 18, but it fails on the opset-21 version.
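
For reference, a minimal sketch of a repro model with the opset-21 signature, where scale and bias are initializers of shape (C). The tensor names and shapes here are illustrative assumptions, not taken from the attached models.

```python
# Minimal repro sketch (assumed names/shapes): a single GroupNormalization
# node under opset 21, where scale and bias have shape (C) = (num_channels)
# rather than the opset-18 shape (G) = (num_groups).
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

C, G = 4, 2  # channels, groups (illustrative)

node = helper.make_node(
    "GroupNormalization",
    inputs=["x", "scale", "bias"],
    outputs=["y"],
    num_groups=G,
    epsilon=1e-5,
)

graph = helper.make_graph(
    [node],
    "groupnorm_op21",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, C, 8, 8])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, C, 8, 8])],
    initializer=[
        numpy_helper.from_array(np.ones(C, dtype=np.float32), "scale"),  # shape (C)
        numpy_helper.from_array(np.zeros(C, dtype=np.float32), "bias"),  # shape (C)
    ],
)

model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
onnx.checker.check_model(model)
onnx.save(model, "model.onnx")  # then: trtexec --onnx=model.onnx
```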

Environment

TensorRT Version: 10.7.0

NVIDIA GPU: NVIDIA A100 80GB PCIe

NVIDIA Driver Version: 565.57.01

CUDA Version: 12.7

CUDNN Version: 9.6.0

Operating System: Linux

Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.12-py3

Relevant Files

Model links:

Opset21
Opset18

Steps To Reproduce

The error is raised from INormalizationLayer:

IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (INormalizationLayer node_of_y: node_of_y: For instance/group normalization, the scale is expected to match the output at the channel dimension 1)

Full logs:

root@cca92a5458f4:/workspace# trtexec --onnx=model.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v100700] [b23] # trtexec --onnx=model.onnx
[01/24/2025-05:21:16] [I] TF32 is enabled by default. Add --noTF32 flag to further improve accuracy with some performance cost.
[01/24/2025-05:21:16] [I] === Model Options ===
[01/24/2025-05:21:16] [I] Format: ONNX
[01/24/2025-05:21:16] [I] Model: model.onnx
[01/24/2025-05:21:16] [I] Output:
[01/24/2025-05:21:16] [I] === Build Options ===
[01/24/2025-05:21:16] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/24/2025-05:21:16] [I] avgTiming: 8
[01/24/2025-05:21:16] [I] Precision: FP32
[01/24/2025-05:21:16] [I] LayerPrecisions:
[01/24/2025-05:21:16] [I] Layer Device Types:
[01/24/2025-05:21:16] [I] Calibration:
[01/24/2025-05:21:16] [I] Refit: Disabled
[01/24/2025-05:21:16] [I] Strip weights: Disabled
[01/24/2025-05:21:16] [I] Version Compatible: Disabled
[01/24/2025-05:21:16] [I] ONNX Plugin InstanceNorm: Disabled
[01/24/2025-05:21:16] [I] TensorRT runtime: full
[01/24/2025-05:21:16] [I] Lean DLL Path:
[01/24/2025-05:21:16] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/24/2025-05:21:16] [I] Exclude Lean Runtime: Disabled
[01/24/2025-05:21:16] [I] Sparsity: Disabled
[01/24/2025-05:21:16] [I] Safe mode: Disabled
[01/24/2025-05:21:16] [I] Build DLA standalone loadable: Disabled
[01/24/2025-05:21:16] [I] Allow GPU fallback for DLA: Disabled
[01/24/2025-05:21:16] [I] DirectIO mode: Disabled
[01/24/2025-05:21:16] [I] Restricted mode: Disabled
[01/24/2025-05:21:16] [I] Skip inference: Disabled
[01/24/2025-05:21:16] [I] Save engine:
[01/24/2025-05:21:16] [I] Load engine:
[01/24/2025-05:21:16] [I] Profiling verbosity: 0
[01/24/2025-05:21:16] [I] Tactic sources: Using default tactic sources
[01/24/2025-05:21:16] [I] timingCacheMode: local
[01/24/2025-05:21:16] [I] timingCacheFile:
[01/24/2025-05:21:16] [I] Enable Compilation Cache: Enabled
[01/24/2025-05:21:16] [I] Enable Monitor Memory: Disabled
[01/24/2025-05:21:16] [I] errorOnTimingCacheMiss: Disabled
[01/24/2025-05:21:16] [I] Preview Features: Use default preview flags.
[01/24/2025-05:21:16] [I] MaxAuxStreams: -1
[01/24/2025-05:21:16] [I] BuilderOptimizationLevel: -1
[01/24/2025-05:21:16] [I] MaxTactics: -1
[01/24/2025-05:21:16] [I] Calibration Profile Index: 0
[01/24/2025-05:21:16] [I] Weight Streaming: Disabled
[01/24/2025-05:21:16] [I] Runtime Platform: Same As Build
[01/24/2025-05:21:16] [I] Debug Tensors:
[01/24/2025-05:21:16] [I] Input(s)s format: fp32:CHW
[01/24/2025-05:21:16] [I] Output(s)s format: fp32:CHW
[01/24/2025-05:21:16] [I] Input build shapes: model
[01/24/2025-05:21:16] [I] Input calibration shapes: model
[01/24/2025-05:21:16] [I] === System Options ===
[01/24/2025-05:21:16] [I] Device: 0
[01/24/2025-05:21:16] [I] DLACore:
[01/24/2025-05:21:16] [I] Plugins:
[01/24/2025-05:21:16] [I] setPluginsToSerialize:
[01/24/2025-05:21:16] [I] dynamicPlugins:
[01/24/2025-05:21:16] [I] ignoreParsedPluginLibs: 0
[01/24/2025-05:21:16] [I]
[01/24/2025-05:21:16] [I] === Inference Options ===
[01/24/2025-05:21:16] [I] Batch: Explicit
[01/24/2025-05:21:16] [I] Input inference shapes: model
[01/24/2025-05:21:16] [I] Iterations: 10
[01/24/2025-05:21:16] [I] Duration: 3s (+ 200ms warm up)
[01/24/2025-05:21:16] [I] Sleep time: 0ms
[01/24/2025-05:21:16] [I] Idle time: 0ms
[01/24/2025-05:21:16] [I] Inference Streams: 1
[01/24/2025-05:21:16] [I] ExposeDMA: Disabled
[01/24/2025-05:21:16] [I] Data transfers: Enabled
[01/24/2025-05:21:16] [I] Spin-wait: Disabled
[01/24/2025-05:21:16] [I] Multithreading: Disabled
[01/24/2025-05:21:16] [I] CUDA Graph: Disabled
[01/24/2025-05:21:16] [I] Separate profiling: Disabled
[01/24/2025-05:21:16] [I] Time Deserialize: Disabled
[01/24/2025-05:21:16] [I] Time Refit: Disabled
[01/24/2025-05:21:16] [I] NVTX verbosity: 0
[01/24/2025-05:21:16] [I] Persistent Cache Ratio: 0
[01/24/2025-05:21:16] [I] Optimization Profile Index: 0
[01/24/2025-05:21:16] [I] Weight Streaming Budget: 100.000000%
[01/24/2025-05:21:16] [I] Inputs:
[01/24/2025-05:21:16] [I] Debug Tensor Save Destinations:
[01/24/2025-05:21:16] [I] === Reporting Options ===
[01/24/2025-05:21:16] [I] Verbose: Disabled
[01/24/2025-05:21:16] [I] Averages: 10 inferences
[01/24/2025-05:21:16] [I] Percentiles: 90,95,99
[01/24/2025-05:21:16] [I] Dump refittable layers:Disabled
[01/24/2025-05:21:16] [I] Dump output: Disabled
[01/24/2025-05:21:16] [I] Profile: Disabled
[01/24/2025-05:21:16] [I] Export timing to JSON file:
[01/24/2025-05:21:16] [I] Export output to JSON file:
[01/24/2025-05:21:16] [I] Export profile to JSON file:
[01/24/2025-05:21:16] [I]
[01/24/2025-05:21:16] [I] === Device Information ===
[01/24/2025-05:21:16] [I] Available Devices:
[01/24/2025-05:21:16] [I]   Device 0: "NVIDIA A100 80GB PCIe" UUID: GPU-096e51a8-82c0-c28f-256f-1ed51e782f60
[01/24/2025-05:21:17] [I] Selected Device: NVIDIA A100 80GB PCIe
[01/24/2025-05:21:17] [I] Selected Device ID: 0
[01/24/2025-05:21:17] [I] Selected Device UUID: GPU-096e51a8-82c0-c28f-256f-1ed51e782f60
[01/24/2025-05:21:17] [I] Compute Capability: 8.0
[01/24/2025-05:21:17] [I] SMs: 108
[01/24/2025-05:21:17] [I] Device Global Memory: 81155 MiB
[01/24/2025-05:21:17] [I] Shared Memory per SM: 164 KiB
[01/24/2025-05:21:17] [I] Memory Bus Width: 5120 bits (ECC enabled)
[01/24/2025-05:21:17] [I] Application Compute Clock Rate: 1.41 GHz
[01/24/2025-05:21:17] [I] Application Memory Clock Rate: 1.512 GHz
[01/24/2025-05:21:17] [I]
[01/24/2025-05:21:17] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/24/2025-05:21:17] [I]
[01/24/2025-05:21:17] [I] TensorRT version: 10.7.0
[01/24/2025-05:21:17] [I] Loading standard plugins
[01/24/2025-05:21:17] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 23, GPU 426 (MiB)
[01/24/2025-05:21:19] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2038, GPU +374, now: CPU 2218, GPU 800 (MiB)
[01/24/2025-05:21:19] [I] Start parsing network model.
[01/24/2025-05:21:19] [I] [TRT] ----------------------------------------------------------------
[01/24/2025-05:21:19] [I] [TRT] Input filename:   model.onnx
[01/24/2025-05:21:19] [I] [TRT] ONNX IR version:  0.0.10
[01/24/2025-05:21:19] [I] [TRT] Opset version:    21
[01/24/2025-05:21:19] [I] [TRT] Producer name:    backend-test
[01/24/2025-05:21:19] [I] [TRT] Producer version:
[01/24/2025-05:21:19] [I] [TRT] Domain:
[01/24/2025-05:21:19] [I] [TRT] Model version:    0
[01/24/2025-05:21:19] [I] [TRT] Doc string:
[01/24/2025-05:21:19] [I] [TRT] ----------------------------------------------------------------
[01/24/2025-05:21:19] [I] Finished parsing network model. Parse time: 0.00107378
[01/24/2025-05:21:19] [E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (INormalizationLayer node_of_y: node_of_y: For instance/group normalization, the scale is expected to match the output at the channel dimension 1)
[01/24/2025-05:21:19] [E] Engine could not be created from network
[01/24/2025-05:21:19] [E] Building engine failed
[01/24/2025-05:21:19] [E] Failed to create engine from model or file.
[01/24/2025-05:21:19] [E] Engine set up failed
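
Until the opset-21 definition is supported, one possible interim workaround (a sketch under the same illustrative assumptions as above, not an official fix) is to decompose the node: run GroupNormalization with unit scale and zero bias of shape (G), which TensorRT accepts, then re-apply the per-channel affine transform with an explicit Mul and Add. For 4-D inputs this is mathematically equivalent to the opset-21 semantics.

```python
# Possible workaround sketch (assumed, not an official fix): decompose the
# opset-21 GroupNormalization into a plain normalization plus an explicit
# per-channel affine transform that TensorRT can already handle.
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

C, G = 4, 2
scale = np.random.rand(C).astype(np.float32)  # per-channel scale, shape (C)
bias = np.random.rand(C).astype(np.float32)   # per-channel bias, shape (C)

nodes = [
    # Pure normalization: unit scale / zero bias of shape (G), as in opset 18.
    helper.make_node("GroupNormalization", ["x", "ones_g", "zeros_g"], ["xn"],
                     num_groups=G, epsilon=1e-5),
    # Re-apply the per-channel affine transform, broadcast as (1, C, 1, 1).
    helper.make_node("Mul", ["xn", "scale_c"], ["xs"]),
    helper.make_node("Add", ["xs", "bias_c"], ["y"]),
]

graph = helper.make_graph(
    nodes,
    "groupnorm_decomposed",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, C, 8, 8])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, C, 8, 8])],
    initializer=[
        numpy_helper.from_array(np.ones(G, dtype=np.float32), "ones_g"),
        numpy_helper.from_array(np.zeros(G, dtype=np.float32), "zeros_g"),
        numpy_helper.from_array(scale.reshape(1, C, 1, 1), "scale_c"),
        numpy_helper.from_array(bias.reshape(1, C, 1, 1), "bias_c"),
    ],
)

model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)
onnx.save(model, "model_workaround.onnx")
```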

Have you tried the latest release?: Yes

Can this model run on other frameworks? For example, run the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt): Yes
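
For completeness, the equivalent check in Python with ONNX Runtime (the input name "x" and its shape are assumptions matching the sketch above):

```python
# Sanity check outside TensorRT: the opset-21 model runs under ONNX Runtime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
x = np.random.rand(1, 4, 8, 8).astype(np.float32)  # assumed input name/shape
(y,) = sess.run(None, {"x": x})
print(y.shape)  # (1, 4, 8, 8)
```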

@yuanyao-nv yuanyao-nv added the ONNX Issues relating to ONNX usage and import label Jan 24, 2025
@kevinch-nv kevinch-nv added the triaged Issue has been triaged by maintainers label Jan 30, 2025
@kevinch-nv kevinch-nv self-assigned this Jan 30, 2025
@kevinch-nv kevinch-nv added the internal-bug-tracked Tracked internally, will be fixed in a future release. label Jan 31, 2025
@kevinch-nv (Collaborator)

Thanks for the report. TRT only supports the opset 17 definition at the moment.

@toothache (Author)

> Thanks for the report. TRT only supports the opset 17 definition at the moment.

Is there any documentation of the supported opset versions? The following link suggests that opset 20 is supported:

https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md

> TensorRT 10.0 supports operators in the inclusive range of opset 9 to opset 20.
