
Onnx GroupNorm Op21 failed in TensorRT 10.7 #4336

Open · toothache opened this issue Jan 24, 2025 · 2 comments
Assignees: kevinch-nv
Labels: internal-bug-tracked (Tracked internally, will be fixed in a future release.) · ONNX (Issues relating to ONNX usage and import) · triaged (Issue has been triaged by maintainers)


toothache commented Jan 24, 2025

Description

ONNX GroupNormalization was introduced in opset 18 and updated in opset 21. The update changes the expected shape of the scale and bias inputs from (num_groups) to (num_channels) to match the original paper and the PyTorch implementation.

https://onnx.ai/onnx/operators/text_diff_GroupNormalization_18_21.html

TensorRT can run GroupNormalization at opset 18, but it fails on the opset-21 version.
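
For reference, a minimal sketch of a repro model with the opset-21 signature, where scale and bias are initializers of shape (C). The tensor names and shapes here are illustrative assumptions, not taken from the attached models.

```python
# Minimal repro sketch (assumed names/shapes): a single GroupNormalization
# node under opset 21, where scale and bias have shape (C) = (num_channels)
# rather than the opset-18 shape (G) = (num_groups).
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

C, G = 4, 2  # channels, groups (illustrative)

node = helper.make_node(
    "GroupNormalization",
    inputs=["x", "scale", "bias"],
    outputs=["y"],
    num_groups=G,
    epsilon=1e-5,
)

graph = helper.make_graph(
    [node],
    "groupnorm_op21",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, C, 8, 8])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, C, 8, 8])],
    initializer=[
        numpy_helper.from_array(np.ones(C, dtype=np.float32), "scale"),  # shape (C)
        numpy_helper.from_array(np.zeros(C, dtype=np.float32), "bias"),  # shape (C)
    ],
)

model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
onnx.checker.check_model(model)
onnx.save(model, "model.onnx")  # then: trtexec --onnx=model.onnx
```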

Environment

TensorRT Version: 10.7.0

NVIDIA GPU: NVIDIA A100 80GB PCIe

NVIDIA Driver Version: 565.57.01

CUDA Version: 12.7

CUDNN Version: 9.6.0

Operating System: Linux

Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.12-py3

Relevant Files

Model links:

Opset21
Opset18

Steps To Reproduce

The error is raised from INormalizationLayer:

IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (INormalizationLayer node_of_y: node_of_y: For instance/group normalization, the scale is expected to match the output at the channel dimension 1)

Full logs:

root@cca92a5458f4:/workspace# trtexec --onnx=model.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v100700] [b23] # trtexec --onnx=model.onnx
[01/24/2025-05:21:16] [I] TF32 is enabled by default. Add --noTF32 flag to further improve accuracy with some performance cost.
[01/24/2025-05:21:16] [I] === Model Options ===
[01/24/2025-05:21:16] [I] Format: ONNX
[01/24/2025-05:21:16] [I] Model: model.onnx
[01/24/2025-05:21:16] [I] Output:
[01/24/2025-05:21:16] [I] === Build Options ===
[01/24/2025-05:21:16] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/24/2025-05:21:16] [I] avgTiming: 8
[01/24/2025-05:21:16] [I] Precision: FP32
[01/24/2025-05:21:16] [I] LayerPrecisions:
[01/24/2025-05:21:16] [I] Layer Device Types:
[01/24/2025-05:21:16] [I] Calibration:
[01/24/2025-05:21:16] [I] Refit: Disabled
[01/24/2025-05:21:16] [I] Strip weights: Disabled
[01/24/2025-05:21:16] [I] Version Compatible: Disabled
[01/24/2025-05:21:16] [I] ONNX Plugin InstanceNorm: Disabled
[01/24/2025-05:21:16] [I] TensorRT runtime: full
[01/24/2025-05:21:16] [I] Lean DLL Path:
[01/24/2025-05:21:16] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/24/2025-05:21:16] [I] Exclude Lean Runtime: Disabled
[01/24/2025-05:21:16] [I] Sparsity: Disabled
[01/24/2025-05:21:16] [I] Safe mode: Disabled
[01/24/2025-05:21:16] [I] Build DLA standalone loadable: Disabled
[01/24/2025-05:21:16] [I] Allow GPU fallback for DLA: Disabled
[01/24/2025-05:21:16] [I] DirectIO mode: Disabled
[01/24/2025-05:21:16] [I] Restricted mode: Disabled
[01/24/2025-05:21:16] [I] Skip inference: Disabled
[01/24/2025-05:21:16] [I] Save engine:
[01/24/2025-05:21:16] [I] Load engine:
[01/24/2025-05:21:16] [I] Profiling verbosity: 0
[01/24/2025-05:21:16] [I] Tactic sources: Using default tactic sources
[01/24/2025-05:21:16] [I] timingCacheMode: local
[01/24/2025-05:21:16] [I] timingCacheFile:
[01/24/2025-05:21:16] [I] Enable Compilation Cache: Enabled
[01/24/2025-05:21:16] [I] Enable Monitor Memory: Disabled
[01/24/2025-05:21:16] [I] errorOnTimingCacheMiss: Disabled
[01/24/2025-05:21:16] [I] Preview Features: Use default preview flags.
[01/24/2025-05:21:16] [I] MaxAuxStreams: -1
[01/24/2025-05:21:16] [I] BuilderOptimizationLevel: -1
[01/24/2025-05:21:16] [I] MaxTactics: -1
[01/24/2025-05:21:16] [I] Calibration Profile Index: 0
[01/24/2025-05:21:16] [I] Weight Streaming: Disabled
[01/24/2025-05:21:16] [I] Runtime Platform: Same As Build
[01/24/2025-05:21:16] [I] Debug Tensors:
[01/24/2025-05:21:16] [I] Input(s)s format: fp32:CHW
[01/24/2025-05:21:16] [I] Output(s)s format: fp32:CHW
[01/24/2025-05:21:16] [I] Input build shapes: model
[01/24/2025-05:21:16] [I] Input calibration shapes: model
[01/24/2025-05:21:16] [I] === System Options ===
[01/24/2025-05:21:16] [I] Device: 0
[01/24/2025-05:21:16] [I] DLACore:
[01/24/2025-05:21:16] [I] Plugins:
[01/24/2025-05:21:16] [I] setPluginsToSerialize:
[01/24/2025-05:21:16] [I] dynamicPlugins:
[01/24/2025-05:21:16] [I] ignoreParsedPluginLibs: 0
[01/24/2025-05:21:16] [I]
[01/24/2025-05:21:16] [I] === Inference Options ===
[01/24/2025-05:21:16] [I] Batch: Explicit
[01/24/2025-05:21:16] [I] Input inference shapes: model
[01/24/2025-05:21:16] [I] Iterations: 10
[01/24/2025-05:21:16] [I] Duration: 3s (+ 200ms warm up)
[01/24/2025-05:21:16] [I] Sleep time: 0ms
[01/24/2025-05:21:16] [I] Idle time: 0ms
[01/24/2025-05:21:16] [I] Inference Streams: 1
[01/24/2025-05:21:16] [I] ExposeDMA: Disabled
[01/24/2025-05:21:16] [I] Data transfers: Enabled
[01/24/2025-05:21:16] [I] Spin-wait: Disabled
[01/24/2025-05:21:16] [I] Multithreading: Disabled
[01/24/2025-05:21:16] [I] CUDA Graph: Disabled
[01/24/2025-05:21:16] [I] Separate profiling: Disabled
[01/24/2025-05:21:16] [I] Time Deserialize: Disabled
[01/24/2025-05:21:16] [I] Time Refit: Disabled
[01/24/2025-05:21:16] [I] NVTX verbosity: 0
[01/24/2025-05:21:16] [I] Persistent Cache Ratio: 0
[01/24/2025-05:21:16] [I] Optimization Profile Index: 0
[01/24/2025-05:21:16] [I] Weight Streaming Budget: 100.000000%
[01/24/2025-05:21:16] [I] Inputs:
[01/24/2025-05:21:16] [I] Debug Tensor Save Destinations:
[01/24/2025-05:21:16] [I] === Reporting Options ===
[01/24/2025-05:21:16] [I] Verbose: Disabled
[01/24/2025-05:21:16] [I] Averages: 10 inferences
[01/24/2025-05:21:16] [I] Percentiles: 90,95,99
[01/24/2025-05:21:16] [I] Dump refittable layers:Disabled
[01/24/2025-05:21:16] [I] Dump output: Disabled
[01/24/2025-05:21:16] [I] Profile: Disabled
[01/24/2025-05:21:16] [I] Export timing to JSON file:
[01/24/2025-05:21:16] [I] Export output to JSON file:
[01/24/2025-05:21:16] [I] Export profile to JSON file:
[01/24/2025-05:21:16] [I]
[01/24/2025-05:21:16] [I] === Device Information ===
[01/24/2025-05:21:16] [I] Available Devices:
[01/24/2025-05:21:16] [I]   Device 0: "NVIDIA A100 80GB PCIe" UUID: GPU-096e51a8-82c0-c28f-256f-1ed51e782f60
[01/24/2025-05:21:17] [I] Selected Device: NVIDIA A100 80GB PCIe
[01/24/2025-05:21:17] [I] Selected Device ID: 0
[01/24/2025-05:21:17] [I] Selected Device UUID: GPU-096e51a8-82c0-c28f-256f-1ed51e782f60
[01/24/2025-05:21:17] [I] Compute Capability: 8.0
[01/24/2025-05:21:17] [I] SMs: 108
[01/24/2025-05:21:17] [I] Device Global Memory: 81155 MiB
[01/24/2025-05:21:17] [I] Shared Memory per SM: 164 KiB
[01/24/2025-05:21:17] [I] Memory Bus Width: 5120 bits (ECC enabled)
[01/24/2025-05:21:17] [I] Application Compute Clock Rate: 1.41 GHz
[01/24/2025-05:21:17] [I] Application Memory Clock Rate: 1.512 GHz
[01/24/2025-05:21:17] [I]
[01/24/2025-05:21:17] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/24/2025-05:21:17] [I]
[01/24/2025-05:21:17] [I] TensorRT version: 10.7.0
[01/24/2025-05:21:17] [I] Loading standard plugins
[01/24/2025-05:21:17] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 23, GPU 426 (MiB)
[01/24/2025-05:21:19] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2038, GPU +374, now: CPU 2218, GPU 800 (MiB)
[01/24/2025-05:21:19] [I] Start parsing network model.
[01/24/2025-05:21:19] [I] [TRT] ----------------------------------------------------------------
[01/24/2025-05:21:19] [I] [TRT] Input filename:   model.onnx
[01/24/2025-05:21:19] [I] [TRT] ONNX IR version:  0.0.10
[01/24/2025-05:21:19] [I] [TRT] Opset version:    21
[01/24/2025-05:21:19] [I] [TRT] Producer name:    backend-test
[01/24/2025-05:21:19] [I] [TRT] Producer version:
[01/24/2025-05:21:19] [I] [TRT] Domain:
[01/24/2025-05:21:19] [I] [TRT] Model version:    0
[01/24/2025-05:21:19] [I] [TRT] Doc string:
[01/24/2025-05:21:19] [I] [TRT] ----------------------------------------------------------------
[01/24/2025-05:21:19] [I] Finished parsing network model. Parse time: 0.00107378
[01/24/2025-05:21:19] [E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (INormalizationLayer node_of_y: node_of_y: For instance/group normalization, the scale is expected to match the output at the channel dimension 1)
[01/24/2025-05:21:19] [E] Engine could not be created from network
[01/24/2025-05:21:19] [E] Building engine failed
[01/24/2025-05:21:19] [E] Failed to create engine from model or file.
[01/24/2025-05:21:19] [E] Engine set up failed
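
Until the opset-21 definition is supported, one possible interim workaround (a sketch under the same illustrative assumptions as above, not an official fix) is to decompose the node: run GroupNormalization with unit scale and zero bias of shape (G), which TensorRT accepts, then re-apply the per-channel affine transform with an explicit Mul and Add. For 4-D inputs this is mathematically equivalent to the opset-21 semantics.

```python
# Possible workaround sketch (assumed, not an official fix): decompose the
# opset-21 GroupNormalization into a plain normalization plus an explicit
# per-channel affine transform that TensorRT can already handle.
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

C, G = 4, 2
scale = np.random.rand(C).astype(np.float32)  # per-channel scale, shape (C)
bias = np.random.rand(C).astype(np.float32)   # per-channel bias, shape (C)

nodes = [
    # Pure normalization: unit scale / zero bias of shape (G), as in opset 18.
    helper.make_node("GroupNormalization", ["x", "ones_g", "zeros_g"], ["xn"],
                     num_groups=G, epsilon=1e-5),
    # Re-apply the per-channel affine transform, broadcast as (1, C, 1, 1).
    helper.make_node("Mul", ["xn", "scale_c"], ["xs"]),
    helper.make_node("Add", ["xs", "bias_c"], ["y"]),
]

graph = helper.make_graph(
    nodes,
    "groupnorm_decomposed",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, C, 8, 8])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, C, 8, 8])],
    initializer=[
        numpy_helper.from_array(np.ones(G, dtype=np.float32), "ones_g"),
        numpy_helper.from_array(np.zeros(G, dtype=np.float32), "zeros_g"),
        numpy_helper.from_array(scale.reshape(1, C, 1, 1), "scale_c"),
        numpy_helper.from_array(bias.reshape(1, C, 1, 1), "bias_c"),
    ],
)

model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)
onnx.save(model, "model_workaround.onnx")
```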

Have you tried the latest release?: Yes

Can this model run on other frameworks? For example, run the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt): Yes
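
For completeness, the equivalent check in Python with ONNX Runtime (the input name "x" and its shape are assumptions matching the sketch above):

```python
# Sanity check outside TensorRT: the opset-21 model runs under ONNX Runtime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
x = np.random.rand(1, 4, 8, 8).astype(np.float32)  # assumed input name/shape
(y,) = sess.run(None, {"x": x})
print(y.shape)  # (1, 4, 8, 8)
```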

@yuanyao-nv yuanyao-nv added the ONNX Issues relating to ONNX usage and import label Jan 24, 2025
@kevinch-nv kevinch-nv added the triaged Issue has been triaged by maintainers label Jan 30, 2025
@kevinch-nv kevinch-nv self-assigned this Jan 30, 2025
@kevinch-nv kevinch-nv added the internal-bug-tracked Tracked internally, will be fixed in a future release. label Jan 31, 2025
@kevinch-nv (Collaborator)

Thanks for the report. TRT only supports the opset 17 definition at the moment.

@toothache (Author)

> Thanks for the report. TRT only supports the opset 17 definition at the moment.

Is there any documentation of the supported opset versions? The following link suggests that opset 20 is supported:

https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md

> TensorRT 10.0 supports operators in the inclusive range of opset 9 to opset 20.
