Commit c5b9de3

Merge pull request #4062 from NVIDIA/dev-brb-update-for-10.3-GA

Release 10.3-GA

2 parents 4575799 + 84dd6ed

80 files changed, +2701 -533 lines changed

.clang-format (+1 -1)

```diff
@@ -74,7 +74,7 @@ SpacesInContainerLiterals: true
 SpacesInParentheses: false
 SpacesInSquareBrackets: false
 Standard: Cpp11
-StatementMacros: [API_ENTRY_TRY]
+StatementMacros: [API_ENTRY_TRY,TRT_TRY]
 TabWidth: 4
 UseTab: Never
 ...
```

CHANGELOG.md (+17 -1)

```diff
@@ -1,6 +1,22 @@
 # TensorRT OSS Release Changelog

-## 10.2.0 GA - 2024-07-10
+## 10.3.0 GA - 2024-08-07
+
+Key Features and Updates:
+
+- Demo changes
+  - Added [Stable Video Diffusion](demo/Diffusion) (`SVD`) pipeline.
+- Plugin changes
+  - Deprecated Version 1 of the [ScatterElements plugin](plugin/scatterElementsPlugin). It is superseded by Version 2, which implements the `IPluginV3` interface.
+- Quickstart guide
+  - Updated the [SemanticSegmentation](quickstart/SemanticSegmentation) guide with the latest APIs.
+- Parser changes
+  - Added support for tensor `axes` inputs for the `Slice` node.
+  - Updated the `ScatterElements` importer to use Version 2 of the [ScatterElements plugin](plugin/scatterElementsPlugin), which implements the `IPluginV3` interface.
+- Updated tooling
+  - Polygraphy v0.49.13
+
+## 10.2.0 GA - 2024-07-09

 Key Features and Updates:
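For context on the `Slice` parser change above: since ONNX opset 10, `Slice` takes `starts`/`ends`/`axes` as node inputs rather than attributes, and the parser now accepts an `axes` input that is itself a tensor instead of requiring a constant. The sketch below is illustrative only (not part of this commit; all names are hypothetical) and builds such a model with the standard `onnx` Python helpers:

```python
# Hypothetical model: a Slice whose `axes` is a graph input (a tensor supplied
# at runtime) rather than a constant initializer -- the case the parser now handles.
import onnx
from onnx import TensorProto, helper

data = helper.make_tensor_value_info("data", TensorProto.FLOAT, [4, 8])
axes = helper.make_tensor_value_info("axes", TensorProto.INT64, [1])  # tensor input, not an initializer
out = helper.make_tensor_value_info("out", TensorProto.FLOAT, [None, None])

# starts/ends stay constant initializers for simplicity
starts = helper.make_tensor("starts", TensorProto.INT64, [1], [1])
ends = helper.make_tensor("ends", TensorProto.INT64, [1], [3])

node = helper.make_node("Slice", ["data", "starts", "ends", "axes"], ["out"])
graph = helper.make_graph([node], "slice_tensor_axes", [data, axes], [out],
                          initializer=[starts, ends])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])

onnx.checker.check_model(model)
onnx.save(model, "slice_tensor_axes.onnx")  # a model the updated parser should now import
```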

LICENSE (+20 -1)

```diff
@@ -337,10 +337,11 @@
 limitations under the License.

 > demo/Diffusion/utilities.py
+> demo/Diffusion/stable_video_diffusion_pipeline.py

 HuggingFace diffusers library.

-Copyright 2022 The HuggingFace Team.
+Copyright 2024 The HuggingFace Team.

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
@@ -380,3 +381,21 @@
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
+
+> demo/Diffusion/utilities.py
+
+ModelScope library.
+
+Copyright (c) Alibaba, Inc. and its affiliates.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
```

README.md (+9 -9)

````diff
@@ -26,7 +26,7 @@ You can skip the **Build** section to enjoy TensorRT with Python.
 To build the TensorRT-OSS components, you will first need the following software packages.

 **TensorRT GA build**
-* TensorRT v10.2.0.19
+* TensorRT v10.3.0.26
 * Available from direct download links listed below

 **System Packages**
@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
 If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.

 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
-- [TensorRT 10.2.0.19 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.2.0.19 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz)
-- [TensorRT 10.2.0.19 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.2.0.19 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip)
+- [TensorRT 10.3.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.3.0.26 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz)
+- [TensorRT 10.3.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.3.0.26 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip)


 **Example: Ubuntu 20.04 on x86-64 with cuda-12.5**

 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.2.0.19
+tar -xvzf TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.3.0.26
 ```

 **Example: Windows on x86-64 with cuda-12.5**

 ```powershell
-Expand-Archive -Path TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.2.0.19\lib"
+Expand-Archive -Path TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.3.0.26\lib"
 ```

 ## Setting Up The Build Environment
````

VERSION (+1 -1)

```diff
@@ -1 +1 @@
-10.2.0.19
+10.3.0.26
```

demo/BERT/README.md (+1 -1)

```diff
@@ -75,7 +75,7 @@ The following software version configuration has been tested:
 |Software|Version|
 |--------|-------|
 |Python|>=3.8|
-|TensorRT|10.2.0.19|
+|TensorRT|10.3.0.26|
 |CUDA|12.5|

 ## Setup
```

demo/Diffusion/README.md (+24 -2; file mode changed 100644 → 100755)

````diff
@@ -48,14 +48,14 @@ onnx 1.15.0
 onnx-graphsurgeon 0.5.2
 onnxruntime 1.16.3
 polygraphy 0.49.9
-tensorrt 10.2.0.19
+tensorrt 10.3.0.26
 tokenizers 0.13.3
 torch 2.2.0
 transformers 4.33.1
 controlnet-aux 0.0.6
 nvidia-modelopt 0.11.2
 ```
-> NOTE: optionally install HuggingFace [accelerate](https://pypi.org/project/accelerate/) package for faster and less memory-intense model loading.
+> NOTE: optionally install the HuggingFace [accelerate](https://pypi.org/project/accelerate/) package for faster and less memory-intensive model loading. Note that installing accelerate is known to cause failures while running certain pipelines in Torch Compile mode ([known issue](https://github.com/huggingface/diffusers/issues/9091)).

 # Running demoDiffusion

@@ -178,6 +178,28 @@ python3 demo_txt2img_sd3.py "dog wearing a sweater and a blue collar" --version

 Note that a denoising-percentage is applied to the number of denoising-steps when an input image conditioning is provided. Its default value is set to 0.6. This parameter can be updated using `--denoising-percentage`.

+### Image-to-video using SVD (Stable Video Diffusion)
+
+Download the pre-exported ONNX model:
+
+```bash
+git lfs install
+git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1-tensorrt onnx-svd-xt-1-1
+cd onnx-svd-xt-1-1 && git lfs pull && cd ..
+```
+
+SVD-XT-1.1 (25 frames at resolution 576x1024):
+```bash
+python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN
+```
+
+You may also specify a custom conditioning image using `--input-image`:
+```bash
+python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --input-image https://www.hdcarwallpapers.com/walls/2018_chevrolet_camaro_zl1_nascar_race_car_2-HD.jpg --hf-token=$HF_TOKEN
+```
+
+NOTE: The min and max guidance scales are configured using `--min-guidance-scale` and `--max-guidance-scale`, respectively.
+
 ## Configuration options
 - Noise scheduler can be set using `--scheduler <scheduler>`. Note: not all schedulers are available for every version.
 - To accelerate engine building time, use `--timing-cache <path to cache file>`. The cache file will be created if it does not already exist. Note that performance may degrade if cache files are used across multiple GPU targets. It is recommended to use timing caches only during development. To achieve the best performance in deployment, please build engines without a timing cache.
````
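On the `--min-guidance-scale`/`--max-guidance-scale` flags added above: the demo's help text describes the minimum as applying to the first frame and the maximum to the last, with guidance varying per frame in between. A small illustrative sketch of such a schedule (an assumption modeled on the linear ramp used by the HuggingFace diffusers SVD pipeline, not code from this commit):

```python
# Illustrative only: linear per-frame guidance ramp between the two CLI bounds.
import numpy as np

min_scale, max_scale, num_frames = 1.0, 3.0, 25  # demo defaults for svd-xt-1.1
guidance = np.linspace(min_scale, max_scale, num_frames)
print(guidance[0], guidance[-1])  # 1.0 at the first frame, 3.0 at the last
```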

demo/Diffusion/demo_img2vid.py (new file, +117)

```python
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import argparse

from PIL import Image

from stable_video_diffusion_pipeline import StableVideoDiffusionPipeline
from utilities import (
    PIPELINE_TYPE,
    add_arguments,
    download_image,
)

def parseArgs():
    parser = argparse.ArgumentParser(description="Options for Stable Diffusion Img2Vid Demo", conflict_handler='resolve')
    parser = add_arguments(parser)
    parser.add_argument('--version', type=str, default="svd-xt-1.1", choices=["svd-xt-1.1"], help="Version of Stable Video Diffusion")
    parser.add_argument('--input-image', type=str, default="", help="Path to the input image")
    parser.add_argument('--height', type=int, default=576, help="Height of image to generate (must be multiple of 8)")
    parser.add_argument('--width', type=int, default=1024, help="Width of image to generate (must be multiple of 8)")
    parser.add_argument('--min-guidance-scale', type=float, default=1.0, help="The minimum guidance scale. Used for the classifier-free guidance with the first frame")
    parser.add_argument('--max-guidance-scale', type=float, default=3.0, help="The maximum guidance scale. Used for the classifier-free guidance with the last frame")
    parser.add_argument('--denoising-steps', type=int, default=25, help="Number of denoising steps")
    parser.add_argument('--num-warmup-runs', type=int, default=1, help="Number of warmup runs before benchmarking performance")
    return parser.parse_args()

def process_pipeline_args(args):
    # Default to the sample conditioning image when none is provided.
    if not args.input_image:
        args.input_image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true"
    if isinstance(args.input_image, str):
        input_image = download_image(args.input_image).resize((args.width, args.height))
    elif isinstance(args.input_image, Image.Image):
        input_image = args.input_image  # already a PIL image; use it directly
    else:
        raise ValueError(f"Input image(s) must be of type `PIL.Image.Image` or `str` (URL) but is {type(args.input_image)}")

    if args.height % 8 != 0 or args.width % 8 != 0:
        raise ValueError(f"Image height and width have to be divisible by 8 but are: {args.height} and {args.width}.")

    # TODO enable BS>1
    max_batch_size = 1
    args.build_static_batch = True

    if args.batch_size > max_batch_size:
        raise ValueError(f"Batch size {args.batch_size} is larger than allowed {max_batch_size}.")

    if not args.build_static_batch or args.build_dynamic_shape:
        raise ValueError("Dynamic shapes not supported. Do not specify `--build-dynamic-shape`")

    kwargs_init_pipeline = {
        'version': args.version,
        'max_batch_size': max_batch_size,
        'denoising_steps': args.denoising_steps,
        'scheduler': args.scheduler,
        'min_guidance_scale': args.min_guidance_scale,
        'max_guidance_scale': args.max_guidance_scale,
        'output_dir': args.output_dir,
        'hf_token': args.hf_token,
        'verbose': args.verbose,
        'nvtx_profile': args.nvtx_profile,
        'use_cuda_graph': args.use_cuda_graph,
        'framework_model_dir': args.framework_model_dir,
        'torch_inference': args.torch_inference,
    }

    kwargs_load_engine = {
        'onnx_opset': args.onnx_opset,
        'opt_batch_size': args.batch_size,
        'opt_image_height': args.height,
        'opt_image_width': args.width,
        'static_batch': args.build_static_batch,
        'static_shape': not args.build_dynamic_shape,
        'enable_all_tactics': args.build_all_tactics,
        'enable_refit': args.build_enable_refit,
        'timing_cache': args.timing_cache,
    }

    args_run_demo = (input_image, args.height, args.width, args.batch_size, args.batch_count, args.num_warmup_runs, args.use_cuda_graph)

    return kwargs_init_pipeline, kwargs_load_engine, args_run_demo

if __name__ == "__main__":
    print("[I] Initializing StableDiffusion img2vid demo using TensorRT")
    args = parseArgs()
    kwargs_init_pipeline, kwargs_load_engine, args_run_demo = process_pipeline_args(args)

    # Initialize demo
    demo = StableVideoDiffusionPipeline(
        pipeline_type=PIPELINE_TYPE.IMG2VID,
        **kwargs_init_pipeline)
    demo.loadEngines(
        args.engine_dir,
        args.framework_model_dir,
        args.onnx_dir,
        **kwargs_load_engine)
    demo.loadResources(args.height, args.width, args.batch_size, args.seed)

    # Run inference
    demo.run(*args_run_demo)

    demo.teardown()
```
