Commit

Merge pull request #4119 from NVIDIA/dev-poweiw-10.4-stage
Update for 10.4-GA
poweiw authored Sep 12, 2024
2 parents c5b9de3 + c664ce2 commit 866548c
Showing 100 changed files with 9,008 additions and 2,864 deletions.
34 changes: 33 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,38 @@
# TensorRT OSS Release Changelog

-## 10.3.0 GA - 2024-08-07
+## 10.4.0 GA - 2024-09-11
Key Features and Updates:

- Demo changes
  - Added [Stable Cascade](demo/Diffusion) pipeline.
  - Enabled INT8 and FP8 quantization for Stable Diffusion v1.5, v2.0 and v2.1 pipelines.
  - Enabled FP8 quantization for Stable Diffusion XL pipeline.
- Sample changes
  - Added a new Python sample, `aliased_io_plugin`, which demonstrates how in-place updates to plugin inputs can be achieved through I/O aliasing (see the sketch below).
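For context, the aliasing hook at the heart of that sample looks roughly like the following. This is a minimal sketch, assuming the TensorRT 10.4 Python bindings expose `IPluginV3OneBuildV2.get_aliased_input` (mirroring the C++ `getAliasedInput`); the class name is hypothetical, and the remaining plugin-interface methods are omitted.

```python
import tensorrt as trt

# Illustrative fragment only: a complete IPluginV3 plugin must also implement
# the usual core, build, and runtime capability methods, omitted here.
class InPlaceUpdatePlugin(trt.IPluginV3, trt.IPluginV3OneCore,
                          trt.IPluginV3OneBuildV2, trt.IPluginV3OneRuntime):
    def get_aliased_input(self, output_index: int) -> int:
        # Output 0 shares memory with input 0, so writing the output
        # updates that input in place; -1 means "not aliased".
        return 0 if output_index == 0 else -1
```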
- Plugin changes
  - Migrated the IPluginV2-descendent versions (a) of the following plugins to newer versions (b) that implement IPluginV3 (a->b); see the registry sketch below:
    - scatterElementsPlugin (1->2)
    - skipLayerNormPlugin (1->5, 2->6, 3->7, 4->8)
    - embLayerNormPlugin (2->4, 3->5)
    - bertQKVToContextPlugin (1->4, 2->5, 3->6)
  - Note
    - Each newer version preserves the attributes and I/O of the corresponding older plugin version.
    - The older plugin versions are deprecated and will be removed in a future release.
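Because the older versions remain registered during the deprecation window, code that creates these plugins can opt into the IPluginV3 implementations explicitly by version string. A hedged sketch using the Python plugin registry (the plugin name and version strings below are illustrative assumptions, not verified 10.4 strings; check each plugin's README for the exact ones):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")  # register the bundled TensorRT plugins

registry = trt.get_plugin_registry()
# Requesting version "5" would select an IPluginV3-based skipLayerNorm here;
# the name/version pair is an assumption for illustration only.
creator = registry.get_creator("CustomSkipLayerNormPluginDynamic", "5", "")
```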

- Quickstart guide
  - Updated the deploy_to_triton guide and removed legacy APIs.
  - Removed legacy TF-TRT code, as the project is no longer supported.
  - Removed quantization_tutorial, as pytorch_quantization has been deprecated. See https://github.com/NVIDIA/TensorRT-Model-Optimizer for the latest quantization support (sketch below), and [Stable Diffusion XL (Base/Turbo) and Stable Diffusion 1.5 Quantization with Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/diffusers/quantization) for integration with TensorRT.
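For reference, post-training quantization with Model Optimizer follows roughly the shape below. This is a sketch assuming the current `modelopt.torch.quantization` API; `model` and `calib_loader` are hypothetical stand-ins.

```python
import modelopt.torch.quantization as mtq

# model: a torch.nn.Module to quantize; calib_loader: a representative
# dataloader -- both are hypothetical stand-ins for this sketch.
def forward_loop(model):
    for batch in calib_loader:
        model(batch)  # calibration pass over representative data

# INT8_DEFAULT_CFG is one of the stock configs (an FP8 config also exists);
# the quantized model can then be exported to ONNX and built with TensorRT.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```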
- Parser changes
  - Added support for tensor `axes` inputs on `Pad` operations (see the sketch below).
  - Added support for `BlackmanWindow`, `HammingWindow`, and `HannWindow` operations.
  - Improved error handling in `IParserRefitter`.
  - Fixed kernel shape inference in multi-input convolutions.
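A minimal sketch of an ONNX graph exercising the new `Pad` support, with `axes` supplied as a runtime tensor input rather than a constant initializer (opset 18; the shapes are illustrative):

```python
import onnx
from onnx import TensorProto, helper

# Pad-18 takes (data, pads, constant_value?, axes?); with two padded axes,
# pads must hold 2 * len(axes) = 4 values (begin/end per listed axis).
pad = helper.make_node("Pad", ["x", "pads", "", "axes"], ["y"], mode="constant")
graph = helper.make_graph(
    [pad], "pad_with_tensor_axes",
    inputs=[
        helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 32, 32]),
        helper.make_tensor_value_info("pads", TensorProto.INT64, [4]),
        helper.make_tensor_value_info("axes", TensorProto.INT64, [2]),
    ],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, None)],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)
```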

- Updated tooling
  - polygraphy-extension-trtexec v0.0.9

## 10.3.0 GA - 2024-08-02

Key Features and Updates:

2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -80,7 +80,7 @@ option(BUILD_PARSERS "Build TensorRT parsers" ON)
option(BUILD_SAMPLES "Build TensorRT samples" ON)

# C++14
-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

56 changes: 28 additions & 28 deletions README.md
@@ -26,13 +26,13 @@ You can skip the **Build** section to enjoy TensorRT with Python.
To build the TensorRT-OSS components, you will first need the following software packages.

**TensorRT GA build**
-* TensorRT v10.3.0.26
+* TensorRT v10.4.0.26
* Available from direct download links listed below

**System Packages**
* [CUDA](https://developer.nvidia.com/cuda-toolkit)
  * Recommended versions:
-    * cuda-12.5.0 + cuDNN-8.9
+    * cuda-12.6.0 + cuDNN-8.9
    * cuda-11.8.0 + cuDNN-8.9
* [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1
* [cmake](https://github.com/Kitware/CMake/releases) >= v3.13
@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.

Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
-- [TensorRT 10.3.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.3.0.26 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz)
-- [TensorRT 10.3.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.3.0.26 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip)
+- [TensorRT 10.4.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/tars/TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.4.0.26 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/tars/TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz)
+- [TensorRT 10.4.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/zip/TensorRT-10.4.0.26.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.4.0.26 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/zip/TensorRT-10.4.0.26.Windows.win10.cuda-12.6.zip)


-**Example: Ubuntu 20.04 on x86-64 with cuda-12.5**
+**Example: Ubuntu 20.04 on x86-64 with cuda-12.6**

```bash
cd ~/Downloads
-tar -xvzf TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.3.0.26
+tar -xvzf TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.4.0.26
```

-**Example: Windows on x86-64 with cuda-12.5**
+**Example: Windows on x86-64 with cuda-12.6**

```powershell
-Expand-Archive -Path TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.3.0.26\lib"
+Expand-Archive -Path TensorRT-10.4.0.26.Windows.win10.cuda-12.6.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.4.0.26\lib"
```

## Setting Up The Build Environment
@@ -101,27 +101,27 @@ For Linux platforms, we recommend that you generate a docker container for build
1. #### Generate the TensorRT-OSS build container.
The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build scripts. The build containers are configured for building TensorRT OSS out-of-the-box.

-**Example: Ubuntu 20.04 on x86-64 with cuda-12.5 (default)**
+**Example: Ubuntu 20.04 on x86-64 with cuda-12.6 (default)**
```bash
-./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.5
+./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.6
```
-**Example: Rockylinux8 on x86-64 with cuda-12.5**
+**Example: Rockylinux8 on x86-64 with cuda-12.6**
```bash
-./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.5
+./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.6
```
-**Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.5 (JetPack SDK)**
+**Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.6 (JetPack SDK)**
```bash
-./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.5
+./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.6
```
-**Example: Ubuntu 22.04 on aarch64 with cuda-12.5**
+**Example: Ubuntu 22.04 on aarch64 with cuda-12.6**
```bash
-./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.5
+./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.6
```

2. #### Launch the TensorRT-OSS build container.
**Example: Ubuntu 20.04 build container**
```bash
-./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.5 --gpus all
+./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.6 --gpus all
```
> NOTE:
<br> 1. Use the `--tag` corresponding to the build container generated in Step 1.
@@ -132,38 +132,38 @@
## Building TensorRT-OSS
* Generate Makefiles and build.

-**Example: Linux (x86-64) build with default cuda-12.5**
+**Example: Linux (x86-64) build with default cuda-12.6**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)
```
-**Example: Linux (aarch64) build with default cuda-12.5**
+**Example: Linux (aarch64) build with default cuda-12.6**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain
make -j$(nproc)
```
-**Example: Native build on Jetson (aarch64) with cuda-12.5**
+**Example: Native build on Jetson (aarch64) with cuda-12.6**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
-cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.5
+cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.6
CC=/usr/bin/gcc make -j$(nproc)
```
> NOTE: C compiler must be explicitly specified via CC= for native aarch64 builds of protobuf.

-**Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.5 (JetPack)**
+**Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.6 (JetPack)**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
-cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.5 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
+cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.6 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
make -j$(nproc)
```

-**Example: Native builds on Windows (x86) with cuda-12.5**
+**Example: Native builds on Windows (x86) with cuda-12.6**
```powershell
cd $TRT_OSSPATH
mkdir -p build
# ...
```
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-10.3.0.26
+10.4.0.26
