Commit b8ada01

TensorRT OSS 9.1.0 Release (#3395)
Signed-off-by: Simeng Liu <[email protected]>
1 parent 42fccbf commit b8ada01

File tree

145 files changed

+8685
-1389
lines changed

CHANGELOG.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,23 @@
11
# TensorRT OSS Release Changelog
22

3+
## 9.1.0 GA - 2023-10-18
4+
5+
Key Features and Updates:
6+
7+
- Update the [trt_python_plugin](samples/python/python_plugin) sample.
8+
- Python plugins API reference is part of the official TRT Python API.
9+
- Added samples demonstrating the usage of the progress monitor API.
10+
- Check [sampleProgressMonitor](samples/sampleProgressMonitor) for the C++ sample.
11+
- Check [simple_progress_monitor](samples/python/simple_progress_monitor) for the Python sample.
12+
- Remove dependencies related to python<3.8 in python samples as we no longer support python<3.8 for python samples.
13+
- Demo changes
14+
- Added LAMBADA dataset accuracy checks in the [HuggingFace](demo/HuggingFace) demo.
15+
- Enabled structured sparsity and FP8-quantized batch matrix multiplications (BMMs) in attention in the [NeMo](demo/NeMo) demo.
16+
- Replaced deprecated APIs in the [BERT](demo/BERT) demo.
17+
- Updated tooling
18+
- Polygraphy v0.49.1
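
The progress monitor API mentioned in the changelog entry above can be pictured with a small sketch. The class below is a plain-Python stand-in, not the TensorRT class: the three method names (`phase_start`, `step_complete`, `phase_finish`) follow the TensorRT Python `IProgressMonitor` interface as I understand it, while `ConsoleProgressMonitor`, `active`, and `events` are hypothetical names chosen for illustration. With TensorRT installed, the class would presumably subclass `trt.IProgressMonitor` and be attached to the builder config; that wiring is an assumption here and is not shown.

```python
class ConsoleProgressMonitor:
    """Plain-Python stand-in that mirrors the three progress callbacks.

    Hypothetical illustration only: it does not import or subclass TensorRT.
    """

    def __init__(self):
        self.active = {}   # phase name -> [completed steps, total steps]
        self.events = []   # chronological record of callbacks received

    def phase_start(self, phase_name, parent_phase, num_steps):
        # A new phase begins; parent_phase is None for a top-level phase.
        self.active[phase_name] = [0, num_steps]
        self.events.append(("start", phase_name, parent_phase))

    def step_complete(self, phase_name, step):
        # Steps are reported by zero-based index.
        self.active[phase_name][0] = step + 1
        self.events.append(("step", phase_name, step))
        return True  # returning False would request that the build stop

    def phase_finish(self, phase_name):
        # The phase is done; drop it from the set of active phases.
        del self.active[phase_name]
        self.events.append(("finish", phase_name, None))
```

The two samples named in the changelog (`sampleProgressMonitor` and `simple_progress_monitor`) remain the authoritative references for the real API.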
19+
20+
321
## 9.0.1 GA - 2023-09-07
422

523
Key Features and Updates:

README.md

Lines changed: 8 additions & 8 deletions
@@ -26,7 +26,7 @@ You can skip the **Build** section to enjoy TensorRT with Python.
2626
To build the TensorRT-OSS components, you will first need the following software packages.
2727

2828
**TensorRT GA build**
29-
* TensorRT v9.0.1.4
29+
* TensorRT v9.1.0.4
3030
* Available from direct download links listed below
3131

3232
**System Packages**
@@ -36,7 +36,7 @@ To build the TensorRT-OSS components, you will first need the following software
3636
* cuda-11.8.0 + cuDNN-8.9
3737
* [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1
3838
* [cmake](https://github.com/Kitware/CMake/releases) >= v3.13
39-
* [python](<https://www.python.org/downloads/>) >= v3.6.9, <= v3.10.x
39+
* [python](<https://www.python.org/downloads/>) >= v3.8, <= v3.10.x
4040
* [pip](https://pypi.org/project/pip/#history) >= v19.0
4141
* Essential utilities
4242
* [git](https://git-scm.com/downloads), [pkg-config](https://www.freedesktop.org/wiki/Software/pkg-config/), [wget](https://www.gnu.org/software/wget/faq.html#download)
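
A quick way to check an interpreter against the updated Python requirement (>= v3.8, <= v3.10.x) is sketched below. The bounds come from the README line in this hunk; the function name `python_supported` and the two constants are hypothetical helpers, not part of the repository.

```python
import sys

# Bounds taken from the updated README line: python >= v3.8, <= v3.10.x
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 10)


def python_supported(version_info=None):
    """Return True if the (major, minor) pair falls inside the README's range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    status = "inside" if python_supported() else "outside"
    print("Python {}.{} is {} the documented range".format(
        sys.version_info[0], sys.version_info[1], status))
```

Comparing only the major and minor components keeps every 3.10 patch release inside the range, which matches the "3.10.x" wording.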
@@ -73,16 +73,16 @@ To build the TensorRT-OSS components, you will first need the following software
7373
If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.
7474

7575
Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
76-
- [TensorRT 9.0.1.4 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/9.0.1/tars/tensorrt-9.0.1.4.linux.x86_64-gnu.cuda-11.8.tar.gz)
77-
- [TensorRT 9.0.1.4 for CUDA 12.2, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/9.0.1/tars/tensorrt-9.0.1.4.linux.x86_64-gnu.cuda-12.2.tar.gz)
76+
- [TensorRT 9.1.0.4 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/9.1.0/tars/tensorrt-9.1.0.4.linux.x86_64-gnu.cuda-11.8.tar.gz)
77+
- [TensorRT 9.1.0.4 for CUDA 12.2, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/9.1.0/tars/tensorrt-9.1.0.4.linux.x86_64-gnu.cuda-12.2.tar.gz)
7878

7979

8080
**Example: Ubuntu 20.04 on x86-64 with cuda-12.2**
8181

8282
```bash
8383
cd ~/Downloads
84-
tar -xvzf tensorrt-9.0.1.4.linux.x86_64-gnu.cuda-12.2.tar.gz
85-
export TRT_LIBPATH=`pwd`/TensorRT-9.0.1.4
84+
tar -xvzf tensorrt-9.1.0.4.linux.x86_64-gnu.cuda-12.2.tar.gz
85+
export TRT_LIBPATH=`pwd`/TensorRT-9.1.0.4
8686
```
8787

8888
## Setting Up The Build Environment
@@ -96,9 +96,9 @@ For Linux platforms, we recommend that you generate a docker container for build
9696
```bash
9797
./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.2
9898
```
99-
**Example: CentOS/RedHat 7 on x86-64 with cuda-11.8**
99+
**Example: CentOS/RedHat 7 on x86-64 with cuda-12.2**
100100
```bash
101-
./docker/build.sh --file docker/centos-7.Dockerfile --tag tensorrt-centos7-cuda11.8 --cuda 11.8.0
101+
./docker/build.sh --file docker/centos-7.Dockerfile --tag tensorrt-centos7-cuda12.2 --cuda 12.2.0
102102
```
103103

104104
2. #### Launch the TensorRT-OSS build container.

VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
9.0.1.4
1+
9.1.0.4
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
#
2+
# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
3+
# SPDX-License-Identifier: Apache-2.0
4+
#
5+
# Licensed under the Apache License, Version 2.0 (the "License");
6+
# you may not use this file except in compliance with the License.
7+
# You may obtain a copy of the License at
8+
#
9+
# http://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# Unless required by applicable law or agreed to in writing, software
12+
# distributed under the License is distributed on an "AS IS" BASIS,
13+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
# See the License for the specific language governing permissions and
15+
# limitations under the License.
16+
#
17+
18+
set(CMAKE_SYSTEM_NAME Linux)
19+
set(CMAKE_SYSTEM_PROCESSOR aarch64)
20+
21+
set(TRT_PLATFORM_ID "aarch64")
22+
23+
set(CUDA_PLATFORM_ID "sbsa-linux")
24+
25+
set(CMAKE_C_COMPILER /usr/bin/aarch64-linux-gnu-gcc-8)
26+
set(CMAKE_CXX_COMPILER /usr/bin/aarch64-linux-gnu-g++-8)
27+
28+
set(CMAKE_C_FLAGS "" CACHE STRING "" FORCE)
29+
set(CMAKE_CXX_FLAGS "" CACHE STRING "" FORCE)
30+
31+
set(CMAKE_C_COMPILER_TARGET aarch64-linux-gnu)
32+
set(CMAKE_CXX_COMPILER_TARGET aarch64-linux-gnu)
33+
34+
set(CMAKE_C_COMPILER_FORCED TRUE)
35+
set(CMAKE_CXX_COMPILER_FORCED TRUE)
36+
37+
set(CUDA_ROOT /usr/local/cuda/targets/${CUDA_PLATFORM_ID} CACHE STRING "CUDA ROOT dir")
38+
39+
set(CUDNN_LIB /usr/lib/aarch64-linux-gnu/libcudnn.so)
40+
41+
set(BUILD_LIBRARY_ONLY 1)
42+
43+
set(CUDA_TOOLKIT_ROOT_DIR ${CUDA_ROOT})
44+
set(CUDA_INCLUDE_DIRS ${CUDA_ROOT}/include)
45+
46+
set(RT_LIB /usr/aarch64-linux-gnu/lib/librt.so)
47+
48+
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)
49+
set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_CXX_COMPILER} CACHE STRING "" FORCE)
50+
set(CMAKE_CUDA_FLAGS "-I${CUDA_INCLUDE_DIRS} -Xcompiler=\"-fPIC ${CMAKE_CXX_FLAGS}\"" CACHE STRING "" FORCE)
51+
set(CMAKE_CUDA_COMPILER_FORCED TRUE)
52+
53+
set(CUDA_LIBS -L${CUDA_ROOT}/lib)
54+
55+
set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcublas -lcudart -lstdc++ -lm)

demo/BERT/builder.py

Lines changed: 21 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -107,10 +107,7 @@ def attention_layer_opt(prefix, config, init_dict, network, input_tensor, imask)
107107
Ball = init_dict[prefix + BQKV]
108108

109109
# FC_attention
110-
if config.use_int8:
111-
mult_all = network.add_convolution_nd(input_tensor, 3 * hidden_size, (1, 1), Wall, Ball)
112-
else:
113-
mult_all = network.add_fully_connected(input_tensor, 3 * hidden_size, Wall, Ball)
110+
mult_all = network.add_convolution_nd(input_tensor, 3 * hidden_size, (1, 1), Wall, Ball)
114111

115112
if config.use_qat:
116113
dr_qkv = max(
@@ -217,24 +214,20 @@ def transformer_layer_opt(prefix, config, init_dict, network, input_tensor, imas
217214

218215
# FC0
219216
B_aout = init_dict[prefix + B_AOUT]
220-
if config.use_int8:
217+
if not config.use_int8 and use_custom_fc():
218+
W_aoutT = init_dict[prefix + W_AOUT + "_notrans"]
219+
attention_out_fc = custom_fc(config, network, attention_heads, hidden_size, W_aoutT)
220+
else:
221221
W_aout = init_dict[prefix + W_AOUT]
222222
attention_out_fc = network.add_convolution_nd(attention_heads, hidden_size, (1, 1), W_aout, B_aout)
223223
B_aout = None
224224

225-
if not config.use_int8_skipln:
225+
if config.use_int8 and not config.use_int8_skipln:
226226
attention_out_fc.set_output_type(0, trt.DataType.HALF if config.use_fp16 else trt.DataType.FLOAT)
227227

228-
if config.use_qat:
228+
if config.use_int8 and config.use_qat:
229229
dr_fc_aout = init_dict[prefix + 'attention_output_add_local_input_quantizer_amax']
230230
set_output_range(attention_out_fc, dr_fc_aout)
231-
elif use_custom_fc():
232-
W_aoutT = init_dict[prefix + W_AOUT + "_notrans"]
233-
attention_out_fc = custom_fc(config, network, attention_heads, hidden_size, W_aoutT)
234-
else:
235-
W_aout = init_dict[prefix + W_AOUT]
236-
attention_out_fc = network.add_fully_connected(attention_heads, hidden_size, W_aout, B_aout)
237-
B_aout = None
238231

239232
skiplayer = skipln(prefix + "attention_output_layernorm_",config, init_dict, network, attention_out_fc.get_output(0), input_tensor, B_aout)
240233
attention_ln = skiplayer.get_output(0)
@@ -245,10 +238,7 @@ def transformer_layer_opt(prefix, config, init_dict, network, input_tensor, imas
245238
# FC1 + GELU
246239
B_mid = init_dict[prefix + B_MID]
247240
W_mid = init_dict[prefix + W_MID]
248-
if config.use_int8:
249-
mid_dense = network.add_convolution_nd(attention_ln, config.intermediate_size, (1, 1), W_mid, B_mid)
250-
else:
251-
mid_dense = network.add_fully_connected(attention_ln, config.intermediate_size, W_mid, B_mid)
241+
mid_dense = network.add_convolution_nd(attention_ln, config.intermediate_size, (1, 1), W_mid, B_mid)
252242

253243
mid_dense_out = mid_dense.get_output(0)
254244
POW = network.add_constant((1, 1, 1, 1, 1), trt.Weights(np.ascontiguousarray([3.0], dtype=np.float32)))
@@ -281,21 +271,18 @@ def transformer_layer_opt(prefix, config, init_dict, network, input_tensor, imas
281271
# FC2
282272
# Dense to hidden size
283273
B_lout = init_dict[prefix + B_LOUT]
284-
if config.use_int8 and not config.use_fc2_gemm:
285-
W_lout = init_dict[prefix + W_LOUT]
286-
out_dense = network.add_convolution_nd(intermediate_act, hidden_size, (1, 1), W_lout, B_lout)
287-
B_lout = None
288-
289-
if not config.use_int8_skipln:
290-
out_dense.set_output_type(0, trt.DataType.HALF if config.use_fp16 else trt.DataType.FLOAT)
291-
elif use_custom_fc():
274+
prefer_conv = config.use_int8 and not config.use_fc2_gemm
275+
if not prefer_conv and use_custom_fc():
292276
W_loutT = init_dict[prefix + W_LOUT + "_notrans"]
293277
out_dense = custom_fc(config, network, intermediate_act, hidden_size, W_loutT)
294278
else:
295279
W_lout = init_dict[prefix + W_LOUT]
296-
out_dense = network.add_fully_connected(intermediate_act, hidden_size, W_lout, B_lout)
280+
out_dense = network.add_convolution_nd(intermediate_act, hidden_size, (1, 1), W_lout, B_lout)
297281
B_lout = None
298282

283+
if config.use_int8 and not config.use_int8_skipln:
284+
out_dense.set_output_type(0, trt.DataType.HALF if config.use_fp16 else trt.DataType.FLOAT)
285+
299286
if config.use_qat:
300287
dr_fc_out = init_dict[prefix + 'output_add_local_input_quantizer_amax']
301288
set_output_range(out_dense, dr_fc_out)
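
The reworked FC2 branch is easier to review as a truth table. The sketch below mirrors the conditions in this hunk as a pure function. It assumes the `set_output_type` check now sits after the `if`/`else` rather than inside one branch (indentation is not visible in this view), and `custom_fc_available` is a hypothetical parameter standing in for the result of `use_custom_fc()`.

```python
from itertools import product


def fc2_plan(use_int8, use_fc2_gemm, use_int8_skipln, custom_fc_available):
    """Mirror the branch structure of the updated FC2 block.

    Returns (layer_kind, forces_output_type). No TensorRT calls are made.
    """
    prefer_conv = use_int8 and not use_fc2_gemm
    if not prefer_conv and custom_fc_available:
        layer_kind = "custom_fc"
    else:
        # Replaces the removed add_fully_connected fallback.
        layer_kind = "convolution_1x1"
    # Assumed to apply to whichever layer was chosen above.
    forces_output_type = use_int8 and not use_int8_skipln
    return layer_kind, forces_output_type


# Print all 16 flag combinations for review.
for flags in product([False, True], repeat=4):
    print(flags, "->", fc2_plan(*flags))
```

Under that assumption, the combination `use_int8=True, use_fc2_gemm=True` with the custom FC available selects `custom_fc` and still forces the output type, a case the old code handled without the `set_output_type` call.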
@@ -334,7 +321,7 @@ def squad_output(prefix, config, init_dict, network, input_tensor):
334321
B_out = init_dict[prefix + SQD_B]
335322

336323
W = network.add_constant((1, hidden_size, 2), W_out)
337-
dense = network.add_fully_connected(input_tensor, 2, W_out, B_out)
324+
dense = network.add_convolution_nd(input_tensor, 2, (1, 1), W_out, B_out)
338325

339326
OUT = network.add_shuffle(dense.get_output(0))
340327
OUT.second_transpose = (1, 0, 2, 3, 4)
@@ -402,7 +389,7 @@ def build_engine(batch_sizes, workspace_size, sequence_lengths, config, weights_
402389
explicit_batch_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
403390

404391
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(explicit_batch_flag) as network, builder.create_builder_config() as builder_config:
405-
builder_config.max_workspace_size = workspace_size * (1024 * 1024)
392+
builder_config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_size * (1024 * 1024))
406393
builder_config.avg_timing_iterations = 8
407394
if config.use_fp16:
408395
builder_config.set_flag(trt.BuilderFlag.FP16)
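
The workspace change above swaps a property assignment for a method call while keeping the same MiB-to-bytes arithmetic. A minimal sketch of that call shape follows. `RecordingConfig` is a hypothetical stand-in that only records the arguments, so the snippet runs without TensorRT; `workspace_bytes` and `apply_workspace_limit` are likewise illustrative helper names.

```python
MIB = 1024 * 1024


def workspace_bytes(workspace_size_mib):
    """Convert a workspace size given in MiB to bytes, as the diff does inline."""
    if workspace_size_mib < 0:
        raise ValueError("workspace size must be non-negative")
    return workspace_size_mib * MIB


class RecordingConfig:
    """Hypothetical stand-in for a builder config; it only records the call."""

    def __init__(self):
        self.limits = {}

    def set_memory_pool_limit(self, pool, pool_size):
        self.limits[pool] = pool_size


def apply_workspace_limit(builder_config, pool, workspace_size_mib):
    """Set the pool limit on any object exposing set_memory_pool_limit."""
    builder_config.set_memory_pool_limit(pool, workspace_bytes(workspace_size_mib))


config = RecordingConfig()
apply_workspace_limit(config, "WORKSPACE", 1024)
print(config.limits)
```

With a real builder config, the `pool` argument would be `trt.MemoryPoolType.WORKSPACE`, as in the added line of the diff.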
@@ -451,10 +438,11 @@ def build_engine(batch_sizes, workspace_size, sequence_lengths, config, weights_
451438
squad_logits = squad_output("cls_", config, weights_dict, network, bert_out)
452439
squad_logits_out = squad_logits.get_output(0)
453440

441+
squad_logits_out.name = "logits_out"
454442
network.mark_output(squad_logits_out)
455443

456444
build_start_time = time.time()
457-
engine = builder.build_engine(network, builder_config)
445+
serialized_engine = builder.build_serialized_network(network, builder_config)
458446
build_time_elapsed = (time.time() - build_start_time)
459447
TRT_LOGGER.log(TRT_LOGGER.INFO, "build engine in {:.3f} Sec".format(build_time_elapsed))
460448

@@ -469,7 +457,7 @@ def build_engine(batch_sizes, workspace_size, sequence_lengths, config, weights_
469457

470458
if config.use_int8 and not config.use_qat:
471459
calibrator.free()
472-
return engine
460+
return serialized_engine
473461

474462
def generate_calibration_cache(sequence_lengths, workspace_size, config, weights_dict, squad_json, vocab_file, calibrationCacheFile, calib_num):
475463
"""
@@ -488,7 +476,7 @@ def generate_calibration_cache(sequence_lengths, workspace_size, config, weights
488476
config.use_fp16 = False
489477
config.is_calib_mode = True
490478

491-
with build_engine([1], workspace_size, sequence_lengths, config, weights_dict, squad_json, vocab_file, calibrationCacheFile, calib_num, False) as engine:
479+
with build_engine([1], workspace_size, sequence_lengths, config, weights_dict, squad_json, vocab_file, calibrationCacheFile, calib_num, False) as serialized_engine:
492480
TRT_LOGGER.log(TRT_LOGGER.INFO, "calibration cache generated in {:}".format(calibrationCacheFile))
493481

494482
config.use_fp16 = saved_use_fp16
@@ -553,9 +541,7 @@ def main():
553541
else:
554542
raise RuntimeError("You need either specify TF checkpoint using option --ckpt or ONNX using option --onnx to build TRT BERT model.")
555543

556-
with build_engine(args.batch_size, args.workspace_size, args.sequence_length, config, weights_dict, args.squad_json, args.vocab_file, calib_cache, args.calib_num, args.verbose) as engine:
557-
TRT_LOGGER.log(TRT_LOGGER.VERBOSE, "Serializing Engine...")
558-
serialized_engine = engine.serialize()
544+
with build_engine(args.batch_size, args.workspace_size, args.sequence_length, config, weights_dict, args.squad_json, args.vocab_file, calib_cache, args.calib_num, args.verbose) as serialized_engine:
559545
TRT_LOGGER.log(TRT_LOGGER.INFO, "Saving Engine to {:}".format(args.output))
560546
with open(args.output, "wb") as fout:
561547
fout.write(serialized_engine)
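
Because `build_engine` now returns the serialized engine directly, the caller only has to write a bytes-like buffer to disk. The sketch below shows that step in isolation. `save_serialized_engine` is a hypothetical helper, and the `None` check reflects my understanding that `build_serialized_network` returns `None` when the build fails; treat that as an assumption rather than documented behaviour of this demo.

```python
import os
import tempfile


def save_serialized_engine(serialized_engine, output_path):
    """Write a serialized engine buffer to disk and return the byte count."""
    if serialized_engine is None:
        # Assumed failure signal from the builder; fail loudly instead of
        # writing an empty or invalid engine file.
        raise RuntimeError("engine build failed: no serialized engine returned")
    with open(output_path, "wb") as fout:
        return fout.write(serialized_engine)


if __name__ == "__main__":
    fake_engine = b"\x00not-a-real-engine"  # placeholder bytes, not a TRT plan
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bert.engine")
        written = save_serialized_engine(fake_engine, path)
        print(written, os.path.getsize(path))
```

Checking for a failed build before opening the output file avoids leaving a zero-byte engine on disk.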

demo/BERT/builder_varseqlen.py

Lines changed: 10 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -107,10 +107,7 @@ def attention_layer_opt(prefix, config, init_dict, network, input_tensor, mask_i
107107
Ball = init_dict[prefix + BQKV]
108108

109109
# FC_attention
110-
if config.use_int8:
111-
mult_all = network.add_convolution_nd(input_tensor, 3 * hidden_size, (1, 1), Wall, Ball)
112-
else:
113-
mult_all = network.add_fully_connected(input_tensor, 3 * hidden_size, Wall, Ball)
110+
mult_all = network.add_convolution_nd(input_tensor, 3 * hidden_size, (1, 1), Wall, Ball)
114111

115112
if config.use_qat:
116113
dr_qkv = max(
@@ -202,10 +199,7 @@ def transformer_layer_opt(prefix, config, init_dict, network, input_tensor, resi
202199
# FC0
203200
B_aout = init_dict[prefix + B_AOUT]
204201
W_aout = init_dict[prefix + W_AOUT]
205-
if config.use_int8:
206-
attention_out_fc = network.add_convolution_nd(attention_heads, hidden_size, (1, 1), W_aout, B_aout)
207-
else:
208-
attention_out_fc = network.add_fully_connected(attention_heads, hidden_size, W_aout, B_aout)
202+
attention_out_fc = network.add_convolution_nd(attention_heads, hidden_size, (1, 1), W_aout, B_aout)
209203
if config.use_int8 and config.use_qat:
210204
dr_fc_aout = init_dict[prefix + 'attention_output_add_local_input_quantizer_amax']
211205
set_output_range(attention_out_fc, dr_fc_aout)
@@ -225,10 +219,7 @@ def transformer_layer_opt(prefix, config, init_dict, network, input_tensor, resi
225219
# FC1 + GELU
226220
B_mid = init_dict[prefix + B_MID]
227221
W_mid = init_dict[prefix + W_MID]
228-
if config.use_int8:
229-
mid_dense = network.add_convolution_nd(attention_ln, config.intermediate_size, (1, 1), W_mid, B_mid)
230-
else:
231-
mid_dense = network.add_fully_connected(attention_ln, config.intermediate_size, W_mid, B_mid)
222+
mid_dense = network.add_convolution_nd(attention_ln, config.intermediate_size, (1, 1), W_mid, B_mid)
232223

233224
gelu_layer = add_gelu(network, mid_dense.get_output(0))
234225

@@ -247,10 +238,7 @@ def transformer_layer_opt(prefix, config, init_dict, network, input_tensor, resi
247238
B_lout = init_dict[prefix + B_LOUT]
248239
W_lout = init_dict[prefix + W_LOUT]
249240

250-
if config.use_int8:
251-
out_dense = network.add_convolution_nd(intermediate_act, hidden_size, (1, 1), W_lout, B_lout)
252-
else:
253-
out_dense = network.add_fully_connected(intermediate_act, hidden_size, W_lout, B_lout)
241+
out_dense = network.add_convolution_nd(intermediate_act, hidden_size, (1, 1), W_lout, B_lout)
254242
if config.use_int8 and config.use_qat:
255243
dr_fc_out = init_dict[prefix + 'output_add_local_input_quantizer_amax']
256244
set_output_range(out_dense, dr_fc_out)
@@ -327,6 +315,7 @@ def bert_model(config, init_dict, network, input_tensor, residual, mask_idx, cu_
327315

328316
squad_logits = squad_output("cls_", config, init_dict, network, prev_input)
329317
squad_logits_out = squad_logits.get_output(0)
318+
squad_logits_out.name = "logits_out"
330319
network.mark_output(squad_logits_out)
331320

332321

@@ -339,11 +328,7 @@ def squad_output(prefix, config, init_dict, network, input_tensor):
339328
W_out = init_dict[prefix + SQD_W]
340329
B_out = init_dict[prefix + SQD_B]
341330

342-
if config.use_int8:
343-
dense = network.add_convolution_nd(input_tensor, 2, (1, 1), W_out, B_out)
344-
else:
345-
dense = network.add_fully_connected(input_tensor, 2, W_out, B_out)
346-
331+
dense = network.add_convolution_nd(input_tensor, 2, (1, 1), W_out, B_out)
347332
OUT = network.add_shuffle(dense.get_output(0))
348333
if config.use_int8 and config.interleaved:
349334
OUT.second_transpose = (1, 2, 0, 3)
@@ -397,7 +382,7 @@ def build_engine(batch_sizes, workspace_size, sequence_length, config, weights_d
397382
explicit_batch_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
398383

399384
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(explicit_batch_flag) as network, builder.create_builder_config() as builder_config:
400-
builder_config.max_workspace_size = workspace_size * (1024 * 1024)
385+
builder_config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_size * (1024 * 1024))
401386
builder_config.avg_timing_iterations = 8
402387
if config.use_fp16:
403388
builder_config.set_flag(trt.BuilderFlag.FP16)
@@ -454,7 +439,7 @@ def build_engine(batch_sizes, workspace_size, sequence_length, config, weights_d
454439
bert_model(config, weights_dict, network, embeddings, residual, mask_idx, cu_seqlens, max_seqlen)
455440

456441
build_start_time = time.time()
457-
engine = builder.build_engine(network, builder_config)
442+
serialized_engine = builder.build_serialized_network(network, builder_config)
458443
build_time_elapsed = (time.time() - build_start_time)
459444
TRT_LOGGER.log(TRT_LOGGER.INFO, "build engine in {:.3f} Sec".format(build_time_elapsed))
460445

@@ -467,7 +452,7 @@ def build_engine(batch_sizes, workspace_size, sequence_length, config, weights_d
467452
f.flush()
468453
os.fsync(f)
469454

470-
return engine
455+
return serialized_engine
471456

472457
def main():
473458
parser = argparse.ArgumentParser(description="TensorRT BERT Sample", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
@@ -533,9 +518,7 @@ def main():
533518
"PyTorch using option --pytorch, or Pickle weight dictionary using option --pickle "
534519
"to build TRT BERT model.")
535520

536-
with build_engine(args.max_batch_size, args.workspace_size, args.max_sequence_length, config, weights_dict, args.squad_json, args.vocab_file, calib_cache, args.calib_num, args.verbose) as engine:
537-
TRT_LOGGER.log(TRT_LOGGER.VERBOSE, "Serializing Engine...")
538-
serialized_engine = engine.serialize()
521+
with build_engine(args.max_batch_size, args.workspace_size, args.max_sequence_length, config, weights_dict, args.squad_json, args.vocab_file, calib_cache, args.calib_num, args.verbose) as serialized_engine:
539522
TRT_LOGGER.log(TRT_LOGGER.INFO, "Saving Engine to {:}".format(args.output))
540523
with open(args.output, "wb") as fout:
541524
fout.write(serialized_engine)
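
Both builder scripts replace `add_fully_connected` with `add_convolution_nd` using a `(1, 1)` kernel. The pure-Python sketch below illustrates why that substitution can preserve the arithmetic: on a `C x 1 x 1` input, a 1x1 convolution computes the same weighted sum per output channel as a fully connected layer. The function names and the tiny weights are illustrative assumptions, and the sketch assumes an `(out_channels, in_channels)` weight layout for both operations.

```python
def fully_connected(x, weights, bias):
    """y[o] = sum_c weights[o][c] * x[c] + bias[o] for a length-C input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conv_1x1(x_chw, weights, bias):
    """1x1 convolution over a C x H x W input with (O, C) weights."""
    channels = len(x_chw)
    height = len(x_chw[0])
    width = len(x_chw[0][0])
    out = []
    for row, b in zip(weights, bias):
        # Each output pixel mixes the input channels at the same position.
        plane = [[sum(row[c] * x_chw[c][h][w] for c in range(channels)) + b
                  for w in range(width)]
                 for h in range(height)]
        out.append(plane)
    return out


x = [1.0, 2.0, 3.0]
W = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
b = [0.0, 1.0]

fc_out = fully_connected(x, W, b)
x_chw = [[[v]] for v in x]  # reshape the vector to C x 1 x 1
conv_out = conv_1x1(x_chw, W, b)
flat = [plane[0][0] for plane in conv_out]
print(fc_out, flat)
```

This only demonstrates the arithmetic equivalence; tactic selection and performance inside TensorRT are outside what the sketch can show.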
