From 632c21216e85e119dc7f47b0118ff25535f0a7be Mon Sep 17 00:00:00 2001 From: Yiheng Wang Date: Fri, 28 Feb 2025 22:53:14 +0800 Subject: [PATCH 01/11] add fast inference tutorial Signed-off-by: Yiheng Wang --- acceleration/README.md | 2 + .../fast_inference_tutorial.ipynb | 336 ++++++++++++++++++ acceleration/fast_inference_tutorial/utils.py | 194 ++++++++++ 3 files changed, 532 insertions(+) create mode 100644 acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb create mode 100644 acceleration/fast_inference_tutorial/utils.py diff --git a/acceleration/README.md b/acceleration/README.md index e803b6e445..8c6cf65f89 100644 --- a/acceleration/README.md +++ b/acceleration/README.md @@ -4,6 +4,8 @@ Typically, model training is a time-consuming step during deep learning developm ### List of notebooks and examples #### [fast_model_training_guide](./fast_model_training_guide.md) The document introduces details of how to profile the training pipeline, how to analyze the dataset and select suitable algorithms, and how to optimize GPU utilization in single GPU, multi-GPUs or even multi-nodes. +#### [fast_inference_tutorial](./fast_inference_tutorial) +This example shows how to use GPU Direct Storage (GDS), GPU-based transforms, and TensorRT to accelerate inference. #### [distributed_training](./distributed_training) The examples show how to execute distributed training and evaluation based on 3 different frameworks: - PyTorch native `DistributedDataParallel` module with `torchrun`. 
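Below is a minimal, stdlib-only sketch for comparing the per-stage latency tables that `benchmark_workflow` in `utils.py` saves as `benchmark_<type>.csv`. It assumes the layout that pandas `DataFrame.to_csv` produces from `TimerHandler.print_results`: timer names (`RUN`, `FORWARD`, ...) in the first column and `<stat>/ms` headers such as `mean/ms`. The helper names `load_latency_table` and `speedups` are hypothetical and are not part of this patch.

```python
import csv
from typing import Dict, Iterable

# Hypothetical helpers, not part of the tutorial. They assume the CSV layout
# written by TimerHandler.print_results(...).to_csv(): an empty first header
# cell (the unnamed index), timer names in column 0, and "<stat>/ms" headers.

LatencyTable = Dict[str, Dict[str, float]]


def load_latency_table(lines: Iterable[str]) -> LatencyTable:
    """Parse a benchmark CSV into {timer_name: {column: value_in_ms}}."""
    reader = csv.reader(lines)
    header = next(reader)
    columns = header[1:]  # skip the header cell of the index column
    table: LatencyTable = {}
    for row in reader:
        if not row:  # tolerate blank lines, e.g. a trailing newline
            continue
        name, values = row[0], row[1:]
        table[name] = {col: float(val) for col, val in zip(columns, values)}
    return table


def speedups(
    baseline: LatencyTable, candidate: LatencyTable, stat: str = "mean/ms"
) -> Dict[str, float]:
    """Return baseline/candidate latency ratios for every timer present in
    both tables. A ratio above 1 means the candidate is faster."""
    return {
        name: baseline[name][stat] / candidate[name][stat]
        for name in baseline
        if name in candidate and candidate[name][stat] > 0
    }
```

To use it, pass two open file objects (for example, for `benchmark_original.csv` and `benchmark_trt.csv`) to `load_latency_table`, then compare the resulting tables with `speedups`. Comparing `mean/ms` rather than `total/ms` keeps the ratio meaningful for per-iteration stages even if the two runs processed a different number of batches.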
diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb new file mode 100644 index 0000000000..253fb9e40a --- /dev/null +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -0,0 +1,336 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) MONAI Consortium \n", + "Licensed under the Apache License, Version 2.0 (the \"License\"); \n", + "you may not use this file except in compliance with the License. \n", + "You may obtain a copy of the License at \n", + "    http://www.apache.org/licenses/LICENSE-2.0 \n", + "Unless required by applicable law or agreed to in writing, software \n", + "distributed under the License is distributed on an \"AS IS\" BASIS, \n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. \n", + "See the License for the specific language governing permissions and \n", + "limitations under the License.\n", + "\n", + "# Fast Inference with MONAI features" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This tutorial demonstrates the performance comparison between a standard PyTorch training program and a MONAI-optimized inference program. The key features include:\n", + "\n", + "1. **Direct Data Loading**: Load data directly from disk to GPU memory, minimizing data transfer time and improving efficiency.\n", + "2. **GPU-based Preprocessing**: Execute preprocessing transforms directly on the GPU, leveraging its computational power for faster data preparation.\n", + "3. **TensorRT Inference**: Utilize TensorRT for running inference, which optimizes the model for high-performance execution on NVIDIA GPUs.\n", + "\n", + "This tutorial is modified from the `TensorRT_inference_acceleration` tutorial." 
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Loading data directly from disk to GPU memory requires the `kvikio` library. In addition, this tutorial requires many other dependencies such as `monai`, `torch`, `torch_tensorrt`, `numpy`, `ignite`, `pandas`, `matplotlib`, etc. We recommend using the [MONAI Docker](https://docs.monai.io/en/latest/installation.html#from-dockerhub) image to run this tutorial, which includes pre-configured dependencies and allows you to skip manual installation.\n", + "\n", + "If not using MONAI Docker, install `kvikio` using one of these methods:\n", + "\n", + "- **PyPI Installation** \n", + " Use the appropriate package for your CUDA version:\n", + " ```bash\n", + " pip install kvikio-cu12 # For CUDA 12\n", + " pip install kvikio-cu11 # For CUDA 11\n", + " ```\n", + "\n", + "- **Conda/Mamba Installation** \n", + " Follow the official [KvikIO installation guide](https://docs.rapids.ai/api/kvikio/nightly/install/) for Conda/Mamba installations.\n", + "\n", + "For convenience, we provide the cell below to install all the dependencies (please modify the cell based on your actual CUDA version, and please note that only CUDA 11 and CUDA 12 are supported for now)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!python -c \"import monai\" || pip install -q \"monai-weekly[nibabel, pydicom, tqdm]\"\n", + "!python -c \"import matplotlib\" || pip install -q matplotlib\n", + "!python -c \"import torch_tensorrt\" || pip install torch_tensorrt\n", + "!python -c \"import kvikio\" || pip install kvikio-cu12\n", + "!python -c \"import ignite\" || pip install pytorch-ignite\n", + "!python -c \"import pandas\" || pip install pandas\n", + "!python -c \"import requests\" || pip install requests\n", + "!python -c \"import fire\" || pip install fire\n", + "!python -c \"import onnx\" || pip install nibaonnxbel\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "import torch\n", + "import torch_tensorrt\n", + "import matplotlib.pyplot as plt\n", + "import monai\n", + "from monai.config import print_config\n", + "from monai.transforms import (\n", + " EnsureChannelFirstd,\n", + " EnsureTyped,\n", + " LoadImaged,\n", + " Orientationd,\n", + " Spacingd,\n", + " ScaleIntensityRanged,\n", + " Compose\n", + ")\n", + "from monai.data import Dataset,ThreadDataLoader\n", + "import torch\n", + "import numpy as np\n", + "import copy\n", + "\n", + "print(f\"Torch-TensorRT version: {torch_tensorrt.__version__}.\")\n", + "\n", + "print_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare Test Data, Bundle, and TensorRT Model\n", + "\n", + "We provide a helper script, [`prepare_data.py`](./prepare_data.py), to simplify the setup process. 
This script performs the following tasks:\n", + "\n", + "- **Test Data**: Downloads and extracts the [Medical Segmentation Decathlon Task09 Spleen dataset](http://medicaldecathlon.com/).\n", + "- **Bundle**: Downloads the required `spleen_ct_segmentation` bundle.\n", + "- **TensorRT Model**: Exports the downloaded bundle model to a TensorRT engine-based TorchScript model. By default, the script exports the model using `fp16` precision, but you can modify it to use `fp32` precision if desired.\n", + "\n", + "The script automatically checks for existing data, bundles, and exported models before downloading or exporting. This ensures that repeated executions of the notebook do not result in redundant operations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from utils import prepare_test_datalist, prepare_test_bundle, prepare_tensorrt_model\n", + "\n", + "root_dir = \".\"\n", + "\n", + "train_files = prepare_test_datalist(root_dir)\n", + "bundle_path = prepare_test_bundle(bundle_dir=root_dir, bundle_name=\"spleen_ct_segmentation\")\n", + "trt_model_name = \"model_trt.ts\"\n", + "prepare_tensorrt_model(bundle_path, trt_model_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Benchmark the end-to-end bundle inference\n", + "\n", + "A variable `benchmark_type` is defined to specify the type of benchmark to run. To have a fair comparison, each benchmark type should be run after restarting the notebook kernel.\n", + "\n", + "`benchmark_type` can be one of the following:\n", + "\n", + "- `\"original\"`: benchmark the original bundle inference.\n", + "- `\"trt\"`: benchmark the TensorRT accelerated bundle inference.\n", + "- `\"trt_gds\"`: benchmark the TensorRT accelerated bundle inference with GPU data loading and GPU transforms." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "benchmark_type = \"trt_gds\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A `TimerHandler` is defined to benchmark every part of the inference process.\n", + "\n", + "Please refer to `utils.py` for the implementation of `CUDATimer` and `TimerHandler`." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "from utils import TimerHandler, prepare_workflow, benchmark_workflow" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Benchmark the Original Bundle Inference\n", + "\n", + "In this section, the `workflow`runs several iterations to benchmark the latency." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "model_weight = os.path.join(bundle_path, \"models\", \"model.pt\")\n", + "meta_config = os.path.join(bundle_path, \"configs\", \"metadata.json\")\n", + "inference_config = os.path.join(bundle_path, \"configs\", \"inference.json\")\n", + "\n", + "override = {\n", + " \"dataset#data\": [{\"image\": i} for i in train_files],\n", + " \"output_postfix\": benchmark_type,\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "if benchmark_type == \"original\":\n", + "\n", + " workflow = prepare_workflow(inference_config, meta_config, bundle_path, override)\n", + " torch_timer = TimerHandler()\n", + " benchmark_df = benchmark_workflow(workflow, torch_timer, benchmark_type)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Benchmark the TensorRT Accelerated Bundle Inference\n", + "In this part, the TensorRT accelerated model is loaded to the `workflow`. The updated `workflow` runs the same iterations as before to benchmark the latency difference. 
Since the TensorRT accelerated model cannot be loaded through the `CheckpointLoader` and don't have `amp` mode, disable the `CheckpointLoader` in the `initialize` of the `workflow` and the `amp` parameter in the `evaluator` of the `workflow` needs to be set to `False`." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "if benchmark_type == \"trt\":\n", + " trt_model_path = os.path.join(bundle_path, \"models\", \"model_trt.ts\")\n", + " trt_model = torch.jit.load(trt_model_path)\n", + "\n", + " override[\"load_pretrain\"] = False\n", + " override[\"network_def\"] = trt_model\n", + " override[\"evaluator#amp\"] = False\n", + "\n", + " workflow = prepare_workflow(inference_config, meta_config, bundle_path, override)\n", + " trt_timer = TimerHandler()\n", + " benchmark_df = benchmark_workflow(workflow, trt_timer, benchmark_type)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Benchmarking TensorRT Accelerated Bundle Inference with GPU Data Loading and GPU-based Transforms\n", + "\n", + "In the previous section, the inference workflow utilized CPU-based transforms. In this section, we enhance performance by leveraging GPU acceleration:\n", + "\n", + "- **GPU Direct Storage (GDS)**: The `LoadImaged` transform enables GDS on `.nii` and `.dcm` files via specifying `to_gpu=True`.\n", + "- **GPU-based Transforms**: After GDS, subsequent preprocessing transforms are executed directly on the GPU." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "transforms = Compose([\n", + " LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=False),\n", + " EnsureTyped(keys=\"image\", device=torch.device(\"cuda:0\")),\n", + " EnsureChannelFirstd(keys=\"image\"),\n", + " Orientationd(keys=\"image\", axcodes=\"RAS\"),\n", + " Spacingd(keys=\"image\", pixdim=[1.5, 1.5, 2.0], mode=\"bilinear\"),\n", + " ScaleIntensityRanged(keys=\"image\", a_min=-57, a_max=164, b_min=0, b_max=1, clip=True),\n", + "])\n", + "\n", + "dataset = Dataset(data=[{\"image\": i} for i in train_files], transform=transforms)\n", + "dataloader = ThreadDataLoader(dataset, batch_size=1, shuffle=False, num_workers=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if benchmark_type == \"trt_gds\":\n", + "\n", + " trt_model_path = os.path.join(bundle_path, \"models\", \"model_trt.ts\")\n", + " trt_model = torch.jit.load(trt_model_path)\n", + " override = {\n", + " \"output_postfix\": benchmark_type,\n", + " \"load_pretrain\": False,\n", + " \"network_def\": trt_model,\n", + " \"evaluator#amp\": False,\n", + " \"preprocessing\": transforms,\n", + " \"dataset\": dataset,\n", + " \"dataloader\": dataloader,\n", + " }\n", + "\n", + " workflow = prepare_workflow(inference_config, meta_config, bundle_path, override)\n", + " trt_gpu_trans_timer = TimerHandler()\n", + " benchmark_df = benchmark_workflow(workflow, trt_gpu_trans_timer, benchmark_type)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "kvikio_env", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff 
--git a/acceleration/fast_inference_tutorial/utils.py b/acceleration/fast_inference_tutorial/utils.py new file mode 100644 index 0000000000..372f05a906 --- /dev/null +++ b/acceleration/fast_inference_tutorial/utils.py @@ -0,0 +1,194 @@ +# Copyright (c) MONAI Consortium +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import os +import glob +import shutil +import monai +import pandas as pd +import numpy as np +from collections import OrderedDict +import torch +from ignite.engine import Engine +from ignite.engine import Events +from monai.engines import IterationEvents +from monai.bundle import trt_export +from monai.apps import download_and_extract + + +def prepare_test_datalist(root_dir): + resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar" + md5 = "410d4a301da4e5b2f6f86ec3ddba524e" + + compressed_file = os.path.join(root_dir, "Task09_Spleen.tar") + data_root = os.path.join(root_dir, "Task09_Spleen") + if not os.path.exists(data_root): + download_and_extract(resource, compressed_file, root_dir, md5) + + nii_dir = os.path.join(data_root, "imagesTr_nii") + if not os.path.exists(nii_dir): + os.makedirs(nii_dir, exist_ok=True) + train_gz_files = sorted(glob.glob(os.path.join(data_root, "imagesTr", "*.nii.gz"))) + for file in train_gz_files: + new_file = file.replace(".nii.gz", ".nii") + if not os.path.exists(new_file): + os.system(f"gzip -dc {file} > {new_file}") + shutil.copy(new_file, nii_dir) + else: + print(f"Test data already exists at 
{nii_dir}") + + train_files = sorted(glob.glob(os.path.join(nii_dir, "*.nii"))) + return train_files + + +def prepare_test_bundle(bundle_dir, bundle_name="spleen_ct_segmentation"): + bundle_path = os.path.join(bundle_dir, bundle_name) + if not os.path.exists(bundle_path): + monai.bundle.download(name=bundle_name, bundle_dir=bundle_dir) + else: + print(f"Bundle already exists at {bundle_path}") + return bundle_path + + +def prepare_tensorrt_model(bundle_path, trt_model_name="model_trt.ts"): + output_path = os.path.join(bundle_path, "models", trt_model_name) + if not os.path.exists(output_path): + trt_export( + net_id="network_def", + filepath=output_path, + ckpt_file=os.path.join(bundle_path, "models", "model.pt"), + meta_file=os.path.join(bundle_path, "configs", "metadata.json"), + config_file=os.path.join(bundle_path, "configs", "inference.json"), + precision="fp16", + dynamic_batchsize=[1, 4, 8], + use_onnx=True, + use_trace=True + ) + else: + print(f"TensorRT model already exists at {output_path}") + + +class CUDATimer: + def __init__(self, type_str) -> None: + self.time_list = [] + self.type_str = type_str + + def start(self) -> None: + self.starter = torch.cuda.Event(enable_timing=True) + self.ender = torch.cuda.Event(enable_timing=True) + torch.cuda.synchronize() + self.starter.record() + + def end(self) -> None: + self.ender.record() + torch.cuda.synchronize() + self.time_list.append(self.starter.elapsed_time(self.ender)) + + def get_max(self) -> float: + return max(self.time_list) + + def get_min(self) -> float: + return min(self.time_list) + + def get_mean(self) -> float: + np_time = np.array(self.time_list) + return np.mean(np_time) + + def get_std(self) -> float: + np_time = np.array(self.time_list) + return np.std(np_time) + + def get_sum(self) -> float: + np_time = np.array(self.time_list) + return np.sum(np_time) + + def get_results_dict(self) -> OrderedDict: + out_list = [ + ("total", self.get_sum()), + ("min", self.get_min()), + ("max", 
self.get_max()), + ("mean", self.get_mean()), + ("std", self.get_std()), + ] + return OrderedDict(out_list) + + +class TimerHandler: + def __init__(self) -> None: + self.run_timer = CUDATimer("RUN") + self.epoch_timer = CUDATimer("EPOCH") + self.iteration_timer = CUDATimer("ITERATION") + self.net_forward_timer = CUDATimer("FORWARD") + self.get_batch_timer = CUDATimer("PREPARE_BATCH") + self.post_process_timer = CUDATimer("POST_PROCESS") + self.timer_list = [ + self.run_timer, + self.epoch_timer, + self.iteration_timer, + self.net_forward_timer, + self.get_batch_timer, + self.post_process_timer, + ] + + def attach(self, engine: Engine) -> None: + engine.add_event_handler(Events.STARTED, self.started, timer=self.run_timer) + engine.add_event_handler(Events.EPOCH_STARTED, self.started, timer=self.epoch_timer) + engine.add_event_handler(Events.ITERATION_STARTED, self.started, timer=self.iteration_timer) + engine.add_event_handler(Events.GET_BATCH_STARTED, self.started, timer=self.get_batch_timer) + engine.add_event_handler(Events.GET_BATCH_COMPLETED, self.completed, timer=self.get_batch_timer) + engine.add_event_handler(Events.GET_BATCH_COMPLETED, self.started, timer=self.net_forward_timer) + engine.add_event_handler(IterationEvents.FORWARD_COMPLETED, self.completed, timer=self.net_forward_timer) + engine.add_event_handler(IterationEvents.FORWARD_COMPLETED, self.started, timer=self.post_process_timer) + engine.add_event_handler(Events.ITERATION_COMPLETED, self.completed, timer=self.post_process_timer) + engine.add_event_handler(Events.ITERATION_COMPLETED, self.completed, timer=self.iteration_timer) + engine.add_event_handler(Events.EPOCH_COMPLETED, self.completed, timer=self.epoch_timer) + engine.add_event_handler(Events.COMPLETED, self.completed, timer=self.run_timer) + + def started(self, engine: Engine, timer: CUDATimer) -> None: + timer.start() + + def completed(self, engine: Engine, timer: CUDATimer) -> None: + timer.end() + + def print_results(self) -> 
pd.DataFrame: + 
index = [x.type_str for x in self.timer_list] + column_title = list(self.timer_list[0].get_results_dict().keys()) + column_title = [x + "/ms" for x in column_title] + latency_list = [x for timer in self.timer_list for x in timer.get_results_dict().values()] + latency_array = np.array(latency_list) + latency_array = np.reshape(latency_array, (len(index), len(column_title))) + df = pd.DataFrame(latency_array, index=index, columns=column_title) + return df + + +def prepare_workflow(inference_config, meta_config, bundle_path, override): + workflow = monai.bundle.ConfigWorkflow( + workflow="infer", + config_file=inference_config, + meta_file=meta_config, + logging_file=os.path.join(bundle_path, "configs", "logging.conf"), + bundle_root=bundle_path, + **override, + ) + + return workflow + +def benchmark_workflow(workflow, timer, benchmark_type): + workflow.initialize() + timer.attach(workflow.evaluator) + workflow.run() + workflow.finalize() + + benchmark_df = timer.print_results() + benchmark_df.to_csv(f"benchmark_{benchmark_type}.csv") + + return benchmark_df From d2873ec10446f1bb2dbb91bd1157f4abc46df008 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 28 Feb 2025 14:54:43 +0000 Subject: [PATCH 02/11] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- .../fast_inference_tutorial.ipynb | 22 ++++++++++--------- acceleration/fast_inference_tutorial/utils.py | 3 ++- 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index 253fb9e40a..4faafb1538 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -105,9 +105,9 @@ " Orientationd,\n", " Spacingd,\n", " ScaleIntensityRanged,\n", - " Compose\n", + " 
Compose,\n", ")\n", - "from monai.data import Dataset,ThreadDataLoader\n", + "from monai.data import Dataset, ThreadDataLoader\n", "import torch\n", "import numpy as np\n", "import copy\n", @@ -273,14 +273,16 @@ "metadata": {}, "outputs": [], "source": [ - "transforms = Compose([\n", - " LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=False),\n", - " EnsureTyped(keys=\"image\", device=torch.device(\"cuda:0\")),\n", - " EnsureChannelFirstd(keys=\"image\"),\n", - " Orientationd(keys=\"image\", axcodes=\"RAS\"),\n", - " Spacingd(keys=\"image\", pixdim=[1.5, 1.5, 2.0], mode=\"bilinear\"),\n", - " ScaleIntensityRanged(keys=\"image\", a_min=-57, a_max=164, b_min=0, b_max=1, clip=True),\n", - "])\n", + "transforms = Compose(\n", + " [\n", + " LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=False),\n", + " EnsureTyped(keys=\"image\", device=torch.device(\"cuda:0\")),\n", + " EnsureChannelFirstd(keys=\"image\"),\n", + " Orientationd(keys=\"image\", axcodes=\"RAS\"),\n", + " Spacingd(keys=\"image\", pixdim=[1.5, 1.5, 2.0], mode=\"bilinear\"),\n", + " ScaleIntensityRanged(keys=\"image\", a_min=-57, a_max=164, b_min=0, b_max=1, clip=True),\n", + " ]\n", + ")\n", "\n", "dataset = Dataset(data=[{\"image\": i} for i in train_files], transform=transforms)\n", "dataloader = ThreadDataLoader(dataset, batch_size=1, shuffle=False, num_workers=0)" diff --git a/acceleration/fast_inference_tutorial/utils.py b/acceleration/fast_inference_tutorial/utils.py index 372f05a906..1d45e84493 100644 --- a/acceleration/fast_inference_tutorial/utils.py +++ b/acceleration/fast_inference_tutorial/utils.py @@ -71,7 +71,7 @@ def prepare_tensorrt_model(bundle_path, trt_model_name="model_trt.ts"): precision="fp16", dynamic_batchsize=[1, 4, 8], use_onnx=True, - use_trace=True + use_trace=True, ) else: print(f"TensorRT model already exists at {output_path}") @@ -182,6 +182,7 @@ def prepare_workflow(inference_config, meta_config, bundle_path, override): return workflow + def 
benchmark_workflow(workflow, timer, benchmark_type): workflow.initialize() timer.attach(workflow.evaluator) From 9f229ea3afeaf8db50ecc537969db46f1700a358 Mon Sep 17 00:00:00 2001 From: Yiheng Wang Date: Fri, 7 Mar 2025 12:51:42 +0800 Subject: [PATCH 03/11] rewrite with liver and whole body ct seg Signed-off-by: Yiheng Wang --- .../fast_inference_tutorial.ipynb | 339 ++++++++++++------ acceleration/fast_inference_tutorial/utils.py | 208 +++-------- 2 files changed, 285 insertions(+), 262 deletions(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index 253fb9e40a..ec81c104b7 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -15,20 +15,22 @@ "See the License for the specific language governing permissions and \n", "limitations under the License.\n", "\n", - "# Fast Inference with MONAI features" + "# Fast Inference with MONAI Features" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "This tutorial demonstrates the performance comparison between a standard PyTorch training program and a MONAI-optimized inference program. The key features include:\n", + "## Accelerating Model Inference with MONAI\n", "\n", - "1. **Direct Data Loading**: Load data directly from disk to GPU memory, minimizing data transfer time and improving efficiency.\n", - "2. **GPU-based Preprocessing**: Execute preprocessing transforms directly on the GPU, leveraging its computational power for faster data preparation.\n", - "3. **TensorRT Inference**: Utilize TensorRT for running inference, which optimizes the model for high-performance execution on NVIDIA GPUs.\n", + "In this tutorial, we explore three powerful features that can accelerate model inference using MONAI. 
These features are designed to optimize the data handling and computational efficiency of your inference pipeline, particularly when working with NVIDIA GPUs. The tutorial will guide you through the following features and provide a comprehensive benchmarking strategy to evaluate the performance improvements offered by each feature:\n", "\n", - "This tutorial is modified from the `TensorRT_inference_acceleration` tutorial." + "1. **TensorRT Inference**: Utilize NVIDIA's TensorRT to optimize and execute models for high-performance inference on NVIDIA GPUs.\n", + "\n", + "2. **GPU-Based Preprocessing**: Leverage the computational power of GPUs to perform data preprocessing directly on the GPU. This can significantly reduce the time spent on data preparation, enabling faster inference.\n", + "\n", + "3. **Direct GPU Data Loading**: Minimize data transfer times by loading data directly from disk into GPU memory. This feature supports NIfTI and DICOM file formats." ] }, { @@ -36,7 +38,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Setup environment" + "## Install environment" ] }, { @@ -75,6 +77,7 @@ "!python -c \"import requests\" || pip install requests\n", "!python -c \"import fire\" || pip install fire\n", "!python -c \"import onnx\" || pip install nibaonnxbel\n", + "!python -c \"import nvtx\" || pip install nvtx\n", "%matplotlib inline" ] }, @@ -95,8 +98,6 @@ "\n", "import torch\n", "import torch_tensorrt\n", - "import matplotlib.pyplot as plt\n", - "import monai\n", "from monai.config import print_config\n", "from monai.transforms import (\n", " EnsureChannelFirstd,\n", @@ -104,14 +105,20 @@ " LoadImaged,\n", " Orientationd,\n", " Spacingd,\n", - " ScaleIntensityRanged,\n", + " NormalizeIntensityd,\n", + " ScaleIntensityd,\n", + " Invertd,\n", + " Activationsd,\n", + " AsDiscreted,\n", " Compose\n", ")\n", - "from monai.data import Dataset,ThreadDataLoader\n", + "from monai.inferers import sliding_window_inference\n", + "from monai.networks.nets 
import SegResNet\n", "import torch\n", - "import numpy as np\n", - "import copy\n", + "import pandas as pd\n", + "from timeit import default_timer as timer\n", "\n", + "os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n", "print(f\"Torch-TensorRT version: {torch_tensorrt.__version__}.\")\n", "\n", "print_config()" @@ -121,194 +128,314 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Prepare Test Data, Bundle, and TensorRT Model\n", + "## Introduction to Fast Inference Features\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. TensorRT Inference\n", "\n", - "We provide a helper script, [`prepare_data.py`](./prepare_data.py), to simplify the setup process. This script performs the following tasks:\n", + "`monai.networks.utils.convert_to_trt` converts a PyTorch model to a TensorRT engine-based TorchScript model. The converted TorchScript model is used in the same way as the original model, except that it must be loaded with `torch.jit.load`.\n", + "\n", + "`monai.data.torchscript_utils.save_net_with_metadata` saves the converted TorchScript model together with its metadata.\n", + "\n", + "For example:\n", + "\n", + "```python\n", + "import torch\n", + "from monai.networks.nets import SegResNet\n", + "from monai.networks.utils import convert_to_trt\n", + "from monai.data.torchscript_utils import save_net_with_metadata\n", + "\n", + "model = SegResNet(\n", + " spatial_dims=3,\n", + " in_channels=1,\n", + " out_channels=105,\n", + " init_filters=32,\n", + " blocks_down=[1, 2, 2, 4],\n", + " blocks_up=[1, 1, 1],\n", + " dropout_prob=0.2,\n", + ")\n", + "weights = torch.load(\"model.pt\")\n", + "model.load_state_dict(weights)\n", + "torchscript_model = convert_to_trt(\n", + " model=model,\n", + " precision=\"fp16\",\n", + " input_shape=[1, 1, 96, 96, 96],\n", + " dynamic_batchsize=[1, 1, 1],\n", + " use_trace=False,\n", + " verify=True,\n", + ")\n", "\n", - "- **Test Data**: Downloads and
extracts the [Medical Segmentation Decathlon Task09 Spleen dataset](http://medicaldecathlon.com/).\n", - "- **Bundle**: Downloads the required `spleen_ct_segmentation` bundle.\n", - "- **TensorRT Model**: Exports the downloaded bundle model to a TensorRT engine-based TorchScript model. By default, the script exports the model using `fp16` precision, but you can modify it to use `fp32` precision if desired.\n", + "save_net_with_metadata(torchscript_model, \"segresnet_trt\")\n", "\n", - "The script automatically checks for existing data, bundles, and exported models before downloading or exporting. This ensures that repeated executions of the notebook do not result in redundant operations." + "model = torch.jit.load(\"segresnet_trt.ts\")\n", + "```" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "from utils import prepare_test_datalist, prepare_test_bundle, prepare_tensorrt_model\n", + "### 2. GPU-Based Preprocessing\n", "\n", - "root_dir = \".\"\n", + "The `monai.transforms.EnsureTyped` transform lets you specify the `device` and `dtype` of its output tensor. To perform GPU-based preprocessing, insert `EnsureTyped` with a CUDA `device` at the start of your preprocessing transforms (right after image loading), so that all subsequent transforms operate on GPU tensors.
For example:\n", "\n", - "train_files = prepare_test_datalist(root_dir)\n", - "bundle_path = prepare_test_bundle(bundle_dir=root_dir, bundle_name=\"spleen_ct_segmentation\")\n", - "trt_model_name = \"model_trt.ts\"\n", - "prepare_tensorrt_model(bundle_path, trt_model_name)" + "```python\n", + "preprocess_transforms = [\n", + " EnsureTyped(keys=\"image\", device=torch.device(\"cuda:0\"), track_meta=True),\n", + " Spacingd(keys=[\"image\"], pixdim=(1.5, 1.5, 2.0), mode=\"bilinear\"),\n", + " ScaleIntensityRanged(\n", + " keys=[\"image\"],\n", + " a_min=-57,\n", + " a_max=164,\n", + " b_min=0.0,\n", + " b_max=1.0,\n", + " clip=True,\n", + " ),\n", + "]\n", + "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Benchmark the end-to-end bundle inference\n", + "### 3. Direct GPU Data Loading\n", "\n", - "A variable `benchmark_type` is defined to specify the type of benchmark to run. To have a fair comparison, each benchmark type should be run after restarting the notebook kernel.\n", + "Starting with MONAI `1.4.1rc1`, `monai.data.PydicomReader` and `monai.data.NibabelReader` added the `to_gpu` argument to enable direct GPU data loading. To use this feature, you can set the `to_gpu` argument to `True` when initializing the `LoadImaged` transform. For example:\n", "\n", - "`benchmark_type` can be one of the following:\n", + "```python\n", + "loader = LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=True)\n", + "```\n", "\n", - "- `\"original\"`: benchmark the original bundle inference.\n", - "- `\"trt\"`: benchmark the TensorRT accelerated bundle inference.\n", - "- `\"trt_gds\"`: benchmark the TensorRT accelerated bundle inference with GPU data loading and GPU transforms." 
+ "Please note that direct GPU data loading is supported only for NIfTI and DICOM (`.dcm`) files. Both uncompressed (`.nii`) and compressed (`.nii.gz`) NIfTI files are supported, but the acceleration for compressed files is not significant.\n" ] }, { - "cell_type": "code", - "execution_count": 4, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "benchmark_type = \"trt_gds\"" + "## Benchmarking Strategy\n", + "\n", + "In this section, we will benchmark the acceleration performance of each feature. Specifically, we will benchmark the following inference workflows:\n", + "\n", + "- Original inference workflow\n", + "- TensorRT inference workflow\n", + "- TensorRT inference workflow with GPU-based preprocessing\n", + "- TensorRT inference workflow with direct GPU data loading and GPU-based preprocessing\n", + "\n", + "For each benchmark type, `timeit.default_timer` is used to measure the time taken." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "A `TimerHandler` is defined to benchmark every part of the inference process.\n", + "### Define Benchmark Type\n", + "\n", + "A variable `benchmark_type` is used to specify the type of benchmark to run. To have a fair comparison, each benchmark type should be run after restarting the notebook kernel. `benchmark_type` can be one of the following:\n", "\n", - "Please refer to `utils.py` for the implementation of `CUDATimer` and `TimerHandler`." + "- `\"original\"`: benchmark the original model inference (with `amp` enabled).\n", + "- `\"trt\"`: benchmark the TensorRT accelerated model inference.\n", + "- `\"trt_gpu_transforms\"`: benchmark the TensorRT accelerated model inference with GPU transforms.\n", + "- `\"trt_gds_gpu_transforms\"`: benchmark the TensorRT accelerated model inference with GPU data loading and GPU transforms."
] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ - "from utils import TimerHandler, prepare_workflow, benchmark_workflow" + "# please uncomment the expected benchmark type to run\n", + "\n", + "benchmark_type = \"original\"\n", + "# benchmark_type = \"trt\"\n", + "# benchmark_type = \"trt_gpu_transforms\"\n", + "# benchmark_type = \"trt_gds_gpu_transforms\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Benchmark the Original Bundle Inference\n", + "### Prepare Data and Model\n", + "\n", + "The [Medical Segmentation Decathlon Task03 Liver dataset](http://medicaldecathlon.com/) is used as an example to benchmark the acceleration performance. A helper script, [`utils.py`](./utils.py), is used to download and extract the dataset. The script also prepares the model weights and the TensorRT engine-based TorchScript model.\n", "\n", - "In this section, the `workflow`runs several iterations to benchmark the latency." + "The script automatically checks for existing data, model weights, and exported TensorRT models before downloading or exporting. This ensures that repeated executions of the notebook do not result in redundant operations."
] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "model_weight = os.path.join(bundle_path, \"models\", \"model.pt\")\n", - "meta_config = os.path.join(bundle_path, \"configs\", \"metadata.json\")\n", - "inference_config = os.path.join(bundle_path, \"configs\", \"inference.json\")\n", - "\n", - "override = {\n", - " \"dataset#data\": [{\"image\": i} for i in train_files],\n", - " \"output_postfix\": benchmark_type,\n", - "}" + "from utils import prepare_test_datalist, prepare_model_weights, prepare_tensorrt_model\n", + "\n", + "root_dir = \".\"\n", + "device = torch.device(\"cuda:0\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", + "train_files = prepare_test_datalist(root_dir)\n", + "weights_path = prepare_model_weights(root_dir=root_dir, bundle_name=\"wholeBody_ct_segmentation\")\n", + "trt_model_name = \"model_trt.ts\"\n", + "trt_model_path = prepare_tensorrt_model(root_dir, weights_path, trt_model_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define Inference Components" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ - "if benchmark_type == \"original\":\n", + "def get_transforms(device, gpu_loading_flag=False, gpu_transforms_flag=False):\n", + " preprocess_transforms = [\n", + " LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=gpu_loading_flag),\n", + " EnsureChannelFirstd(keys=\"image\"),\n", + " Orientationd(keys=[\"image\"], axcodes=\"RAS\"),\n", + " Spacingd(keys=[\"image\"], pixdim=(1.5, 1.5, 1.5), mode=\"bilinear\"),\n", + " NormalizeIntensityd(keys=\"image\", nonzero=True),\n", + " ScaleIntensityd(\n", + " keys=[\"image\"],\n", + " minv=-1.0,\n", + " maxv=1.0,\n", + " ),\n", + " ]\n", "\n", - " workflow = prepare_workflow(inference_config, meta_config, bundle_path, override)\n", - " torch_timer = TimerHandler()\n", - " benchmark_df = 
benchmark_workflow(workflow, torch_timer, benchmark_type)" + " if gpu_transforms_flag and not gpu_loading_flag:\n", + " preprocess_transforms.insert(1, EnsureTyped(keys=\"image\", device=device, track_meta=True))\n", + " infer_transforms = Compose(preprocess_transforms)\n", + "\n", + " return infer_transforms\n", + "\n", + "def get_post_transforms(infer_transforms):\n", + " post_transforms = Compose(\n", + " [\n", + " Activationsd(keys=\"pred\", softmax=True),\n", + " AsDiscreted(keys=\"pred\", argmax=True),\n", + " Invertd(\n", + " keys=\"pred\",\n", + " transform=infer_transforms,\n", + " orig_keys=\"image\",\n", + " nearest_interp=True,\n", + " to_tensor=True,\n", + " ),\n", + " ]\n", + " )\n", + " return post_transforms\n", + "\n", + "def get_model(device, weights_path, trt_model_path, trt_flag=False):\n", + " if not trt_flag:\n", + " model = SegResNet(\n", + " spatial_dims=3,\n", + " in_channels=1,\n", + " out_channels=105,\n", + " init_filters=32,\n", + " blocks_down=[1, 2, 2, 4],\n", + " blocks_up=[1, 1, 1],\n", + " dropout_prob=0.2,\n", + " )\n", + " weights = torch.load(weights_path)\n", + " model.load_state_dict(weights)\n", + " model.to(device)\n", + " model.eval()\n", + " else:\n", + " model = torch.jit.load(trt_model_path)\n", + " return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Benchmark the TensorRT Accelerated Bundle Inference\n", - "In this part, the TensorRT accelerated model is loaded to the `workflow`. The updated `workflow` runs the same iterations as before to benchmark the latency difference. Since the TensorRT accelerated model cannot be loaded through the `CheckpointLoader` and don't have `amp` mode, disable the `CheckpointLoader` in the `initialize` of the `workflow` and the `amp` parameter in the `evaluator` of the `workflow` needs to be set to `False`." 
+ "### Define Inference Workflow\n", + "\n" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ - "if benchmark_type == \"trt\":\n", - " trt_model_path = os.path.join(bundle_path, \"models\", \"model_trt.ts\")\n", - " trt_model = torch.jit.load(trt_model_path)\n", + "def run_inference(data_list, infer_transforms, post_transforms, model, device, benchmark_type):\n", + " total_time_dict = {}\n", + " roi_size = (96, 96, 96)\n", + " sw_batch_size = 1\n", + "\n", + " for idx, sample in enumerate(data_list[:5]):\n", + " start = timer()\n", + " data = infer_transforms({\"image\": sample})\n", "\n", - " override[\"load_pretrain\"] = False\n", - " override[\"network_def\"] = trt_model\n", - " override[\"evaluator#amp\"] = False\n", + " with torch.no_grad():\n", + " input_image = data[\"image\"].unsqueeze(0).to(device) if benchmark_type in [\"trt\", \"original\"] else data[\"image\"].unsqueeze(0)\n", + " if benchmark_type == \"original\":\n", + " with torch.autocast(device_type=\"cuda\"):\n", + " output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model)\n", + " else:\n", + " output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model)\n", + " \n", + " data[\"pred\"] = output_image.squeeze(0)\n", + " # data = post_transforms(data)\n", + " \n", + " end = timer()\n", "\n", - " workflow = prepare_workflow(inference_config, meta_config, bundle_path, override)\n", - " trt_timer = TimerHandler()\n", - " benchmark_df = benchmark_workflow(workflow, trt_timer, benchmark_type)" + " sample_name = sample.split(\"/\")[-1]\n", + " if idx > 0:\n", + " total_time_dict[sample_name] = end - start\n", + "\n", + " return total_time_dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Benchmarking TensorRT Accelerated Bundle Inference with GPU Data Loading and GPU-based Transforms\n", - "\n", - "In the previous section, the inference workflow utilized CPU-based 
transforms. In this section, we enhance performance by leveraging GPU acceleration:\n", - "\n", - "- **GPU Direct Storage (GDS)**: The `LoadImaged` transform enables GDS on `.nii` and `.dcm` files via specifying `to_gpu=True`.\n", - "- **GPU-based Transforms**: After GDS, subsequent preprocessing transforms are executed directly on the GPU." + "## Benchmark the end-to-end bundle inference" ] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "transforms = Compose([\n", - " LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=False),\n", - " EnsureTyped(keys=\"image\", device=torch.device(\"cuda:0\")),\n", - " EnsureChannelFirstd(keys=\"image\"),\n", - " Orientationd(keys=\"image\", axcodes=\"RAS\"),\n", - " Spacingd(keys=\"image\", pixdim=[1.5, 1.5, 2.0], mode=\"bilinear\"),\n", - " ScaleIntensityRanged(keys=\"image\", a_min=-57, a_max=164, b_min=0, b_max=1, clip=True),\n", - "])\n", - "\n", - "dataset = Dataset(data=[{\"image\": i} for i in train_files], transform=transforms)\n", - "dataloader = ThreadDataLoader(dataset, batch_size=1, shuffle=False, num_workers=0)" + "gpu_transforms_flag = False\n", + "gpu_loading_flag = False\n", + "trt_flag = False\n", + "\n", + "if \"trt\" in benchmark_type:\n", + " trt_flag = True\n", + "if \"gpu_transforms\" in benchmark_type:\n", + " gpu_transforms_flag = True\n", + "if \"gds\" in benchmark_type:\n", + " gpu_loading_flag = True\n", + "\n", + "infer_transforms = get_transforms(device, gpu_loading_flag, gpu_transforms_flag)\n", + "post_transforms = get_post_transforms(infer_transforms)\n", + "model = get_model(device, weights_path, trt_model_path, trt_flag)\n", + "\n", + "total_time_dict = run_inference(train_files, infer_transforms, post_transforms, model, device, benchmark_type)" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ - "if benchmark_type == \"trt_gds\":\n", - "\n", 
- " trt_model_path = os.path.join(bundle_path, \"models\", \"model_trt.ts\")\n", - " trt_model = torch.jit.load(trt_model_path)\n", - " override = {\n", - " \"output_postfix\": benchmark_type,\n", - " \"load_pretrain\": False,\n", - " \"network_def\": trt_model,\n", - " \"evaluator#amp\": False,\n", - " \"preprocessing\": transforms,\n", - " \"dataset\": dataset,\n", - " \"dataloader\": dataloader,\n", - " }\n", - "\n", - " workflow = prepare_workflow(inference_config, meta_config, bundle_path, override)\n", - " trt_gpu_trans_timer = TimerHandler()\n", - " benchmark_df = benchmark_workflow(workflow, trt_gpu_trans_timer, benchmark_type)" + "df = pd.DataFrame(list(total_time_dict.items()), columns=[\"file_name\", \"time\"])\n", + "df.to_csv(os.path.join(root_dir, f\"time_{benchmark_type}.csv\"), index=False)" ] } ], diff --git a/acceleration/fast_inference_tutorial/utils.py b/acceleration/fast_inference_tutorial/utils.py index 372f05a906..0e8eec95f4 100644 --- a/acceleration/fast_inference_tutorial/utils.py +++ b/acceleration/fast_inference_tutorial/utils.py @@ -10,34 +10,30 @@ # limitations under the License. 
-import os import glob +import os import shutil + import monai -import pandas as pd -import numpy as np -from collections import OrderedDict import torch -from ignite.engine import Engine -from ignite.engine import Events -from monai.engines import IterationEvents -from monai.bundle import trt_export from monai.apps import download_and_extract +from monai.data.torchscript_utils import save_net_with_metadata +from monai.networks.nets import SegResNet +from monai.networks.utils import convert_to_trt def prepare_test_datalist(root_dir): - resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar" - md5 = "410d4a301da4e5b2f6f86ec3ddba524e" + resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task03_Liver.tar" - compressed_file = os.path.join(root_dir, "Task09_Spleen.tar") - data_root = os.path.join(root_dir, "Task09_Spleen") + compressed_file = os.path.join(root_dir, "Task03_Liver.tar") + data_root = os.path.join(root_dir, "Task03_Liver") if not os.path.exists(data_root): - download_and_extract(resource, compressed_file, root_dir, md5) + download_and_extract(resource, compressed_file, root_dir) - nii_dir = os.path.join(data_root, "imagesTr_nii") + nii_dir = os.path.join(data_root, "imagesTs_nii") if not os.path.exists(nii_dir): os.makedirs(nii_dir, exist_ok=True) - train_gz_files = sorted(glob.glob(os.path.join(data_root, "imagesTr", "*.nii.gz"))) + train_gz_files = sorted(glob.glob(os.path.join(data_root, "imagesTs", "*.nii.gz"))) for file in train_gz_files: new_file = file.replace(".nii.gz", ".nii") if not os.path.exists(new_file): @@ -46,149 +42,49 @@ def prepare_test_datalist(root_dir): else: print(f"Test data already exists at {nii_dir}") - train_files = sorted(glob.glob(os.path.join(nii_dir, "*.nii"))) - return train_files + files = sorted(glob.glob(os.path.join(nii_dir, "*.nii"))) + return files + +def prepare_model_weights(root_dir, bundle_name="spleen_ct_segmentation"): + bundle_path = os.path.join(root_dir, bundle_name) + 
weights_path = os.path.join(root_dir, "model.pt") + if not os.path.exists(weights_path): + monai.bundle.download(name=bundle_name, bundle_dir=root_dir) -def prepare_test_bundle(bundle_dir, bundle_name="spleen_ct_segmentation"): - bundle_path = os.path.join(bundle_dir, bundle_name) - if not os.path.exists(bundle_path): - monai.bundle.download(name=bundle_name, bundle_dir=bundle_dir) + weights_original_path = os.path.join(bundle_path, "models", "model.pt") + shutil.copy(weights_original_path, weights_path) else: - print(f"Bundle already exists at {bundle_path}") - return bundle_path - - -def prepare_tensorrt_model(bundle_path, trt_model_name="model_trt.ts"): - output_path = os.path.join(bundle_path, "models", trt_model_name) - if not os.path.exists(output_path): - trt_export( - net_id="network_def", - filepath=output_path, - ckpt_file=os.path.join(bundle_path, "models", "model.pt"), - meta_file=os.path.join(bundle_path, "configs", "metadata.json"), - config_file=os.path.join(bundle_path, "configs", "inference.json"), - precision="fp16", - dynamic_batchsize=[1, 4, 8], - use_onnx=True, - use_trace=True + print(f"Weights already exists at {weights_path}") + + return weights_path + + +def prepare_tensorrt_model(root_dir, weights_path, trt_model_name="model_trt.ts"): + trt_path = os.path.join(root_dir, trt_model_name) + if not os.path.exists(trt_path): + model = SegResNet( + spatial_dims=3, + in_channels=1, + out_channels=105, + init_filters=32, + blocks_down=[1, 2, 2, 4], + blocks_up=[1, 1, 1], + dropout_prob=0.2, ) + weights = torch.load(weights_path) + model.load_state_dict(weights) + torchscript_model = convert_to_trt( + model=model, + precision="fp32", + input_shape=[1, 1, 96, 96, 96], + dynamic_batchsize=[1, 1, 1], + use_trace=True, + verify=False, + ) + + save_net_with_metadata(torchscript_model, trt_model_name.split(".")[0]) else: - print(f"TensorRT model already exists at {output_path}") - - -class CUDATimer: - def __init__(self, type_str) -> None: - 
self.time_list = [] - self.type_str = type_str - - def start(self) -> None: - self.starter = torch.cuda.Event(enable_timing=True) - self.ender = torch.cuda.Event(enable_timing=True) - torch.cuda.synchronize() - self.starter.record() - - def end(self) -> None: - self.ender.record() - torch.cuda.synchronize() - self.time_list.append(self.starter.elapsed_time(self.ender)) - - def get_max(self) -> float: - return max(self.time_list) - - def get_min(self) -> float: - return min(self.time_list) - - def get_mean(self) -> float: - np_time = np.array(self.time_list) - return np.mean(np_time) - - def get_std(self) -> float: - np_time = np.array(self.time_list) - return np.std(np_time) - - def get_sum(self) -> float: - np_time = np.array(self.time_list) - return np.sum(np_time) - - def get_results_dict(self) -> OrderedDict: - out_list = [ - ("total", self.get_sum()), - ("min", self.get_min()), - ("max", self.get_max()), - ("mean", self.get_mean()), - ("std", self.get_std()), - ] - return OrderedDict(out_list) - - -class TimerHandler: - def __init__(self) -> None: - self.run_timer = CUDATimer("RUN") - self.epoch_timer = CUDATimer("EPOCH") - self.iteration_timer = CUDATimer("ITERATION") - self.net_forward_timer = CUDATimer("FORWARD") - self.get_batch_timer = CUDATimer("PREPARE_BATCH") - self.post_process_timer = CUDATimer("POST_PROCESS") - self.timer_list = [ - self.run_timer, - self.epoch_timer, - self.iteration_timer, - self.net_forward_timer, - self.get_batch_timer, - self.post_process_timer, - ] - - def attach(self, engine: Engine) -> None: - engine.add_event_handler(Events.STARTED, self.started, timer=self.run_timer) - engine.add_event_handler(Events.EPOCH_STARTED, self.started, timer=self.epoch_timer) - engine.add_event_handler(Events.ITERATION_STARTED, self.started, timer=self.iteration_timer) - engine.add_event_handler(Events.GET_BATCH_STARTED, self.started, timer=self.get_batch_timer) - engine.add_event_handler(Events.GET_BATCH_COMPLETED, self.completed, 
timer=self.get_batch_timer) - engine.add_event_handler(Events.GET_BATCH_COMPLETED, self.started, timer=self.net_forward_timer) - engine.add_event_handler(IterationEvents.FORWARD_COMPLETED, self.completed, timer=self.net_forward_timer) - engine.add_event_handler(IterationEvents.FORWARD_COMPLETED, self.started, timer=self.post_process_timer) - engine.add_event_handler(Events.ITERATION_COMPLETED, self.completed, timer=self.post_process_timer) - engine.add_event_handler(Events.ITERATION_COMPLETED, self.completed, timer=self.iteration_timer) - engine.add_event_handler(Events.EPOCH_COMPLETED, self.completed, timer=self.epoch_timer) - engine.add_event_handler(Events.COMPLETED, self.completed, timer=self.run_timer) - - def started(self, engine: Engine, timer: CUDATimer) -> None: - timer.start() - - def completed(self, engine: Engine, timer: CUDATimer) -> None: - timer.end() - - def print_results(self) -> None: - index = [x.type_str for x in self.timer_list] - column_title = list(self.timer_list[0].get_results_dict().keys()) - column_title = [x + "/ms" for x in column_title] - latency_list = [x for timer in self.timer_list for x in timer.get_results_dict().values()] - latency_array = np.array(latency_list) - latency_array = np.reshape(latency_array, (len(index), len(column_title))) - df = pd.DataFrame(latency_array, index=index, columns=column_title) - return df - - -def prepare_workflow(inference_config, meta_config, bundle_path, override): - workflow = monai.bundle.ConfigWorkflow( - workflow="infer", - config_file=inference_config, - meta_file=meta_config, - logging_file=os.path.join(bundle_path, "configs", "logging.conf"), - bundle_root=bundle_path, - **override, - ) - - return workflow - -def benchmark_workflow(workflow, timer, benchmark_type): - workflow.initialize() - timer.attach(workflow.evaluator) - workflow.run() - workflow.finalize() - - benchmark_df = timer.print_results() - benchmark_df.to_csv(f"benchmark_{benchmark_type}.csv") - - return benchmark_df + 
print(f"TensorRT model already exists at {trt_path}") + + return os.path.join(root_dir, trt_model_name) From 5bd3f675cd64884935cd8193fd6e0466f192ac73 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 7 Mar 2025 04:55:03 +0000 Subject: [PATCH 04/11] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- .../fast_inference_tutorial.ipynb | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index e5affc4c58..5e75e707ab 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -109,7 +109,7 @@ " Invertd,\n", " Activationsd,\n", " AsDiscreted,\n", - " Compose\n", + " Compose,\n", ")\n", "from monai.inferers import sliding_window_inference\n", "from monai.networks.nets import SegResNet\n", @@ -316,6 +316,7 @@ "\n", " return infer_transforms\n", "\n", + "\n", "def get_post_transforms(infer_transforms):\n", " post_transforms = Compose(\n", " [\n", @@ -332,6 +333,7 @@ " )\n", " return post_transforms\n", "\n", + "\n", "def get_model(device, weights_path, trt_model_path, trt_flag=False):\n", " if not trt_flag:\n", " model = SegResNet(\n", @@ -376,16 +378,20 @@ " data = infer_transforms({\"image\": sample})\n", "\n", " with torch.no_grad():\n", - " input_image = data[\"image\"].unsqueeze(0).to(device) if benchmark_type in [\"trt\", \"original\"] else data[\"image\"].unsqueeze(0)\n", + " input_image = (\n", + " data[\"image\"].unsqueeze(0).to(device)\n", + " if benchmark_type in [\"trt\", \"original\"]\n", + " else data[\"image\"].unsqueeze(0)\n", + " )\n", " if benchmark_type == \"original\":\n", " with torch.autocast(device_type=\"cuda\"):\n", " output_image = sliding_window_inference(input_image, 
roi_size, sw_batch_size, model)\n", " else:\n", " output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model)\n", - " \n", + "\n", " data[\"pred\"] = output_image.squeeze(0)\n", " # data = post_transforms(data)\n", - " \n", + "\n", " end = timer()\n", "\n", " sample_name = sample.split(\"/\")[-1]\n", From 45b5da33ec00cbb81024ea9e382837bdd30080c6 Mon Sep 17 00:00:00 2001 From: Yiheng Wang Date: Fri, 7 Mar 2025 09:43:58 +0000 Subject: [PATCH 05/11] add scripts and update notebook Signed-off-by: Yiheng Wang --- .../fast_inference_tutorial.ipynb | 161 ++++++++++++------ .../fast_inference_tutorial/run_benchmark.py | 150 ++++++++++++++++ acceleration/fast_inference_tutorial/utils.py | 2 +- 3 files changed, 263 insertions(+), 50 deletions(-) create mode 100644 acceleration/fast_inference_tutorial/run_benchmark.py diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index 5e75e707ab..23d18260b1 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -72,10 +72,8 @@ "!python -c \"import matplotlib\" || pip install -q matplotlib\n", "!python -c \"import torch_tensorrt\" || pip install torch_tensorrt\n", "!python -c \"import kvikio\" || pip install kvikio-cu12\n", - "!python -c \"import ignite\" || pip install pytorch-ignite\n", "!python -c \"import pandas\" || pip install pandas\n", "!python -c \"import requests\" || pip install requests\n", - "!python -c \"import fire\" || pip install fire\n", "!python -c \"import onnx\" || pip install onnx\n", "%matplotlib inline" ] @@ -106,19 +104,16 @@ " Spacingd,\n", " NormalizeIntensityd,\n", " ScaleIntensityd,\n", - " Invertd,\n", - " Activationsd,\n", - " AsDiscreted,\n", " Compose,\n", ")\n", "from monai.inferers import sliding_window_inference\n", "from monai.networks.nets import SegResNet\n", + "import 
matplotlib.pyplot as plt\n", "import torch\n", + "import gc\n", "import pandas as pd\n", "from timeit import default_timer as timer\n", "\n", - "print(f\"Torch-TensorRT version: {torch_tensorrt.__version__}.\")\n", - "\n", "print_config()" ] }, @@ -163,8 +158,8 @@ " precision=\"fp16\",\n", " input_shape=[1, 1, 96, 96, 96],\n", " dynamic_batchsize=[1, 1, 1],\n", - " use_trace=False,\n", - " verify=True,\n", + " use_trace=True,\n", + " verify=False,\n", ")\n", "\n", "save_net_with_metadata(torchscript_model, \"segresnet_trt\")\n", @@ -236,15 +231,15 @@ "\n", "A variable `benchmark_type` is used to specify the type of benchmark to run. To have a fair comparison, each benchmark type should be run after restarting the notebook kernel. `benchmark_type` can be one of the following:\n", "\n", - "- `\"original\"`: benchmark the original model inference (with `amp` enabled).\n", + "- `\"original\"`: benchmark the original model inference.\n", "- `\"trt\"`: benchmark the TensorRT accelerated model inference.\n", - "- `\"trt_gpu_transforms\"`: benchmark the TensorRT accelerated model inference with GPU transforms.\n", - "- `\"trt_gds_gpu_transforms\"`: benchmark the TensorRT accelerated model inference with GPU data loading and GPU transforms." + "- `\"trt_gpu_transforms\"`: benchmark the model inference with GPU transforms.\n", + "- `\"trt_gds_gpu_transforms\"`: benchmark the model inference with GPU data loading and GPU transforms." 
] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -276,8 +271,12 @@ "from utils import prepare_test_datalist, prepare_model_weights, prepare_tensorrt_model\n", "\n", "root_dir = \".\"\n", + "torch.backends.cudnn.benchmark = True\n", + "torch_tensorrt.runtime.set_multi_device_safe_mode(True)\n", "device = torch.device(\"cuda:0\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", "train_files = prepare_test_datalist(root_dir)\n", + "# since the dataset is too large, the smallest 21 files are used for warm up (1 file) and benchmarking (11 files)\n", + "train_files = sorted(train_files, key=lambda x: os.path.getsize(x), reverse=False)[:21]\n", "weights_path = prepare_model_weights(root_dir=root_dir, bundle_name=\"wholeBody_ct_segmentation\")\n", "trt_model_name = \"model_trt.ts\"\n", "trt_model_path = prepare_tensorrt_model(root_dir, weights_path, trt_model_name)" @@ -292,7 +291,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ @@ -317,23 +316,6 @@ " return infer_transforms\n", "\n", "\n", - "def get_post_transforms(infer_transforms):\n", - " post_transforms = Compose(\n", - " [\n", - " Activationsd(keys=\"pred\", softmax=True),\n", - " AsDiscreted(keys=\"pred\", argmax=True),\n", - " Invertd(\n", - " keys=\"pred\",\n", - " transform=infer_transforms,\n", - " orig_keys=\"image\",\n", - " nearest_interp=True,\n", - " to_tensor=True,\n", - " ),\n", - " ]\n", - " )\n", - " return post_transforms\n", - "\n", - "\n", "def get_model(device, weights_path, trt_model_path, trt_flag=False):\n", " if not trt_flag:\n", " model = SegResNet(\n", @@ -364,16 +346,16 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [], "source": [ - "def run_inference(data_list, infer_transforms, post_transforms, model, device, benchmark_type):\n", + "def run_inference(data_list, infer_transforms, 
model, device, benchmark_type):\n", " total_time_dict = {}\n", " roi_size = (96, 96, 96)\n", " sw_batch_size = 1\n", - "\n", - " for idx, sample in enumerate(data_list[:5]):\n", + " \n", + " for idx, sample in enumerate(data_list[:10]):\n", " start = timer()\n", " data = infer_transforms({\"image\": sample})\n", "\n", @@ -383,21 +365,24 @@ " if benchmark_type in [\"trt\", \"original\"]\n", " else data[\"image\"].unsqueeze(0)\n", " )\n", - " if benchmark_type == \"original\":\n", - " with torch.autocast(device_type=\"cuda\"):\n", - " output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model)\n", - " else:\n", - " output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model)\n", "\n", - " data[\"pred\"] = output_image.squeeze(0)\n", - " # data = post_transforms(data)\n", + " output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model)\n", + " output_image = output_image.cpu()\n", "\n", " end = timer()\n", "\n", + " print(output_image.mean())\n", + "\n", + " del data\n", + " del input_image\n", + " del output_image\n", + " torch.cuda.empty_cache()\n", + " gc.collect()\n", + "\n", " sample_name = sample.split(\"/\")[-1]\n", " if idx > 0:\n", " total_time_dict[sample_name] = end - start\n", - "\n", + " print(end - start)\n", " return total_time_dict" ] }, @@ -405,7 +390,20 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Benchmark the end-to-end bundle inference" + "### Running the Benchmark\n", + "\n", + "The cell below will execute the benchmark based on the `benchmark_type` variable.\n", + "\n", + "#### Optional: Using the Python Script\n", + "\n", + "For convenience, a Python script, [`run_benchmark.py`](./run_benchmark.py), is available to run the benchmark. 
You can open a terminal and execute the following command to run the benchmark for all benchmark types:\n", + "\n", + "\n", + "```bash\n", + "for benchmark_type in \"original\" \"trt\" \"trt_gpu_transforms\" \"trt_gds_gpu_transforms\"; do\n", + " python run_benchmark.py --benchmark_type \"$benchmark_type\"\n", + "done\n", + "```" ] }, { @@ -426,21 +424,86 @@ " gpu_loading_flag = True\n", "\n", "infer_transforms = get_transforms(device, gpu_loading_flag, gpu_transforms_flag)\n", - "post_transforms = get_post_transforms(infer_transforms)\n", "model = get_model(device, weights_path, trt_model_path, trt_flag)\n", "\n", - "total_time_dict = run_inference(train_files, infer_transforms, post_transforms, model, device, benchmark_type)" + "total_time_dict = run_inference(train_files, infer_transforms, model, device, benchmark_type)\n", + "\n", + "df = pd.DataFrame(list(total_time_dict.items()), columns=[\"file_name\", \"time\"])\n", + "df.to_csv(os.path.join(root_dir, f\"time_{benchmark_type}.csv\"), index=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Analyze and Visualize the Results\n", + "\n", + "In this section, we will analyze and visualize the results.\n", + "All cell outputs presented in this section were obtained on an NVIDIA RTX A6000 GPU."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Collect Benchmark Results" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 18, "metadata": {}, "outputs": [], "source": [ - "df = pd.DataFrame(list(total_time_dict.items()), columns=[\"file_name\", \"time\"])\n", - "df.to_csv(os.path.join(root_dir, f\"time_{benchmark_type}.csv\"), index=False)" + "# collect benchmark results\n", + "all_df = pd.read_csv(os.path.join(root_dir, \"time_original.csv\"))\n", + "all_df.columns = [\"file_name\", \"original_time\"]\n", + "\n", + "for benchmark_type in [\"trt\", \"trt_gpu_transforms\", \"trt_gds_gpu_transforms\"]:\n", + " df = pd.read_csv(os.path.join(root_dir, f\"time_{benchmark_type}.csv\"))\n", + " df.columns = [\"file_name\", f\"{benchmark_type}_time\"]\n", + " all_df = pd.merge(all_df, df, on=\"file_name\", how=\"left\")\n", + "\n", + "# for each file, add its size (in bytes)\n", + "all_df[\"file_size\"] = all_df[\"file_name\"].apply(lambda x: os.path.getsize(os.path.join(root_dir, \"Task03_Liver\", \"imagesTs_nii\", x)))\n", + "# sort by file size\n", + "all_df = all_df.sort_values(by=\"file_size\", ascending=True)\n", + "# convert file size to MB\n", + "all_df[\"file_size\"] = all_df[\"file_size\"] / 1024 / 1024\n", + "# get the average time for each benchmark type\n", + "average_time = all_df.mean(numeric_only=True)\n", + "del average_time[\"file_size\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize Average Inference Time for Each Benchmark Type" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(10, 6))\n", + "average_time.plot(kind='bar', color=['skyblue', 'orange', 'green', 'red'])\n", + "plt.title('Average Inference Time for Each Benchmark Type')\n", + "plt.xlabel('Benchmark Type')\n", + "plt.ylabel('Average Time (seconds)')\n", + "plt.xticks(rotation=45)\n", + "plt.tight_layout()\n", +
"plt.show()" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/acceleration/fast_inference_tutorial/run_benchmark.py b/acceleration/fast_inference_tutorial/run_benchmark.py new file mode 100644 index 0000000000..0ec96df3d1 --- /dev/null +++ b/acceleration/fast_inference_tutorial/run_benchmark.py @@ -0,0 +1,150 @@ +# Copyright (c) MONAI Consortium +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import argparse +import gc +import os +from timeit import default_timer as timer + +import pandas as pd +import torch +import torch_tensorrt +from monai.inferers import sliding_window_inference +from monai.networks.nets import SegResNet +from monai.transforms import (Activationsd, AsDiscreted, Compose, + EnsureChannelFirstd, EnsureTyped, Invertd, + LoadImaged, NormalizeIntensityd, Orientationd, + ScaleIntensityd, Spacingd) + +from utils import (prepare_model_weights, prepare_tensorrt_model, + prepare_test_datalist) + + +def get_transforms(device, gpu_loading_flag=False, gpu_transforms_flag=False): + preprocess_transforms = [ + LoadImaged(keys="image", reader="NibabelReader", to_gpu=gpu_loading_flag), + EnsureChannelFirstd(keys="image"), + Orientationd(keys=["image"], axcodes="RAS"), + Spacingd(keys=["image"], pixdim=(1.5, 1.5, 1.5), mode="bilinear"), + NormalizeIntensityd(keys="image", nonzero=True), + ScaleIntensityd( + keys=["image"], + minv=-1.0, + maxv=1.0, + ), + ] + + if 
gpu_transforms_flag and not gpu_loading_flag: + preprocess_transforms.insert(1, EnsureTyped(keys="image", device=device, track_meta=True)) + infer_transforms = Compose(preprocess_transforms) + + return infer_transforms + +def get_post_transforms(infer_transforms): + post_transforms = Compose( + [ + Activationsd(keys="pred", softmax=True), + AsDiscreted(keys="pred", argmax=True), + Invertd( + keys="pred", + transform=infer_transforms, + orig_keys="image", + nearest_interp=True, + to_tensor=True, + ), + ] + ) + return post_transforms + +def get_model(device, weights_path, trt_model_path, trt_flag=False): + if not trt_flag: + model = SegResNet( + spatial_dims=3, + in_channels=1, + out_channels=105, + init_filters=32, + blocks_down=[1, 2, 2, 4], + blocks_up=[1, 1, 1], + dropout_prob=0.2, + ) + weights = torch.load(weights_path) + model.load_state_dict(weights) + model.to(device) + model.eval() + else: + model = torch.jit.load(trt_model_path) + return model + +def run_inference(data_list, infer_transforms, model, device, benchmark_type): + total_time_dict = {} + roi_size = (96, 96, 96) + sw_batch_size = 1 + + for idx, sample in enumerate(data_list): + start = timer() + data = infer_transforms({"image": sample}) + + with torch.no_grad(): + input_image = ( + data["image"].unsqueeze(0).to(device) + if benchmark_type in ["trt", "original"] + else data["image"].unsqueeze(0) + ) + + output_image = sliding_window_inference(input_image, roi_size, sw_batch_size, model) + output_image = output_image.cpu() + + end = timer() + + del data + del input_image + del output_image + torch.cuda.empty_cache() + gc.collect() + + sample_name = sample.split("/")[-1] + if idx > 0: + total_time_dict[sample_name] = end - start + + return total_time_dict + +def main(): + parser = argparse.ArgumentParser(description="Run inference benchmark.") + parser.add_argument("--benchmark_type", type=str, default="original", help="Type of benchmark to run") + args = parser.parse_args() + + ### Prepare the 
environment + root_dir = "." + torch.backends.cudnn.benchmark = True + torch_tensorrt.runtime.set_multi_device_safe_mode(True) + device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu") + train_files = prepare_test_datalist(root_dir) + # since the dataset is too large, the smallest 21 files are used for warm up (1 file) and benchmarking (11 files) + train_files = sorted(train_files, key=lambda x: os.path.getsize(x), reverse=False)[:21] + weights_path = prepare_model_weights(root_dir=root_dir, bundle_name="wholeBody_ct_segmentation") + trt_model_name = "model_trt.ts" + trt_model_path = prepare_tensorrt_model(root_dir, weights_path, trt_model_name) + + gpu_transforms_flag = "gpu_transforms" in args.benchmark_type + gpu_loading_flag = "gds" in args.benchmark_type + trt_flag = "trt" in args.benchmark_type + # Get components + infer_transforms = get_transforms(device, gpu_loading_flag, gpu_transforms_flag) + model = get_model(device, weights_path, trt_model_path, trt_flag) + # Run inference + total_time_dict = run_inference(train_files, infer_transforms, model, device, args.benchmark_type) + # Save the results + df = pd.DataFrame(list(total_time_dict.items()), columns=["file_name", "time"]) + df.to_csv(os.path.join(root_dir, f"time_{args.benchmark_type}.csv"), index=False) + +if __name__ == "__main__": + main() diff --git a/acceleration/fast_inference_tutorial/utils.py b/acceleration/fast_inference_tutorial/utils.py index 0e8eec95f4..ac14f55845 100644 --- a/acceleration/fast_inference_tutorial/utils.py +++ b/acceleration/fast_inference_tutorial/utils.py @@ -76,7 +76,7 @@ def prepare_tensorrt_model(root_dir, weights_path, trt_model_name="model_trt.ts" model.load_state_dict(weights) torchscript_model = convert_to_trt( model=model, - precision="fp32", + precision="fp16", input_shape=[1, 1, 96, 96, 96], dynamic_batchsize=[1, 1, 1], use_trace=True, From dc1c24f444c6d4ea2873b3ebe075fc35cdf4789e Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 7 Mar 2025 09:45:53 +0000 Subject: [PATCH 06/11] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- .../fast_inference_tutorial.ipynb | 14 +++++---- .../fast_inference_tutorial/run_benchmark.py | 29 ++++++++++++++----- 2 files changed, 29 insertions(+), 14 deletions(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index 23d18260b1..867bb691be 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -354,7 +354,7 @@ " total_time_dict = {}\n", " roi_size = (96, 96, 96)\n", " sw_batch_size = 1\n", - " \n", + "\n", " for idx, sample in enumerate(data_list[:10]):\n", " start = timer()\n", " data = infer_transforms({\"image\": sample})\n", @@ -465,7 +465,9 @@ " all_df = pd.merge(all_df, df, on=\"file_name\", how=\"left\")\n", "\n", "# for each file, add it's size\n", - "all_df[\"file_size\"] = all_df[\"file_name\"].apply(lambda x: os.path.getsize(os.path.join(root_dir, \"Task03_Liver\", \"imagesTs_nii\", x)))\n", + "all_df[\"file_size\"] = all_df[\"file_name\"].apply(\n", + " lambda x: os.path.getsize(os.path.join(root_dir, \"Task03_Liver\", \"imagesTs_nii\", x))\n", + ")\n", "# sort by file size\n", "all_df = all_df.sort_values(by=\"file_size\", ascending=True)\n", "# convert file size to MB\n", @@ -489,10 +491,10 @@ "outputs": [], "source": [ "plt.figure(figsize=(10, 6))\n", - "average_time.plot(kind='bar', color=['skyblue', 'orange', 'green', 'red'])\n", - "plt.title('Average Inference Time for Each Benchmark Type')\n", - "plt.xlabel('Benchmark Type')\n", - "plt.ylabel('Average Time (seconds)')\n", + "average_time.plot(kind=\"bar\", color=[\"skyblue\", \"orange\", \"green\", \"red\"])\n", + "plt.title(\"Average Inference Time for Each Benchmark 
Type\")\n", + "plt.xlabel(\"Benchmark Type\")\n", + "plt.ylabel(\"Average Time (seconds)\")\n", "plt.xticks(rotation=45)\n", "plt.tight_layout()\n", "plt.show()" diff --git a/acceleration/fast_inference_tutorial/run_benchmark.py b/acceleration/fast_inference_tutorial/run_benchmark.py index 0ec96df3d1..a825988310 100644 --- a/acceleration/fast_inference_tutorial/run_benchmark.py +++ b/acceleration/fast_inference_tutorial/run_benchmark.py @@ -20,13 +20,21 @@ import torch_tensorrt from monai.inferers import sliding_window_inference from monai.networks.nets import SegResNet -from monai.transforms import (Activationsd, AsDiscreted, Compose, - EnsureChannelFirstd, EnsureTyped, Invertd, - LoadImaged, NormalizeIntensityd, Orientationd, - ScaleIntensityd, Spacingd) - -from utils import (prepare_model_weights, prepare_tensorrt_model, - prepare_test_datalist) +from monai.transforms import ( + Activationsd, + AsDiscreted, + Compose, + EnsureChannelFirstd, + EnsureTyped, + Invertd, + LoadImaged, + NormalizeIntensityd, + Orientationd, + ScaleIntensityd, + Spacingd, +) + +from utils import prepare_model_weights, prepare_tensorrt_model, prepare_test_datalist def get_transforms(device, gpu_loading_flag=False, gpu_transforms_flag=False): @@ -49,6 +57,7 @@ def get_transforms(device, gpu_loading_flag=False, gpu_transforms_flag=False): return infer_transforms + def get_post_transforms(infer_transforms): post_transforms = Compose( [ @@ -65,6 +74,7 @@ def get_post_transforms(infer_transforms): ) return post_transforms + def get_model(device, weights_path, trt_model_path, trt_flag=False): if not trt_flag: model = SegResNet( @@ -84,11 +94,12 @@ def get_model(device, weights_path, trt_model_path, trt_flag=False): model = torch.jit.load(trt_model_path) return model + def run_inference(data_list, infer_transforms, model, device, benchmark_type): total_time_dict = {} roi_size = (96, 96, 96) sw_batch_size = 1 - + for idx, sample in enumerate(data_list): start = timer() data = 
infer_transforms({"image": sample}) @@ -117,6 +128,7 @@ def run_inference(data_list, infer_transforms, model, device, benchmark_type): return total_time_dict + def main(): parser = argparse.ArgumentParser(description="Run inference benchmark.") parser.add_argument("--benchmark_type", type=str, default="original", help="Type of benchmark to run") @@ -146,5 +158,6 @@ def main(): df = pd.DataFrame(list(total_time_dict.items()), columns=["file_name", "time"]) df.to_csv(os.path.join(root_dir, f"time_{args.benchmark_type}.csv"), index=False) + if __name__ == "__main__": main() From f4840a76f2c071f1d12b96342e33d9e2966f561e Mon Sep 17 00:00:00 2001 From: Yiheng Wang Date: Sat, 8 Mar 2025 03:53:36 +0000 Subject: [PATCH 07/11] finalize report Signed-off-by: Yiheng Wang --- .../fast_inference_tutorial.ipynb | 182 ++++++++++++++---- .../fast_inference_tutorial/run_benchmark.py | 15 +- acceleration/fast_inference_tutorial/utils.py | 2 +- runner.sh | 2 + 4 files changed, 156 insertions(+), 45 deletions(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index 23d18260b1..59864f534b 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -24,13 +24,17 @@ "source": [ "## Accelerating Model Inference with MONAI\n", "\n", - "In this tutorial, we explore three powerful features that can accelerate model inference using MONAI. These features are designed to optimize the data handling and computational efficiency of your inference pipeline, particularly when working with NVIDIA GPUs. 
The tutorial will guide you through the following features and provide a comprehensive benchmarking strategy to evaluate the performance improvements offered by each feature:\n", + "In the rapidly evolving field of medical imaging, the ability to perform fast and efficient model inference is crucial for real-time diagnostics and treatment planning. This tutorial explores three advanced features of the MONAI framework that are designed to significantly accelerate model inference, particularly when leveraging the computational power of NVIDIA GPUs.\n", "\n", - "1. **TensorRT Inference**: Utilize NVIDIA's TensorRT to optimize and execute models for high-performance inference on NVIDIA GPUs.\n", + "1. **TensorRT Inference**: Learn how to utilize NVIDIA's TensorRT to optimize and execute models for high-performance inference, reducing latency and improving throughput.\n", "\n", - "2. **GPU-Based Preprocessing**: Leverage the computational power of GPUs to perform data preprocessing directly on the GPU. This can significantly reduce the time spent on data preparation, enabling faster inference.\n", + "2. **GPU-Based Preprocessing**: Discover how to offload data preprocessing tasks to the GPU, minimizing CPU bottlenecks and accelerating the overall inference pipeline.\n", "\n", - "3. **Direct GPU Data Loading**: Minimize data transfer times by loading data directly from disk into GPU memory. This feature supports NIfTI and DICOM file formats." + "3. **Direct GPU Data Loading**: Understand the benefits of loading data directly from disk into GPU memory, which reduces data transfer times and enhances processing efficiency.\n", + "\n", + "In addition to exploring these features, this tutorial provides a comprehensive benchmarking strategy to evaluate the performance improvements offered by each feature. 
We will use the pretrained model from MONAI's [wholeBody_ct_segmentation](https://github.com/Project-MONAI/model-zoo/tree/dev/models/wholeBody_ct_segmentation) bundle and benchmark its inference on liver CT volumes.\n", + "\n", + "Finally, we will analyze and visualize the benchmark results, offering insights into the performance gains achieved through these optimizations. By the end of this tutorial, you will have a deeper understanding of how to leverage MONAI's capabilities to enhance the efficiency of your medical imaging workflows." ] }, { @@ -121,7 +125,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Introduction on Fast Inference Features\n" + "## Introduction to Fast Inference Features" ] }, { @@ -136,7 +140,7 @@ "\n", "example:\n", "\n", - "```python\n", + "```py\n", "\n", "from monai.networks.nets import SegResNet\n", "from monai.networks.utils import convert_to_trt\n", @@ -157,7 +161,7 @@ " model=model,\n", " precision=\"fp16\",\n", " input_shape=[1, 1, 96, 96, 96],\n", - " dynamic_batchsize=[1, 1, 1],\n", + " dynamic_batchsize=[1, 4, 4],\n", " use_trace=True,\n", " verify=False,\n", ")\n", @@ -239,7 +243,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -257,16 +261,28 @@ "source": [ "### Prepare Data and Model\n", "\n", - "The [Medical Segmentation Decathlon Task03 Liver dataset](http://medicaldecathlon.com/) is used as an example to benchmark the acceleration performance.A helper script, [`prepare_data.py`](./prepare_data.py), is used to download and extract the dataset. In addition, the script also prepares the model weights and TensorRT engine-based TorchScript model.\n", + "The [Medical Segmentation Decathlon Task03 Liver dataset](http://medicaldecathlon.com/) is used as an example to benchmark the acceleration performance.\n", + "\n", + "A helper script, [`prepare_data.py`](./prepare_data.py), is used to download and extract the dataset.
In addition, the script also prepares the model weights and TensorRT engine-based TorchScript model.\n", "\n", "The script automatically checks for existing data. This ensures that repeated executions of the notebook do not result in redundant operations." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Test data already exists at ./Task03_Liver/imagesTs_nii\n", + "Weights already exists at ./model.pt\n", + "TensorRT model already exists at ./model_trt.ts\n" + ] + } + ], "source": [ "from utils import prepare_test_datalist, prepare_model_weights, prepare_tensorrt_model\n", "\n", @@ -275,8 +291,8 @@ "torch_tensorrt.runtime.set_multi_device_safe_mode(True)\n", "device = torch.device(\"cuda:0\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", "train_files = prepare_test_datalist(root_dir)\n", - "# since the dataset is too large, the smallest 21 files are used for warm up (1 file) and benchmarking (11 files)\n", - "train_files = sorted(train_files, key=lambda x: os.path.getsize(x), reverse=False)[:21]\n", + "# since the dataset is too large, the smallest 31 files are used for warm up (1 file) and benchmarking (30 files)\n", + "train_files = sorted(train_files, key=lambda x: os.path.getsize(x), reverse=False)[:31]\n", "weights_path = prepare_model_weights(root_dir=root_dir, bundle_name=\"wholeBody_ct_segmentation\")\n", "trt_model_name = \"model_trt.ts\"\n", "trt_model_path = prepare_tensorrt_model(root_dir, weights_path, trt_model_name)" @@ -291,7 +307,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -346,14 +362,14 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def run_inference(data_list, infer_transforms, model, device, benchmark_type):\n", " total_time_dict = 
{}\n", " roi_size = (96, 96, 96)\n", - " sw_batch_size = 1\n", + " sw_batch_size = 4\n", " \n", " for idx, sample in enumerate(data_list[:10]):\n", " start = timer()\n", @@ -394,7 +410,7 @@ "\n", "The cell below will execute the benchmark based on the `benchmark_type` variable.\n", "\n", - "#### Optional: Using the Python Script\n", + "#### (Optional) Using the Python Script\n", "\n", "For convenience, a Python script, [`run_benchmark.py`](./run_benchmark.py), is available to run the benchmark. You can open a terminal and execute the following command to run the benchmark for all benchmark types:\n", "\n", @@ -442,16 +458,9 @@ "All cell outputs presented in this section were obtained by a NVIDIA RTX A6000 GPU." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Collect Benchmark Results" - ] - }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 12, "metadata": {}, "outputs": [], "source": [ @@ -470,47 +479,142 @@ "all_df = all_df.sort_values(by=\"file_size\", ascending=True)\n", "# convert file size to MB\n", "all_df[\"file_size\"] = all_df[\"file_size\"] / 1024 / 1024\n", - "# get the average time for each benchmark type\n", - "average_time = all_df.mean(numeric_only=True)\n", - "del average_time[\"file_size\"]" + "# get the total time for each benchmark type\n", + "total_time = all_df.sum(numeric_only=True)\n", + "del total_time[\"file_size\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Visualize Average Inference Time for Each Benchmark Type" + "### Analyze the Total Inference Time\n", + "\n", + "- TensorRT Improvement:\n", + "Switching from the original model to TensorRT (`trt_time`) results in a slight performance improvement, reducing the total inference time by 0.94%.\n", + "\n", + "- TensorRT + GPU Transforms Improvement:\n", + "Adding GPU transforms (`trt_gpu_transforms_time`) reduces the total inference time by 9.32% compared to the original model.\n", + "\n", + "- TensorRT + GDS + GPU Transforms Improvement:\n", +
"The combination of GPU Direct Storage and GPU transforms (`trt_gds_gpu_transforms_time`) provides the most substantial improvement, reducing the total inference time by more than 55% compared to the original model." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "original_time 360.122527\n", + "trt_time 356.739906\n", + "trt_gpu_transforms_time 326.563954\n", + "trt_gds_gpu_transforms_time 160.416928\n", + "dtype: float64\n" + ] + } + ], + "source": [ + "print(total_time)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "TensorRT Improvement: 0.009392972563697605\n", + "TensorRT + GPU Transforms Improvement: 0.09318654129529037\n", + "TensorRT + GDS + GPU Transforms Improvement: 0.5545490328713701\n" + ] + } + ], + "source": [ + "print(\"TensorRT Improvement: \", (total_time[\"original_time\"] - total_time[\"trt_time\"]) / total_time[\"original_time\"])\n", + "print(\"TensorRT + GPU Transforms Improvement: \", (total_time[\"original_time\"] - total_time[\"trt_gpu_transforms_time\"]) / total_time[\"original_time\"])\n", + "print(\"TensorRT + GDS + GPU Transforms Improvement: \", (total_time[\"original_time\"] - total_time[\"trt_gds_gpu_transforms_time\"]) / total_time[\"original_time\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ + "total_time.index = [\"pytorch_model\", \"TensorRT\", \"TensorRT_GPU_Transform\", \"TensorRT_GPU_Transform_GDS\"]\n", + "\n", "plt.figure(figsize=(10, 6))\n", - "average_time.plot(kind='bar', color=['skyblue', 'orange', 'green', 'red'])\n", - "plt.title('Average Inference Time for Each Benchmark Type')\n", + "total_time.plot(kind='bar', color=['skyblue', 'orange', 'green', 'red'])\n", + "plt.title('Total Inference Time for Each Benchmark Type')\n", "plt.xlabel('Benchmark Type')\n", - "plt.ylabel('Average Time (seconds)')\n", + "plt.ylabel('Total Time (seconds)')\n", "plt.xticks(rotation=45)\n", "plt.tight_layout()\n", "plt.show()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compare the Original Model and the Most Optimized Model\n", + "\n", + "Plotting the per-file inference time against file size for the original model and the most optimized model shows that larger files benefit significantly more from the optimizations.\n", + "\n", + "As the file size increases, the inference time of the original model grows markedly, while that of the most optimized model shows no obvious increase. This indicates that the approach is particularly effective for larger input volumes." + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 25, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure(figsize=(10, 6))\n", + "plt.scatter(all_df[\"file_size\"], all_df[\"original_time\"], label=\"Original Model\")\n", + "plt.scatter(all_df[\"file_size\"], all_df[\"trt_gds_gpu_transforms_time\"], label=\"Optimized Model\")\n", + "plt.xlabel(\"File Size (MB)\")\n", + "plt.ylabel(\"Inference Time per File (seconds)\")\n", + "plt.title(\"Comparison of original and most optimized model\")\n", + "plt.legend()\n", + "plt.show()" + ] } ], "metadata": { "kernelspec": { "display_name": "kvikio_env", "language": "python", - "name": "python3" + "name": "kvikio_env" }, "language_info": { "codemirror_mode": { @@ -522,7 +626,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.14" + "version": "3.10.16" } }, "nbformat": 4, diff --git a/acceleration/fast_inference_tutorial/run_benchmark.py b/acceleration/fast_inference_tutorial/run_benchmark.py index 0ec96df3d1..37901dc98d 100644 --- a/acceleration/fast_inference_tutorial/run_benchmark.py +++ b/acceleration/fast_inference_tutorial/run_benchmark.py @@ -49,6 +49,7 @@ def get_transforms(device, gpu_loading_flag=False, gpu_transforms_flag=False): return infer_transforms + def get_post_transforms(infer_transforms): post_transforms = Compose( [ @@ -65,6 +66,7 @@ def get_post_transforms(infer_transforms): ) return post_transforms + def get_model(device, weights_path, trt_model_path, trt_flag=False): if not trt_flag: model = SegResNet( @@ -84,11 +86,12 @@ def get_model(device, weights_path, trt_model_path, trt_flag=False): model = torch.jit.load(trt_model_path) return model + def run_inference(data_list,
infer_transforms, model, device, benchmark_type): sample_name = sample.split("/")[-1] if idx > 0: total_time_dict[sample_name] = end - start - + print(f"Time taken for {sample_name}: {end - start} seconds") return total_time_dict + def main(): parser = argparse.ArgumentParser(description="Run inference benchmark.") parser.add_argument("--benchmark_type", type=str, default="original", help="Type of benchmark to run") @@ -128,8 +132,8 @@ def main(): torch_tensorrt.runtime.set_multi_device_safe_mode(True) device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu") train_files = prepare_test_datalist(root_dir) - # since the dataset is too large, the smallest 21 files are used for warm up (1 file) and benchmarking (11 files) - train_files = sorted(train_files, key=lambda x: os.path.getsize(x), reverse=False)[:21] + # since the dataset is too large, the smallest 31 files are used for warm up (1 file) and benchmarking (30 files) + train_files = sorted(train_files, key=lambda x: os.path.getsize(x), reverse=False)[:31] weights_path = prepare_model_weights(root_dir=root_dir, bundle_name="wholeBody_ct_segmentation") trt_model_name = "model_trt.ts" trt_model_path = prepare_tensorrt_model(root_dir, weights_path, trt_model_name) @@ -146,5 +150,6 @@ def main(): df = pd.DataFrame(list(total_time_dict.items()), columns=["file_name", "time"]) df.to_csv(os.path.join(root_dir, f"time_{args.benchmark_type}.csv"), index=False) + if __name__ == "__main__": main() diff --git a/acceleration/fast_inference_tutorial/utils.py b/acceleration/fast_inference_tutorial/utils.py index ac14f55845..60486b7bf0 100644 --- a/acceleration/fast_inference_tutorial/utils.py +++ b/acceleration/fast_inference_tutorial/utils.py @@ -78,7 +78,7 @@ def prepare_tensorrt_model(root_dir, weights_path, trt_model_name="model_trt.ts" model=model, precision="fp16", input_shape=[1, 1, 96, 96, 96], - dynamic_batchsize=[1, 1, 1], + dynamic_batchsize=[1, 4, 4], use_trace=True, verify=False, ) diff 
--git a/runner.sh b/runner.sh index 07c9c07d7b..964c37b6d5 100755 --- a/runner.sh +++ b/runner.sh @@ -70,6 +70,7 @@ doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" TCIA_PROSTATEx_Pros doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_functional.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_compose.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" TensorRT_inference_acceleration.ipynb) +doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" fast_inference_tutorial.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" lazy_resampling_benchmark.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" modular_patch_inferer.ipynb) doesnt_contain_max_epochs=("${doesnt_contain_max_epochs[@]}" GDS_dataset.ipynb) @@ -117,6 +118,7 @@ skip_run_papermill=("${skip_run_papermill[@]}" .*swinunetr_finetune*) skip_run_papermill=("${skip_run_papermill[@]}" .*active_learning*) skip_run_papermill=("${skip_run_papermill[@]}" .*transform_visualization*) # https://github.com/Project-MONAI/tutorials/issues/1155 skip_run_papermill=("${skip_run_papermill[@]}" .*TensorRT_inference_acceleration*) +skip_run_papermill=("${skip_run_papermill[@]}" .*fast_inference_tutorial*) skip_run_papermill=("${skip_run_papermill[@]}" .*mednist_classifier_ray*) # https://github.com/Project-MONAI/tutorials/issues/1307 skip_run_papermill=("${skip_run_papermill[@]}" .*TorchIO_MONAI_PyTorch_Lightning*) # https://github.com/Project-MONAI/tutorials/issues/1324 skip_run_papermill=("${skip_run_papermill[@]}" .*GDS_dataset*) # https://github.com/Project-MONAI/tutorials/issues/1324 From 82daa3264747ab23db52fc361f3ca2d13d390773 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Sat, 8 Mar 2025 03:57:49 +0000 Subject: [PATCH 08/11] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- 
.../fast_inference_tutorial.ipynb | 24 ++++++++++++------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index 9ddc17d7dc..b75ba47cd4 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -370,7 +370,7 @@ " total_time_dict = {}\n", " roi_size = (96, 96, 96)\n", " sw_batch_size = 4\n", - " \n", + "\n", " for idx, sample in enumerate(data_list[:10]):\n", " start = timer()\n", " data = infer_transforms({\"image\": sample})\n", @@ -474,7 +474,9 @@ " all_df = pd.merge(all_df, df, on=\"file_name\", how=\"left\")\n", "\n", "# for each file, add it's size\n", - "all_df[\"file_size\"] = all_df[\"file_name\"].apply(lambda x: os.path.getsize(os.path.join(root_dir, \"Task03_Liver\", \"imagesTs_nii\", x)))\n", + "all_df[\"file_size\"] = all_df[\"file_name\"].apply(\n", + " lambda x: os.path.getsize(os.path.join(root_dir, \"Task03_Liver\", \"imagesTs_nii\", x))\n", + ")\n", "# sort by file size\n", "all_df = all_df.sort_values(by=\"file_size\", ascending=True)\n", "# convert file size to MB\n", @@ -538,8 +540,14 @@ ], "source": [ "print(\"TensorRT Improvement: \", (total_time[\"original_time\"] - total_time[\"trt_time\"]) / total_time[\"original_time\"])\n", - "print(\"TensorRT + GPU Transforms Improvement: \", (total_time[\"original_time\"] - total_time[\"trt_gpu_transforms_time\"]) / total_time[\"original_time\"])\n", - "print(\"TensorRT + GDS + GPU Transforms Improvement: \", (total_time[\"original_time\"] - total_time[\"trt_gds_gpu_transforms_time\"]) / total_time[\"original_time\"])" + "print(\n", + " \"TensorRT + GPU Transforms Improvement: \",\n", + " (total_time[\"original_time\"] - total_time[\"trt_gpu_transforms_time\"]) / total_time[\"original_time\"],\n", + ")\n", + "print(\n", + " \"TensorRT + GDS + GPU Transforms 
Improvement: \",\n", + " (total_time[\"original_time\"] - total_time[\"trt_gds_gpu_transforms_time\"]) / total_time[\"original_time\"],\n", + ")" ] }, { @@ -562,10 +570,10 @@ "total_time.index = [\"pytorch_model\", \"TensorRT\", \"TensorRT_GPU_Transform\", \"TensorRT_GPU_Transform_GDS\"]\n", "\n", "plt.figure(figsize=(10, 6))\n", - "total_time.plot(kind='bar', color=['skyblue', 'orange', 'green', 'red'])\n", - "plt.title('Total Inference Time for Each Benchmark Type')\n", - "plt.xlabel('Benchmark Type')\n", - "plt.ylabel('Total Time (seconds)')\n", + "total_time.plot(kind=\"bar\", color=[\"skyblue\", \"orange\", \"green\", \"red\"])\n", + "plt.title(\"Total Inference Time for Each Benchmark Type\")\n", + "plt.xlabel(\"Benchmark Type\")\n", + "plt.ylabel(\"Total Time (seconds)\")\n", "plt.xticks(rotation=45)\n", "plt.tight_layout()\n", "plt.show()" From 6d650ef59f07c9b9439166b71b13671db00d3a16 Mon Sep 17 00:00:00 2001 From: Yiheng Wang Date: Sat, 8 Mar 2025 04:23:45 +0000 Subject: [PATCH 09/11] fix pep8 Signed-off-by: Yiheng Wang --- .../fast_inference_tutorial.ipynb | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index b75ba47cd4..204c693ea2 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -42,13 +42,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Install environment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ + "## Setup environment\n", + "\n", "Loading data directly from disk to GPU memory requires the `kvikio` library. In addition, this tutorial requires many other dependencies such as `monai`, `torch`, `torch_tensorrt`, `numpy`, `ignite`, `pandas`, `matplotlib`, etc. 
We recommend using the [MONAI Docker](https://docs.monai.io/en/latest/installation.html#from-dockerhub) image to run this tutorial, which includes pre-configured dependencies and allows you to skip manual installation.\n", "\n", "If not using MONAI Docker, install `kvikio` using one of these methods:\n", @@ -113,10 +108,10 @@ "from monai.inferers import sliding_window_inference\n", "from monai.networks.nets import SegResNet\n", "import matplotlib.pyplot as plt\n", - "import torch\n", "import gc\n", "import pandas as pd\n", "from timeit import default_timer as timer\n", + "from utils import prepare_test_datalist, prepare_model_weights, prepare_tensorrt_model\n", "\n", "print_config()" ] @@ -284,8 +279,6 @@ } ], "source": [ - "from utils import prepare_test_datalist, prepare_model_weights, prepare_tensorrt_model\n", - "\n", "root_dir = \".\"\n", "torch.backends.cudnn.benchmark = True\n", "torch_tensorrt.runtime.set_multi_device_safe_mode(True)\n", @@ -465,7 +458,7 @@ "outputs": [], "source": [ "# collect benchmark results\n", - "all_df = pd.read_csv(os.path.join(root_dir, f\"time_original.csv\"))\n", + "all_df = pd.read_csv(os.path.join(root_dir, \"time_original.csv\"))\n", "all_df.columns = [\"file_name\", \"original_time\"]\n", "\n", "for benchmark_type in [\"trt\", \"trt_gpu_transforms\", \"trt_gds_gpu_transforms\"]:\n", From bcdfe9584a57f181679c5c42f88140b3c1b67caf Mon Sep 17 00:00:00 2001 From: Yiheng Wang <68361391+yiheng-wang-nv@users.noreply.github.com> Date: Thu, 20 Mar 2025 11:41:34 +0800 Subject: [PATCH 10/11] Update acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com> Signed-off-by: Yiheng Wang <68361391+yiheng-wang-nv@users.noreply.github.com> --- .../fast_inference_tutorial/fast_inference_tutorial.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb 
b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index 204c693ea2..36a1aa70a4 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -129,7 +129,7 @@ "source": [ "### 1. TensorRT Inference\n", "\n", - "`monai.networks.utils.convert_to_trt` is a function that converts a PyTorch model to a TensorRT engine-based TorchScript model. Except the loading method (need to use `torch.jit.load` to load the model), the usage of the converted TorchScriptmodel is the same as the original model.\n", + "`monai.networks.utils.convert_to_trt` is a function that converts a PyTorch model to a TensorRT engine-based TorchScript model. Apart from the loading method (the converted model must be loaded with `torch.jit.load`), the converted TorchScript model is used in the same way as the original model.\n", "\n", "`monai.data.torchscript_utils.save_net_with_metadata` is a function that saves the converted TorchScript model and its metadata.\n", "\n", From 1dccf638537ff940593efdff9bbcab6bff951773 Mon Sep 17 00:00:00 2001 From: Yiheng Wang Date: Mon, 24 Mar 2025 08:26:02 +0000 Subject: [PATCH 11/11] update doc Signed-off-by: Yiheng Wang --- .../fast_inference_tutorial.ipynb | 45 ++++++++++++------- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb index 36a1aa70a4..ea0f398c7a 100644 --- a/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb +++ b/acceleration/fast_inference_tutorial/fast_inference_tutorial.ipynb @@ -203,7 +203,7 @@ "loader = LoadImaged(keys=\"image\", reader=\"NibabelReader\", to_gpu=True)\n", "```\n", "\n", - "Please note that only NIfTI (.nii, for compressed \".nii.gz\" files, this feature also supports but the acceleration is not significant) and DICOM (.dcm) files are 
supported for direct GPU data
loading.\n" + "Please note that only NIfTI (`.nii`) and DICOM (`.dcm`) files are supported for direct GPU data loading. Compressed NIfTI (`.nii.gz`) files can also be loaded this way, but the speedup is not guaranteed.\n" ] }, { @@ -265,27 +265,15 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Test data already exists at ./Task03_Liver/imagesTs_nii\n", - "Weights already exists at ./model.pt\n", - "TensorRT model already exists at ./model_trt.ts\n" - ] - } - ], + "outputs": [], "source": [ "root_dir = \".\"\n", "torch.backends.cudnn.benchmark = True\n", "torch_tensorrt.runtime.set_multi_device_safe_mode(True)\n", "device = torch.device(\"cuda:0\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", "train_files = prepare_test_datalist(root_dir)\n", - "# since the dataset is too large, the smallest 31 files are used for warm up (1 file) and benchmarking (30 files)\n", - "train_files = sorted(train_files, key=lambda x: os.path.getsize(x), reverse=False)[:31]\n", "weights_path = prepare_model_weights(root_dir=root_dir, bundle_name=\"wholeBody_ct_segmentation\")\n", "trt_model_name = \"model_trt.ts\"\n", "trt_model_path = prepare_tensorrt_model(root_dir, weights_path, trt_model_name)" @@ -609,13 +597,36 @@ "plt.legend()\n", "plt.show()" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Limitations\n", + "\n", + "Although the optimizations have shown significant improvements in inference time, there are still some limitations to consider:\n", + "\n", + "1. **TensorRT**: \n", + " - **Model Compatibility**: Not all models are compatible with TensorRT. Models with unsupported layers or operations may not benefit from TensorRT acceleration.\n", + " - **Batch Size**: TensorRT is optimized for larger batch sizes. 
For very small batch sizes, the overhead of conversion and execution might outweigh the performance gains.\n", + " - **Precision**: While using lower precision (e.g., FP16) can speed up inference, it may reduce model accuracy, which is a critical concern in medical imaging applications.\n", + "\n", + "2. **GPU-Based Preprocessing**:\n", + " - **Memory Usage**: GPU-based preprocessing consumes additional GPU memory, which can be a constraint when available GPU memory is limited.\n", + "\n", + "3. **GPUDirect Storage (GDS)**:\n", + " - **File Format Support**: Currently, only NIfTI and DICOM files are supported for direct GPU data loading (compressed `.nii.gz` NIfTI files are also supported, but the speedup is not guaranteed). Files in other formats cannot use this feature.\n", + " - **Small File Acceleration**: For small files, the overhead of direct GPU data loading might outweigh the performance gains.\n", + "\n", + "By understanding these limitations, users can better assess when and how to apply these acceleration features effectively in their workflows." + ] } ], "metadata": { "kernelspec": { - "display_name": "kvikio_env", + "display_name": "monai_tutorial", "language": "python", - "name": "kvikio_env" + "name": "python3" }, "language_info": { "codemirror_mode": {