
Commit be17881

Update TensorRT-LLM (#2582)
1 parent: b171e87

267 files changed: +4505 additions, -4898 deletions


.github/workflows/auto-assign.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ jobs:
       - name: Checkout repository
        uses: actions/checkout@v2

-      - name: Get Assignee
+      - name: Get assignee
        uses: actions/github-script@v6
        id: get-assignee
        with:
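The renamed step is the one that drives auto-assignment. The hunk only touches the step header; its script body is not part of this diff, so the following is a minimal sketch, assuming the step reads .github/workflows/module-owners.json and exposes the chosen handle as a step output. The selection policy shown here is a placeholder, not taken from the commit.

      - name: Get assignee
        uses: actions/github-script@v6
        id: get-assignee
        with:
          script: |
            // Sketch only: the real selection logic is not shown in this commit.
            const fs = require('fs');
            // Map of module names to candidate GitHub handles (see module-owners.json below).
            const owners = JSON.parse(
              fs.readFileSync('.github/workflows/module-owners.json', 'utf8'));
            // Placeholder policy: fall back to the first owner of the first module.
            const assignee = Object.values(owners)[0][0];
            core.setOutput('assignee', assignee);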

.github/workflows/module-owners.json

Lines changed: 2 additions & 1 deletion
@@ -9,5 +9,6 @@
     "Performance": ["kaiyux", "jiahanc", "hypdeb"],
     "Lora/P-tuning":["byshiue", "Naveassaf"],
     "Disaggregated Serving":["Shixiaowei02", "joyang-nv", "chuangz0", "schetlur-nv"],
-    "Documentation":["nv-guomingz", "mikemckiernan"]
+    "Documentation":["nv-guomingz", "mikemckiernan"],
+    "Windows":["pamelap-nvidia"]
 }
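Applied to the file, the hunk leaves the tail of module-owners.json as below (entries above the hunk are omitted). The new "Windows" key is presumably there so Windows-related items can be assigned to its listed owner.

    "Performance": ["kaiyux", "jiahanc", "hypdeb"],
    "Lora/P-tuning":["byshiue", "Naveassaf"],
    "Disaggregated Serving":["Shixiaowei02", "joyang-nv", "chuangz0", "schetlur-nv"],
    "Documentation":["nv-guomingz", "mikemckiernan"],
    "Windows":["pamelap-nvidia"]
}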

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions
@@ -7,8 +7,8 @@ repos:
     rev: v1.1.13
     hooks:
       - id: remove-crlf
-  - repo: https://github.com/pre-commit/mirrors-yapf
-    rev: v0.32.0
+  - repo: https://github.com/google/yapf
+    rev: v0.43.0
     hooks:
       - id: yapf
   - repo: https://github.com/pre-commit/pre-commit-hooks
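The pre-commit/mirrors-yapf mirror is archived, and this diff points pre-commit at the upstream google/yapf repository (which ships its own hook definition) while bumping yapf to v0.43.0. A sketch of the resulting stanza, with indentation assumed, is:

repos:
  # ...other hooks unchanged...
  - repo: https://github.com/google/yapf   # upstream repo replaces the archived mirror
    rev: v0.43.0                           # yapf version pinned by this commit
    hooks:
      - id: yapf
# To reformat locally with the new version:  pre-commit run yapf --all-files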

README.md

Lines changed: 26 additions & 6 deletions
@@ -8,7 +8,7 @@ TensorRT-LLM
 [![python](https://img.shields.io/badge/python-3.12.3-green)](https://www.python.org/downloads/release/python-3123/)
 [![cuda](https://img.shields.io/badge/cuda-12.6.3-green)](https://developer.nvidia.com/cuda-downloads)
 [![trt](https://img.shields.io/badge/TRT-10.7.0-green)](https://developer.nvidia.com/tensorrt)
-[![version](https://img.shields.io/badge/release-0.16.0.dev-green)](./tensorrt_llm/version.py)
+[![version](https://img.shields.io/badge/release-0.17.0.dev-green)](./tensorrt_llm/version.py)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)

 [Architecture](./docs/source/architecture/overview.md)   |   [Performance](./docs/source/performance/perf-overview.md)   |   [Examples](./examples/)   |   [Documentation](./docs/source/)   |   [Roadmap](https://docs.google.com/presentation/d/1gycPmtdh7uUcH6laOvW65Dbp9F1McUkGDIcAyjicBZs/edit?usp=sharing)
@@ -17,12 +17,31 @@ TensorRT-LLM
 <div align="left">

 ## Latest News
-* [2024/11/19] Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs
-[➡️ link](https://developer.nvidia.com/blog/llama-3-2-full-stack-optimizations-unlock-high-performance-on-nvidia-gpus/?ncid=so-link-721194)
+* [2024/12/10] Llama 3.3 70B from AI at Meta is accelerated by TensorRT-LLM. 🌟 State-of-the-art model on par with Llama 3.1 405B for reasoning, math, instruction following and tool use. Explore the preview
+[➡️ link](https://build.nvidia.com/meta/llama-3_3-70b-instruct)
 <div align="center">
-<img src="https://developer-blogs.nvidia.com/wp-content/uploads/2024/11/three-llamas-holding-number-10-signs-1.jpg" width="50%">
+<img src="https://media.licdn.com/dms/image/v2/D4E10AQEMmE0xCJ6QvQ/image-shrink_800/image-shrink_800/0/1733853602424?e=1734912000&v=beta&t=mjW-P7skGnxGSgMx07IWczyYYC_05RYypB2mEmbgqZQ" width="50%">
 <div align="left">

+* [2024/12/03] 🌟 Boost your AI #inference throughput by up to 3.6x. We now support speculative decoding and tripling token throughput with our NVIDIA TensorRT-LLM. Perfect for your generative AI apps. ⚡ Learn how in this technical deep dive
+[➡️ link](https://nvda.ws/3ZCZTzD)
+
+* [2024/12/02] Working on deploying ONNX models for performance-critical applications? Try our NVIDIA Nsight Deep Learning Designer ⚡ A user-friendly GUI and tight integration with NVIDIA TensorRT that offers:
+✅ Intuitive visualization of ONNX model graphs
+✅ Quick tweaking of model architecture and parameters
+✅ Detailed performance profiling with either ORT or TensorRT
+✅ Easy building of TensorRT engines
+[➡️ link](https://developer.nvidia.com/nsight-dl-designer?ncid=so-link-485689&linkId=100000315016072)
+
+* [2024/11/26] 📣 Introducing TensorRT-LLM for Jetson AGX Orin, making it even easier to deploy on Jetson AGX Orin with initial support in JetPack 6.1 via the v0.12.0-jetson branch of the TensorRT-LLM repo. ✅ Pre-compiled TensorRT-LLM wheels & containers for easy integration ✅ Comprehensive guides & docs to get you started
+[➡️ link](https://forums.developer.nvidia.com/t/tensorrt-llm-for-jetson/313227?linkId=100000312718869)
+
+* [2024/11/21] NVIDIA TensorRT-LLM Multiblock Attention Boosts Throughput by More Than 3x for Long Sequence Lengths on NVIDIA HGX H200
+[➡️ link](https://developer.nvidia.com/blog/nvidia-tensorrt-llm-multiblock-attention-boosts-throughput-by-more-than-3x-for-long-sequence-lengths-on-nvidia-hgx-h200/)
+
+* [2024/11/19] Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs
+[➡️ link](https://developer.nvidia.com/blog/llama-3-2-full-stack-optimizations-unlock-high-performance-on-nvidia-gpus/?ncid=so-link-721194)
+
 * [2024/11/09] 🚀🚀🚀 3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot
 [➡️ link](https://developer.nvidia.com/blog/3x-faster-allreduce-with-nvswitch-and-tensorrt-llm-multishot/)

@@ -46,6 +65,9 @@ TensorRT-LLM
 * [2024/10/07] 🚀🚀🚀Optimizing Microsoft Bing Visual Search with NVIDIA Accelerated Libraries
 [➡️ link](https://developer.nvidia.com/blog/optimizing-microsoft-bing-visual-search-with-nvidia-accelerated-libraries/)

+<details close>
+<summary>Previous News</summary>
+
 * [2024/09/29] 🌟 AI at Meta PyTorch + TensorRT v2.4 🌟 ⚡TensorRT 10.1 ⚡PyTorch 2.4 ⚡CUDA 12.4 ⚡Python 3.12
 [➡️ link](https://github.com/pytorch/TensorRT/releases/tag/v2.4.0)

@@ -61,8 +83,6 @@ TensorRT-LLM
 * [2024/09/04] 🏎️🏎️🏎️ Best Practices for Tuning TensorRT-LLM for Optimal Serving with BentoML
 [➡️ link](https://www.bentoml.com/blog/tuning-tensor-rt-llm-for-optimal-serving-with-bentoml)

-<details close>
-<summary>Previous News</summary>

 * [2024/08/20] 🏎️SDXL with #TensorRT Model Optimizer ⏱️⚡ 🏁 cache diffusion 🏁 quantization aware training 🏁 QLoRA 🏁 #Python 3.12
 [➡️ link](https://developer.nvidia.com/blog/nvidia-tensorrt-model-optimizer-v0-15-boosts-inference-performance-and-expands-model-support/)
