What is Transformer Engine?
===========================

.. overview-begin-marker-do-not-remove

Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including
support for 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, to provide
better performance with lower memory utilization in both training and inference. TE provides a
collection of highly optimized building blocks for popular Transformer architectures and an
automatic mixed precision-like API that can be used seamlessly with your framework-specific code.
TE also includes a framework-agnostic C++ API that can be integrated with other deep learning
libraries to enable FP8 support for Transformers.
As the number of parameters in Transformer models continues to grow, training and inference for
architectures such as BERT, GPT and T5 become very memory and compute-intensive. Most deep learning
[...] not available natively in frameworks today.
TE addresses the problem of FP8 support by providing APIs that integrate with popular Large Language
Model (LLM) libraries. It provides a Python API consisting of modules to easily build a Transformer
layer as well as a framework-agnostic library in C++ including structs and kernels needed for FP8
support. Modules provided by TE internally maintain scaling factors and other values needed for FP8
training, greatly simplifying mixed precision training for users.
Highlights
==========

* Easy-to-use modules for building Transformer layers with FP8 support
* Optimizations (e.g. fused kernels) for Transformer models
* Support for FP8 on NVIDIA Hopper, Ada, and Blackwell GPUs
* Support for optimizations across all precisions (FP16, BF16) on NVIDIA Ampere GPU architecture generations and later
Examples
========

[...]

Installation
============
Pre-requisites
^^^^^^^^^^^^^^^^^^^^
* Linux x86_64
* CUDA 12.1+ (CUDA 12.8+ for Blackwell)
* NVIDIA Driver supporting CUDA 12.1 or later
* cuDNN 9.3 or later
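A quick way to sanity-check these prerequisites from a shell is sketched below (the cuDNN check
assumes the unversioned ``libcudnn.so`` symlink is on the loader path, which may require the cuDNN
developer package):

.. code-block:: bash

    # Check the installed CUDA toolkit and driver
    nvcc --version
    nvidia-smi

    # Print the cuDNN version (assumes libcudnn.so is loadable)
    python3 -c "import ctypes; print(ctypes.CDLL('libcudnn.so').cudnnGetVersion())"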
Docker
^^^^^^^^^^^^^^^^^^^^

The quickest way to get started with Transformer Engine is by using Docker images on the
`NVIDIA GPU Cloud (NGC) Catalog <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch>`_.
For example, to use the NGC PyTorch container interactively:
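A representative invocation is sketched below; the image tag is illustrative, so pick a current one
from the NGC catalog:

.. code-block:: bash

    # Start an interactive NGC PyTorch container with all GPUs visible
    # (the 25.01-py3 tag is a placeholder; check NGC for current tags)
    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.01-py3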
[...]

This will automatically detect if any supported deep learning frameworks are installed and build
Transformer Engine support for them. To explicitly specify frameworks, set the environment variable
NVTE_FRAMEWORK to a comma-separated list (e.g. NVTE_FRAMEWORK=jax,pytorch).
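As an illustrative sketch (the repository URL and ``stable`` ref follow the project's installation
guide, but treat the exact ref as an assumption):

.. code-block:: bash

    # Build from source with only the PyTorch bindings enabled
    NVTE_FRAMEWORK=pytorch pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@stable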
Alternatively, the package can be directly installed from
`Transformer Engine's PyPI <https://pypi.org/project/transformer-engine/>`_, e.g.

.. code-block:: bash

    pip3 install transformer_engine[pytorch]
To obtain the necessary Python bindings for Transformer Engine, the frameworks needed must be
explicitly specified as extra dependencies in a comma-separated list (e.g. [jax,pytorch]).
Transformer Engine ships wheels for the core library. Source distributions are shipped for the JAX
and PyTorch extensions.
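For example, to pull in both framework extensions at once (quoting the requirement keeps shells
such as zsh from globbing the brackets):

.. code-block:: bash

    # Install the core wheel plus the JAX and PyTorch extension sdists
    pip3 install "transformer_engine[jax,pytorch]"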
From source
^^^^^^^^^^^
`See the installation guide <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html#installation-from-source>`_.

Compiling with FlashAttention-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Transformer Engine release v0.11.0 added support for FlashAttention-2 in PyTorch for improved performance.

It is a known issue that FlashAttention-2 compilation is resource-intensive and requires a large amount of RAM (see `bug <https://github.com/Dao-AILab/flash-attention/issues/358>`_), which may lead to out of memory errors during the installation of Transformer Engine. Please try setting **MAX_JOBS=1** in the environment to circumvent the issue.
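A minimal sketch of such a constrained installation:

.. code-block:: bash

    # Cap parallel compile jobs so the FlashAttention-2 build stays within available RAM
    MAX_JOBS=1 pip3 install transformer_engine[pytorch]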
[...]

Transformer Engine has been integrated with popular LLM frameworks such as: