
Commit ca581d8

Merge branch 'site' into accelerators_table

2 parents 52c86d0 + 3fd3b6b

8 files changed: +388 −70

.github/workflows/update-quick-start-module.yml (+4 −22)

```diff
@@ -32,12 +32,6 @@ jobs:
       package-type: all
       os: windows
       channel: "nightly"
-  macos-nightly-matrix:
-    uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
-    with:
-      package-type: all
-      os: macos
-      channel: "nightly"
   macos-arm64-nightly-matrix:
     uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
     with:
@@ -58,13 +52,6 @@ jobs:
       package-type: all
       os: windows
       channel: "release"
-  macos-release-matrix:
-    needs: [macos-nightly-matrix]
-    uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
-    with:
-      package-type: all
-      os: macos
-      channel: "release"
   macos-arm64-release-matrix:
     needs: [macos-arm64-nightly-matrix]
     uses: pytorch/test-infra/.github/workflows/generate_binary_build_matrix.yml@main
@@ -74,9 +61,8 @@
       channel: "release"
 
   update-quick-start:
-    needs: [linux-nightly-matrix, windows-nightly-matrix, macos-nightly-matrix,
-      macos-arm64-nightly-matrix, linux-release-matrix, windows-release-matrix,
-      macos-release-matrix, macos-arm64-release-matrix]
+    needs: [linux-nightly-matrix, windows-nightly-matrix, macos-arm64-nightly-matrix,
+      linux-release-matrix, windows-release-matrix, macos-arm64-release-matrix]
     runs-on: "ubuntu-20.04"
     environment: pytorchbot-env
     steps:
@@ -92,22 +78,18 @@ jobs:
         env:
           LINUX_NIGHTLY_MATRIX: ${{ needs.linux-nightly-matrix.outputs.matrix }}
           WINDOWS_NIGHTLY_MATRIX: ${{ needs.windows-nightly-matrix.outputs.matrix }}
-          MACOS_NIGHTLY_MATRIX: ${{ needs.macos-nightly-matrix.outputs.matrix }}
-          MACOS_ARM64_NIGHTLY_MATRIX: ${{ needs.macos-arm64-nightly-matrix.outputs.matrix }}
+          MACOS_NIGHTLY_MATRIX: ${{ needs.macos-arm64-nightly-matrix.outputs.matrix }}
           LINUX_RELEASE_MATRIX: ${{ needs.linux-release-matrix.outputs.matrix }}
           WINDOWS_RELEASE_MATRIX: ${{ needs.windows-release-matrix.outputs.matrix }}
-          MACOS_RELEASE_MATRIX: ${{ needs.macos-release-matrix.outputs.matrix }}
-          MACOS_ARM64_RELEASE_MATRIX: ${{ needs.macos-arm64-release-matrix.outputs.matrix }}
+          MACOS_RELEASE_MATRIX: ${{ needs.macos-arm64-release-matrix.outputs.matrix }}
         run: |
           set -ex
           printf '%s\n' "$LINUX_NIGHTLY_MATRIX" > linux_nightly_matrix.json
           printf '%s\n' "$WINDOWS_NIGHTLY_MATRIX" > windows_nightly_matrix.json
           printf '%s\n' "$MACOS_NIGHTLY_MATRIX" > macos_nightly_matrix.json
-          printf '%s\n' "$MACOS_ARM64_NIGHTLY_MATRIX" > macos_arm64_nightly_matrix.json
           printf '%s\n' "$LINUX_RELEASE_MATRIX" > linux_release_matrix.json
           printf '%s\n' "$WINDOWS_RELEASE_MATRIX" > windows_release_matrix.json
           printf '%s\n' "$MACOS_RELEASE_MATRIX" > macos_release_matrix.json
-          printf '%s\n' "$MACOS_ARM64_RELEASE_MATRIX" > macos_arm64_release_matrix.json
           python3 ./scripts/gen_quick_start_module.py --autogenerate > assets/quick-start-module.js
           rm *_matrix.json
     - name: Create Issue if failed
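The update step serializes each job-matrix JSON blob from an environment variable into a `*_matrix.json` file, then hands the files to `gen_quick_start_module.py`. A minimal stdlib sketch of that load-and-inspect pattern (the matrix structure shown is hypothetical; the real one is produced by `generate_binary_build_matrix.yml` in pytorch/test-infra):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical shape of one generated build matrix.
linux_nightly = {"include": [
    {"package_type": "pip", "gpu_arch_type": "cuda", "gpu_arch_version": "12.1"},
    {"package_type": "pip", "gpu_arch_type": "cpu", "gpu_arch_version": ""},
]}

with TemporaryDirectory() as tmp:
    # Mirror the workflow step: write the matrix to <os>_<channel>_matrix.json.
    path = Path(tmp) / "linux_nightly_matrix.json"
    path.write_text(json.dumps(linux_nightly))

    # A generator script would then load every *_matrix.json and emit the
    # quick-start JS; here we just load one back and inspect it.
    matrix = json.loads(path.read_text())
    configs = matrix["include"]
    print(len(configs), configs[0]["gpu_arch_type"])  # → 2 cuda
```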

_get_started/previous-versions.md (+43)

````diff
@@ -17,6 +17,49 @@ your convenience.
 
 ## Commands for Versions >= 1.0.0
 
+### v2.2.2
+
+#### Conda
+
+##### OSX
+
+```
+# conda
+conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 -c pytorch
+```
+
+##### Linux and Windows
+
+```
+# CUDA 11.8
+conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia
+# CUDA 12.1
+conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia
+# CPU Only
+conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 cpuonly -c pytorch
+```
+
+#### Wheel
+
+##### OSX
+
+```
+pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2
+```
+
+##### Linux and Windows
+
+```
+# ROCM 5.7 (Linux only)
+pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/rocm5.7
+# CUDA 11.8
+pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
+# CUDA 12.1
+pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
+# CPU only
+pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu
+```
+
 ### v2.2.1
 
 #### Conda
````

_includes/quick_start_local.html (+1 −1)

```diff
@@ -5,7 +5,7 @@
   <a href="{{ site.baseurl }}/get-started/previous-versions">install previous versions of PyTorch</a>. Note that LibTorch is only available for C++.
 </p>
 
-<p><b>NOTE:</b> Latest PyTorch requires Python 3.8 or later. For more details, see Python section below.</p>
+<p><b>NOTE:</b> Latest PyTorch requires Python 3.8 or later.</p>
 
 <div class="row">
   <div class="col-md-3 headings">
```

_posts/2024-04-24-pytorch2-3.md (+106, new file)
---
layout: blog_detail
title: "PyTorch 2.3 Release Blog"
---

We are excited to announce the release of PyTorch® 2.3 ([release note](https://github.com/pytorch/pytorch/releases/tag/v2.3.0))! PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing users to migrate their own Triton kernels from eager mode without experiencing performance regressions or graph breaks. Tensor Parallelism improves the experience of training Large Language Models using native PyTorch functions, and has been validated on training runs for 100B-parameter models. In addition, semi-structured sparsity is now implemented as a Tensor subclass, with observed speedups of up to 1.6x over dense matrix multiplication.

This release is composed of 3393 commits and 426 contributors since PyTorch 2.2. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.3. More information about how to get started with the PyTorch 2-series can be found at our [Getting Started](https://pytorch.org/get-started/pytorch-2.0/) page.


<table class="table table-bordered">
  <tr>
    <td><strong>Beta</strong></td>
    <td><strong>Prototype</strong></td>
    <td><strong>Performance Improvements</strong></td>
  </tr>
  <tr>
    <td>User-defined Triton kernels in torch.compile</td>
    <td>torch.export adds new API to specify dynamic_shapes</td>
    <td>Weight-Only-Quantization introduced into Inductor CPU backend</td>
  </tr>
  <tr>
    <td>Tensor parallelism within PyTorch Distributed</td>
    <td>Asynchronous checkpoint generation</td>
    <td></td>
  </tr>
  <tr>
    <td>Support for semi-structured sparsity</td>
    <td></td>
    <td></td>
  </tr>
</table>


*To see a full list of public feature submissions click [here](https://docs.google.com/spreadsheets/d/1TzGkWuUMF1yTe88adz1dt2mzbIsZLd3PBasy588VWgk/edit?usp=sharing).


## Beta Features


### [Beta] Support for User-defined Triton kernels in _torch.compile_

PyTorch code that contains Triton kernels can now be executed natively using torch.compile. This enables users to migrate code containing Triton kernels from eager PyTorch to _torch.compile_ without running into performance regressions or graph breaks. Native support also creates an opportunity for TorchInductor to precompile the user-defined Triton kernel, as well as to better organize code around the Triton kernel, allowing for further optimizations.

You can find more information about how to utilize user-defined Triton kernels in torch.compile in [this tutorial](https://pytorch.org/tutorials/recipes/torch_compile_user_defined_triton_kernel_tutorial.html).


### [Beta] Tensor Parallelism introduces more efficient ways to train LLMs

The Tensor Parallel API facilitates various tensor manipulations across GPUs/hosts and integrates with FSDP for 2D parallelism (tensor parallelism across devices + data parallelism across hosts). It also offers a low-level API for constructing higher-level tensor-parallel APIs. This API has been validated to support the training of transformer models with over 100 billion parameters.

You can find more information on how to utilize this within your workflows in [this tutorial](https://pytorch.org/tutorials/intermediate/TP_tutorial.html).
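The core idea behind one common tensor-parallel layout (column-parallel linear) is to shard a weight matrix's output features across workers and concatenate the partial results. A toy sketch in plain Python, with lists standing in for tensors and shards; this illustrates the sharding math only, not the PyTorch Tensor Parallel API:

```python
def matvec(w, x):
    """Dense matrix-vector product; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def column_parallel_matvec(w, x, num_shards):
    """Shard the output features (rows of w) across num_shards workers;
    each worker computes its slice and the slices are concatenated.
    Assumes len(w) divides evenly among the shards."""
    shard_size = len(w) // num_shards
    out = []
    for s in range(num_shards):
        shard_rows = w[s * shard_size:(s + 1) * shard_size]
        out.extend(matvec(shard_rows, x))  # this shard's partial output
    return out

w = [[1, 0], [0, 1], [2, 2], [3, -1]]
x = [5, 7]
# The sharded computation matches the unsharded one exactly.
assert column_parallel_matvec(w, x, 2) == matvec(w, x)
print(matvec(w, x))  # → [5, 7, 24, 8]
```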


### [Beta] Semi-structured sparsity provides users with a way to take advantage of accelerated sparse inference and memory savings

_torch.sparse.SparseSemiStructuredTensor_ implements semi-structured sparsity as a Tensor subclass, which has shown speedups of up to 1.6x over dense matrix multiplication.

In particular, this release adds:

* Additional support for quantization composability (mixed dtype, dequant fusion)
* Updated cuSPARSELt and CUTLASS kernels
* torch.compile support

You can find more information on how to take advantage of semi-structured sparsity [here](https://pytorch.org/tutorials/advanced/semi_structured_sparse.html).
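Semi-structured sparsity is often called 2:4 sparsity: in every contiguous group of 4 weights, at most 2 are nonzero. A toy pruning sketch in plain Python that produces this pattern by keeping the 2 largest magnitudes per group; it illustrates the sparsity pattern only, not the accelerated cuSPARSELt/CUTLASS kernels:

```python
def prune_2_4(row):
    """Apply 2:4 semi-structured pruning: in every group of 4
    consecutive weights, keep the 2 largest magnitudes, zero the rest."""
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = sorted(range(len(group)),
                      key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

pruned = prune_2_4([0.9, -0.1, 0.5, 0.05, -2.0, 0.3, 0.2, 1.1])
print(pruned)  # → [0.9, 0.0, 0.5, 0.0, -2.0, 0.0, 0.0, 1.1]
```

Because exactly 2 of every 4 values survive, only half the values (plus a small index mask) need to be stored, which is where the memory savings and the specialized-kernel speedups come from.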


## Prototype Features


### [PROTOTYPE] _torch.export_ adds new API to specify _dynamic_shapes_

You can now use _torch.export.Dim_ to better represent dynamic shapes by enabling developers to specify ranges (min and max values) that can be reused across different input dimensions that are constrained to be equal.

To learn more about _torch.export.Dim_, as well as how it can be used to express more interesting relationships (such as linear arithmetic expressions), check out the tutorial [here](https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html#constraints-dynamic-shapes).


### [PROTOTYPE] Asynchronous checkpoint generation

Asynchronous checkpoint generation allows users to continue their training loops while checkpoints are being generated, essentially offloading much of the checkpointing cost.

You can find out how to utilize this within your own workflows with this [example](https://github.com/pytorch/pytorch/blob/release/2.3/torch/distributed/checkpoint/examples/async_checkpointing_example.py).
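The underlying idea can be sketched with the standard library: snapshot the state synchronously on the training thread (cheap), then hand the slow write to a background thread so the training loop keeps running. This is an illustration only; the real implementation lives in torch.distributed.checkpoint, as shown in the linked example:

```python
import json
import threading
from pathlib import Path
from tempfile import TemporaryDirectory

def async_checkpoint(state, path):
    """Copy the state on the caller's thread, then write it out on a
    background thread so training can continue immediately."""
    snapshot = dict(state)  # must copy before training mutates state
    t = threading.Thread(
        target=lambda: Path(path).write_text(json.dumps(snapshot)))
    t.start()
    return t  # join() before overwriting the same checkpoint path

with TemporaryDirectory() as tmp:
    state = {"step": 100, "lr": 0.01}
    t = async_checkpoint(state, f"{tmp}/ckpt.json")
    state["step"] = 101  # training continues while the write is in flight
    t.join()
    saved = json.loads(Path(f"{tmp}/ckpt.json").read_text())
    print(saved["step"])  # → 100: the checkpoint saw the pre-mutation snapshot
```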


## Performance Improvements


### [PROTOTYPE] Weight-Only-Quantization introduced into Inductor CPU backend

PyTorch 2.3 enhances LLM inference performance on the torch inductor CPU backend. The project [gpt-fast](https://github.com/pytorch-labs/gpt-fast) offers a simple and efficient PyTorch-native acceleration for transformer text generation with _torch.compile_. Prior to 2.3, only CUDA devices were supported; this release enables the CPU counterpart by providing highly optimized kernels for int4 and int8 weight-only quantized Linear layers.

For more information on how to utilize this feature, please refer to the [gpt-fast README](https://github.com/pytorch-labs/gpt-fast#quantization).
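Weight-only quantization stores the Linear weights in a low-precision integer format with a scale factor and dequantizes on the fly inside the matmul, while activations stay in floating point. A toy symmetric per-row int8 sketch in plain Python (illustrates the arithmetic only, not the optimized Inductor kernels):

```python
def quantize_row_int8(row):
    """Symmetric per-row int8 quantization: weight ≈ q * scale."""
    scale = max(abs(v) for v in row) / 127 or 1.0  # avoid scale == 0
    return [round(v / scale) for v in row], scale

def woq_matvec(w_rows, x):
    """Weight-only-quantized matvec: int8 weights, float activations."""
    out = []
    for row in w_rows:
        q, scale = quantize_row_int8(row)
        # Dequantize on the fly inside the dot product.
        out.append(sum(qi * scale * xi for qi, xi in zip(q, x)))
    return out

w = [[0.5, -1.0], [2.0, 0.25]]
x = [1.0, 2.0]
print(woq_matvec(w, x))  # close to the exact result [-1.5, 2.5]
```

The quantized weights take a quarter of the memory of float32, at the cost of a small rounding error; the int4 variant halves the storage again with a coarser grid.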
