Commit 8426b7a

Merge branch 'main' into skipROCmTest
2 parents fff25bd + f6f3322

116 files changed: +6566 −4419 lines changed


.github/workflows/build-wheels_m1.yml (+31)

```diff
@@ -41,3 +41,34 @@ jobs:
       runner-type: macos-m1-stable
       smoke-test-script: test/smoke_test.py
       trigger-event: ${{ github.event_name }}
+  notify:
+    runs-on: ubuntu-latest
+    name: Email notification
+    needs: [generate-matrix, build]
+    if: failure() && github.event_name == 'schedule'
+    steps:
+      - uses: dawidd6/action-send-mail@v4
+        with:
+          server_address: smtp.gmail.com
+          server_port: 465
+          username: torchao.notify
+          password: ${{ secrets.TORCHAO_NOTIFY_PASSWORD }}
+
+          to: ${{ secrets.TORCHAO_NOTIFY_RECIPIENT }}
+          subject: Scheduled Build Failure for TorchAO
+          body: |
+            Build Failure Notification for TorchAO
+            A failure occurred in the Build Linux Wheels workflow.
+            Run Details:
+            - Workflow: ${{ github.workflow }}
+            - Run Type: ${{ github.event_name }}
+            - Repository: ${{ github.repository }}
+            - Branch/PR: ${{ github.ref }}
+            - Commit: ${{ github.sha }}
+            You can view the full run details here:
+            ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+            Error Information:
+            ${{ needs.generate-matrix.result == 'failure' && 'Matrix generation failed' || '' }}
+            ${{ needs.build.result == 'failure' && 'Build job failed' || '' }}
+
+            This is an automated notification. Please check the GitHub Actions page for more details about the failure.
```
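
The same `notify` job is added to each wheel-building workflow below; only the workflow name in the body differs. The `&&`/`||` chains under `Error Information:` are GitHub Actions' expression-language idiom for a conditional. A rough Python sketch of that logic (the function and names are illustrative, not part of the workflow):

```python
# Rough Python equivalent of the GitHub Actions expression
#   ${{ needs.build.result == 'failure' && 'Build job failed' || '' }}
def error_line(job_result: str, message: str) -> str:
    # GitHub's `a && b || c` evaluates like Python's `b if a else c`
    # when `b` is truthy, which these message strings are.
    return message if job_result == "failure" else ""

print(error_line("failure", "Build job failed"))  # -> Build job failed
print(error_line("success", "Build job failed"))  # -> (empty string)
```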

.github/workflows/build_wheels_aarch64_linux.yml (+33 −1)

```diff
@@ -29,7 +29,8 @@ jobs:
       test-infra-repository: pytorch/test-infra
       test-infra-ref: main
       with-cuda: disable
-
+      # please note: excluding 3.13t for aarch64 builds for now
+      python-versions: '["3.9", "3.10", "3.11", "3.12", "3.13"]'
   build:
     needs: generate-matrix
     permissions:
@@ -53,3 +54,34 @@ jobs:
       setup-miniconda: false
     secrets:
       PYPI_API_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
+  notify:
+    runs-on: ubuntu-latest
+    name: Email notification
+    needs: [generate-matrix, build]
+    if: failure() && github.event_name == 'schedule'
+    steps:
+      - uses: dawidd6/action-send-mail@v4
+        with:
+          server_address: smtp.gmail.com
+          server_port: 465
+          username: torchao.notify
+          password: ${{ secrets.TORCHAO_NOTIFY_PASSWORD }}
+
+          to: ${{ secrets.TORCHAO_NOTIFY_RECIPIENT }}
+          subject: Scheduled Build Failure for TorchAO
+          body: |
+            Build Failure Notification for TorchAO
+            A failure occurred in the Build AARCH64 Wheels workflow.
+            Run Details:
+            - Workflow: ${{ github.workflow }}
+            - Run Type: ${{ github.event_name }}
+            - Repository: ${{ github.repository }}
+            - Branch/PR: ${{ github.ref }}
+            - Commit: ${{ github.sha }}
+            You can view the full run details here:
+            ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+            Error Information:
+            ${{ needs.generate-matrix.result == 'failure' && 'Matrix generation failed' || '' }}
+            ${{ needs.build.result == 'failure' && 'Build job failed' || '' }}
+
+            This is an automated notification. Please check the GitHub Actions page for more details about the failure.
```

.github/workflows/build_wheels_linux.yml (+4 −2)

```diff
@@ -30,6 +30,8 @@ jobs:
       with-cuda: enable
       with-rocm: enable
       with-xpu: enable
+      # please note: excluding 3.13t for aarch64 builds for now
+      python-versions: '["3.9", "3.10", "3.11", "3.12", "3.13"]'
 
   build:
     needs: generate-matrix
@@ -70,7 +72,7 @@ jobs:
           password: ${{ secrets.TORCHAO_NOTIFY_PASSWORD }}
 
           to: ${{ secrets.TORCHAO_NOTIFY_RECIPIENT }}
-          subject: breakbutterflyScheduled Build Failure for TorchAO
+          subject: Scheduled Build Failure for TorchAO
           body: |
             Build Failure Notification for TorchAO
 
@@ -89,5 +91,5 @@ jobs:
             Error Information:
             ${{ needs.generate-matrix.result == 'failure' && 'Matrix generation failed' || '' }}
             ${{ needs.build.result == 'failure' && 'Build job failed' || '' }}
-
+
             This is an automated notification. Please check the GitHub Actions page for more details about the failure.
```

.github/workflows/build_wheels_windows.yml (+35)

```diff
@@ -60,3 +60,38 @@ jobs:
       package-name: ${{ matrix.package-name }}
       smoke-test-script: ${{ matrix.smoke-test-script }}
       trigger-event: ${{ github.event_name }}
+  notify:
+    runs-on: ubuntu-latest
+    name: Email notification
+    needs: [generate-matrix, build]
+    if: failure() && github.event_name == 'schedule'
+    steps:
+      - uses: dawidd6/action-send-mail@v4
+        with:
+          server_address: smtp.gmail.com
+          server_port: 465
+          username: torchao.notify
+          password: ${{ secrets.TORCHAO_NOTIFY_PASSWORD }}
+
+          to: ${{ secrets.TORCHAO_NOTIFY_RECIPIENT }}
+          subject: Scheduled Build Failure for TorchAO
+          body: |
+            Build Failure Notification for TorchAO
+
+            A failure occurred in the Build Windows Wheels workflow.
+
+            Run Details:
+            - Workflow: ${{ github.workflow }}
+            - Run Type: ${{ github.event_name }}
+            - Repository: ${{ github.repository }}
+            - Branch/PR: ${{ github.ref }}
+            - Commit: ${{ github.sha }}
+
+            You can view the full run details here:
+            ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+
+            Error Information:
+            ${{ needs.generate-matrix.result == 'failure' && 'Matrix generation failed' || '' }}
+            ${{ needs.build.result == 'failure' && 'Build job failed' || '' }}
+
+            This is an automated notification. Please check the GitHub Actions page for more details about the failure.
```

.github/workflows/torchao_experimental_test.yml (+18 −2)

```diff
@@ -35,8 +35,24 @@ jobs:
           conda activate venv
           pip install --extra-index-url "https://download.pytorch.org/whl/nightly/cpu" torch=="2.6.0.dev20250104"
           pip install numpy
+          pip install pytest
           USE_CPP=1 pip install .
-      - name: Run tests
+      - name: Run python tests
         run: |
           conda activate venv
-          python torchao/experimental/tests/test_packed_linear_int8_dynamic_activation_intx_weight_layout.py
+          pytest torchao/experimental/tests/test_int8_dynamic_activation_intx_weight.py
+          python torchao/experimental/tests/test_embedding_xbit_quantizer.py
+      - name: Run kernels/cpu/aarch64/tests
+        run: |
+          conda activate venv
+          pushd torchao/experimental/kernels/cpu/aarch64/tests
+          sh build_and_run_tests.sh
+          rm -rf /tmp/cmake-out
+          popd
+      - name: Run torchao/experimental/ops/tests
+        run: |
+          conda activate venv
+          pushd torchao/experimental/ops/tests
+          sh build_and_run_tests.sh
+          rm -rf /tmp/cmake-out
+          popd
```
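
To mirror these CI steps on a local checkout, the same Python test files can be run directly; a minimal sketch, assuming the package was built from the repo root with `USE_CPP=1 pip install .` and `pytest` is installed:

```python
import subprocess
import sys

# Same files the "Run python tests" step invokes, run from the repo root.
subprocess.run(
    [sys.executable, "-m", "pytest",
     "torchao/experimental/tests/test_int8_dynamic_activation_intx_weight.py"],
    check=True,
)
subprocess.run(
    [sys.executable, "torchao/experimental/tests/test_embedding_xbit_quantizer.py"],
    check=True,
)
```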

README.md (+13 −13)

````diff
@@ -29,16 +29,16 @@ For inference, we have the option of
 ```python
 from torchao.quantization.quant_api import (
     quantize_,
-    int8_dynamic_activation_int8_weight,
-    int4_weight_only,
-    int8_weight_only
+    Int8DynamicActivationInt8WeightConfig,
+    Int4WeightOnlyConfig,
+    Int8WeightOnlyConfig
 )
-quantize_(m, int4_weight_only())
+quantize_(m, Int4WeightOnlyConfig())
 ```
 
-For gpt-fast `int4_weight_only()` is the best option at bs=1 as it **2x the tok/s and reduces the VRAM requirements by about 65%** over a torch.compiled baseline.
+For gpt-fast `Int4WeightOnlyConfig()` is the best option at bs=1 as it **2x the tok/s and reduces the VRAM requirements by about 65%** over a torch.compiled baseline.
 
-If you don't have enough VRAM to quantize your entire model on GPU and you find CPU quantization to be too slow then you can use the device argument like so `quantize_(model, int8_weight_only(), device="cuda")` which will send and quantize each layer individually to your GPU.
+If you don't have enough VRAM to quantize your entire model on GPU and you find CPU quantization to be too slow then you can use the device argument like so `quantize_(model, Int8WeightOnlyConfig(), device="cuda")` which will send and quantize each layer individually to your GPU.
 
 If you see slowdowns with any of these techniques or you're unsure which option to use, consider using [autoquant](./torchao/quantization/README.md#autoquantization) which will automatically profile layers and pick the best way to quantize each layer.
 
@@ -63,27 +63,27 @@ Post-training quantization can result in a fast and compact model, but may also
 ```python
 from torchao.quantization import (
     quantize_,
-    int8_dynamic_activation_int4_weight,
+    Int8DynamicActivationInt4WeightConfig,
 )
 from torchao.quantization.qat import (
     FakeQuantizeConfig,
-    from_intx_quantization_aware_training,
-    intx_quantization_aware_training,
+    FromIntXQuantizationAwareTrainingConfig,
+    IntXQuantizationAwareTrainingConfig,
 )
 
 # Insert fake quantization
 activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
 weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
 quantize_(
     my_model,
-    intx_quantization_aware_training(activation_config, weight_config),
+    IntXQuantizationAwareTrainingConfig(activation_config, weight_config),
 )
 
 # Run training... (not shown)
 
 # Convert fake quantization to actual quantized operations
-quantize_(my_model, from_intx_quantization_aware_training())
-quantize_(my_model, int8_dynamic_activation_int4_weight(group_size=32))
+quantize_(my_model, FromIntXQuantizationAwareTrainingConfig())
+quantize_(my_model, Int8DynamicActivationInt4WeightConfig(group_size=32))
 ```
 
 ### Float8
@@ -139,7 +139,7 @@ The best example we have combining the composability of lower bit dtype with com
 
 We've added support for authoring and releasing [custom ops](./torchao/csrc/) that do not graph break with `torch.compile()` so if you love writing kernels but hate packaging them so they work all operating systems and cuda versions, we'd love to accept contributions for your custom ops. We have a few examples you can follow
 
-1. [fp6](torchao/dtypes/floatx) for 2x faster inference over fp16 with an easy to use API `quantize_(model, fpx_weight_only(3, 2))`
+1. [fp6](torchao/dtypes/floatx) for 2x faster inference over fp16 with an easy to use API `quantize_(model, FPXWeightOnlyConfig(3, 2))`
 2. [2:4 Sparse Marlin GEMM](https://github.com/pytorch/ao/pull/733) 2x speedups for FP16xINT4 kernels even at batch sizes up to 256
 3. [int4 tinygemm unpacker](https://github.com/pytorch/ao/pull/415) which makes it easier to switch quantized backends for inference
````
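
The rename above is mechanical: each lowercase factory function becomes a `*Config` class taking the same arguments. A minimal end-to-end sketch of the new style (the toy model and shapes are illustrative; assumes a CUDA device and a torchao build that ships these configs):

```python
import torch
from torchao.quantization.quant_api import quantize_, Int4WeightOnlyConfig

# Toy model purely for illustration; any nn.Module containing Linear layers works.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).to(torch.bfloat16).cuda()

# New style: pass a config instance; quantize_ rewrites the weights in place.
quantize_(model, Int4WeightOnlyConfig())

out = model(torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda"))
```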

benchmarks/float8/float8_roofline.py (+2 −3)

```diff
@@ -63,7 +63,6 @@
     ScalingType,
     convert_to_float8_training,
 )
-from torchao.float8.config import Float8LinearRecipeName, recipe_name_to_linear_config
 from torchao.float8.roofline_utils import (
     get_float8_mem_sympy,
     get_gemm_time_sympy,
@@ -349,7 +348,7 @@ def run(
 
     # get the float8 dynamic axiswise scaling gpu kernel time
     torch._dynamo.reset()
-    config = recipe_name_to_linear_config(Float8LinearRecipeName.ALL_AXISWISE)
+    config = Float8LinearConfig.from_recipe_name("rowwise")
     m_fp8_dyn_axs = convert_to_float8_training(copy.deepcopy(m_orig), config=config)
     m_fp8_dyn_axs = torch.compile(m_fp8_dyn_axs)
     fp8_dyn_axs_time_actual_s = get_gpu_kernel_time(m_fp8_dyn_axs, x)
@@ -358,7 +357,7 @@
     # TODO(future PR): enable below once basic performance issues
     # are fixed
     # torch._dynamo.reset()
-    # config = recipe_name_to_linear_config(Float8LinearRecipeName.LW_AXISWISE_WITH_GW_HP)
+    # config = Float8LinearConfig.from_recipe_name("rowwise_with_gw_hp")
     # m_fp8_lw = convert_to_float8_training(m_orig, config=config)
     # m_fp8_lw = torch.compile(m_fp8_lw)
     # fp8_lw_time_actual_s = get_gpu_kernel_time(m_fp8_lw, x)
```
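
For reference, the replacement API in one self-contained sketch, using the module paths from the imports in this commit (the toy model and sizes are illustrative; float8 training assumes a recent CUDA GPU):

```python
import copy

import torch
from torchao.float8.config import Float8LinearConfig
from torchao.float8.float8_linear_utils import convert_to_float8_training

# "rowwise" is the string form of the old Float8LinearRecipeName.ALL_AXISWISE;
# from_recipe_name resolves it to a full linear config.
config = Float8LinearConfig.from_recipe_name("rowwise")

# Toy model for illustration only.
m = torch.nn.Sequential(torch.nn.Linear(2048, 2048, bias=False)).cuda()
m_fp8 = convert_to_float8_training(copy.deepcopy(m), config=config)
m_fp8 = torch.compile(m_fp8)
```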

benchmarks/float8/profile_linear_float8.py (+2 −4)

```diff
@@ -39,9 +39,8 @@
 
 from torchao.float8 import _prototype_register_float8_delayed_scaling_inductor_passes
 from torchao.float8.config import (
-    Float8LinearRecipeName,
+    Float8LinearConfig,
     ScalingType,
-    recipe_name_to_linear_config,
 )
 from torchao.float8.float8_linear_utils import (
     convert_to_float8_training,
@@ -311,8 +310,7 @@ def main(
             emulate=False,
         )
     elif recipe_name is not None:
-        recipe_name = Float8LinearRecipeName(recipe_name)
-        config = recipe_name_to_linear_config(recipe_name)
+        config = Float8LinearConfig.from_recipe_name(recipe_name)
 
         scaling_repr = "_".join(
             [
```

examples/sam2_amg_server/README.md (+26)

````diff
@@ -1,3 +1,29 @@
+# Reproducing experiments locally
+
+You can simply run `python reproduce_experiments.py <path/to/image_paths_file> <path/to/output_folder>`
+
+`image_paths_file` needs to be a flat list of paths to images, for example
+
+```
+/home/$USER/data/sav_val/JPEGImages_24fps/sav_044979/00349.jpg
+/home/$USER/data/sav_val/JPEGImages_24fps/sav_006751/00204.jpg
+/home/$USER/data/sav_val/JPEGImages_24fps/sav_053118/00239.jpg
+/home/$USER/data/sav_val/JPEGImages_24fps/sav_053391/00517.jpg
+/home/$USER/data/sav_val/JPEGImages_24fps/sav_018487/00001.jpg
+/home/$USER/data/sav_val/JPEGImages_24fps/sav_028552/00153.jpg
+/home/$USER/data/sav_val/JPEGImages_24fps/sav_013729/00103.jpg
+/home/$USER/data/sav_val/JPEGImages_24fps/sav_014662/00339.jpg
+```
+
+or whichever other files you'd like to use for study. For example you may consider the Segment Anything Video (SA-V) [Dataset](https://github.com/facebookresearch/sam2/tree/main/sav_dataset#download-the-dataset).
+
+The experimental results will then be saved under `output_folder` in result.csv
+
+# Reproducing experiments on Modal
+
+For this you can run `modal_experiments.sh` afterwards, but you'll want to run the experiments locally first to produce the meta annotations and exported ahead-of-time compiled binaries.
+
+# Using the server locally
 ## Example curl command
 ```
 curl -X POST http://127.0.0.1:5000/upload -F 'image=@/path/to/file.jpg' --output path/to/output.png
````
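
One way to produce such a flat list, sketched in Python (the directory layout follows the example above; the one-frame-per-video sampling and the `image_paths` output name are illustrative, not prescribed by the README):

```python
import random
from pathlib import Path

# Hypothetical frame root; point this at your extracted SA-V frames.
root = Path.home() / "data" / "sav_val" / "JPEGImages_24fps"

# Pick one random frame per video directory and write a flat list of paths.
paths = [
    str(random.choice(sorted(video.glob("*.jpg"))))
    for video in sorted(root.iterdir())
    if video.is_dir()
]
Path("image_paths").write_text("\n".join(paths) + "\n")
```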
