Make SmoothQuant more General #2728
base: main
Conversation
Summary:
- Added SmoothQuantConfig as a base config and made corresponding changes in other parts of the flow

Test Plan:
- Qwen 3-8B with `example.py` and unittest
- Additional test plans required

Etc.:
- Fix typo in README.md for SmoothQuant
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2728
Note: Links to docs will display an error until the docs builds have been completed.
❌ 10 New Failures as of commit ccb7b84 with merge base 2eae09b. The following jobs have failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jerryzh168 Could you please look into this PR? It was inspired by #2659 (comment), aiming for a more generalized SmoothQuant API.
Thanks @namgyu-youn, this is a step towards that, but it is not fully general yet; it seems to be a quick change to add, though (commented inline). Also, SmoothQuant is not very popular at the moment (https://huggingface.co/models?search=smoothquant), so I'd like to wait a bit before we invest more effort in it. Let me know if you are interested in contributing more to torchao; we have many higher-priority issues that you could help with, I think.
```diff
@@ -82,14 +55,15 @@ def forward(self, x):
     test_data = torch.randn(2, 32, dtype=input_dtype, device=device)

     # Step 1: Setup quantized model with observer insertion and calibration
-    insert_smooth_quant_observer_(m, alpha, quant_mode)
+    config = SmoothQuantConfig(step="prepare", alpha=alpha, quant_mode=quant_mode)
```
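For context on what the `alpha` parameter in this config controls, the core SmoothQuant transform can be sketched in a few lines. This is a hypothetical helper, not the torchao API: per input channel j, the equalization scale is s_j = max|X_j|^alpha / max|W_j|^(1-alpha), assuming per-channel absmax calibration statistics.

```python
# Minimal sketch of the SmoothQuant smoothing math (hypothetical helper,
# not the torchao API). alpha balances how much quantization difficulty
# is migrated from activations to weights.
def smoothquant_scales(act_absmax, weight_absmax, alpha=0.5):
    eps = 1e-8  # guard against all-zero channels
    return [
        max(a, eps) ** alpha / max(w, eps) ** (1 - alpha)
        for a, w in zip(act_absmax, weight_absmax)
    ]

# Hypothetical calibration statistics for a 3-channel linear layer.
act_absmax = [8.0, 0.5, 2.0]
weight_absmax = [0.5, 2.0, 1.0]
scales = smoothquant_scales(act_absmax, weight_absmax, alpha=0.5)
# Activations are divided by s and weights multiplied by s,
# so the product X @ W is mathematically unchanged.
```

With `alpha=0.5` the outlier-heavy first channel gets a large scale (4.0), shrinking its activation range at the cost of a larger weight range.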
If you want to really make it general, the `quant_mode` has to be changed to `base_config`, and we'll do a general quantization like this:

ao/torchao/prototype/awq/api.py, lines 104 to 106 in 2eae09b:

```python
base_config_handler = _QUANTIZE_CONFIG_HANDLER[type(config.base_config)]
dummy_mod = DummyModule(observed_linear.weight * equalization_scale)
quant_mod = base_config_handler(dummy_mod, config.base_config)
```
Thanks, could you suggest which API fits here? `int8_dynamic_activation_int4_weight` and `int8_dynamic_activation_int8_weight` are among the options, and we can build more config tests.
```diff
     weight = weight.to(observed_linear.weight.dtype)
     block_size = (1, weight.size(1))
-    wei_zero_points = torch.zeros_like(w_scales, dtype=torch.int64)
+    wei_zero_points = torch.zeros_like(wei_scales, dtype=torch.int64)

     qw = to_affine_quantized_intx_static(
```
Here this should be updated to:

ao/torchao/prototype/awq/api.py, lines 104 to 106 in 2eae09b:

```python
base_config_handler = _QUANTIZE_CONFIG_HANDLER[type(config.base_config)]
dummy_mod = DummyModule(observed_linear.weight * equalization_scale)
quant_mod = base_config_handler(dummy_mod, config.base_config)
```
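For reference, the static affine quantization step being replaced here (the kind of mapping `to_affine_quantized_intx_static` performs with precomputed scales and zero points) reduces to a clamp-and-round. This is a hypothetical pure-Python sketch, not the torchao implementation:

```python
# Hypothetical sketch of static per-row int8 affine quantization.
# q = clamp(round(x / scale) + zero_point, -128, 127)
def quantize_static_int8(row, scale, zero_point=0):
    return [
        max(-128, min(127, round(x / scale) + zero_point))
        for x in row
    ]

row = [0.5, -1.0, 0.25]   # hypothetical weight row
scale = 1.0 / 127         # symmetric quantization: zero_point stays 0,
                          # matching the zeros_like call in the diff above
q = quantize_static_int8(row, scale)
```

Routing through a `base_config` handler instead lets the same smoothed weight feed any registered scheme (int4, int8, dynamic or static) without reimplementing this step per mode.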
Oh, I missed it. The quantization API should be user-selectable, thanks!
Thanks for the kind info, and I have truly come to love your team's work after reviewing TorchAO: CodeML @ ICML 2025. The recently updated contribution guide could be a great starting point for my next contribution, but personally I prefer the sparsity (pruning) module. Unfortunately, I heard the main POC (@jcaip) is on vacation, making it hard for me to make progress. The following are my recent activities related to the sparsity module:
If there is no major progress on the sparsity module, quantization (new APIs or primitive ops) might be my next step. Let me know if there is a good-second-issue for it. P.S. Could you please check #2644? It hasn't been merged yet despite being approved (no CI breakage). Also, #2660 has been waiting for review (I am fine with closing it, since it is low-priority).
Summary:
- Added SmoothQuantConfig as a base config and made corresponding changes in other parts of the flow

Test Plan:
- Qwen 3-8B (static/dynamic) with `example.py` and unittest (`test_smoothquant.py`)