
Make SmoothQuant more General #2728

Open · namgyu-youn wants to merge 5 commits into main

Conversation

@namgyu-youn (Contributor) commented Aug 11, 2025

Summary:
- Added SmoothQuantConfig as a base config and made corresponding changes in other parts of the flow

Test Plan:
- Qwen3-8B (static/dynamic) with example.py and unittest (test_smoothquant.py)
- Additional test plans required

ETC
- Fix typo in README.md for SmoothQuant
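
For concreteness, a minimal sketch of the intended prepare → calibrate → convert flow (not part of the diff; the import path and the exact constructor arguments are assumptions based on the test snippet further down):

import torch
from torch import nn
from torchao.quantization import quantize_
# Import path assumed; SmoothQuant currently lives under torchao.prototype.
from torchao.prototype.smoothquant import SmoothQuantConfig

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
calibration_data = torch.randn(2, 32)

# Step 1 ("prepare"): insert observers that record activation statistics.
quantize_(model, SmoothQuantConfig(step="prepare", alpha=0.5, quant_mode="dynamic"))
model(calibration_data)  # calibration forward passes

# Step 2 ("convert"): fold the equalization scales into the weights and quantize.
quantize_(model, SmoothQuantConfig(step="convert", alpha=0.5, quant_mode="dynamic"))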

pytorch-bot bot commented Aug 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2728

Note: Links to docs will display an error until the docs builds have been completed.

❌ 10 New Failures

As of commit ccb7b84 with merge base 2eae09b:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Aug 11, 2025
@namgyu-youn namgyu-youn marked this pull request as draft August 12, 2025 06:53
@namgyu-youn namgyu-youn marked this pull request as ready for review August 12, 2025 08:01
@namgyu-youn (Contributor, Author) commented:

@jerryzh168 Could you please look into this PR? It was inspired by #2659 (comment), aiming for a more generalized SmoothQuant API.

@jerryzh168 (Contributor) commented Aug 15, 2025

Thanks @namgyu-youn, this is a step towards that, but not fully general yet. It seems to be a quick change to add it, though; I commented inline.

Also, it seems SmoothQuant is not very popular at the moment (https://huggingface.co/models?search=smoothquant), so I'd like to wait a bit before we invest more effort into it. Let me know if you are interested in contributing more to torchao; I think we have many higher-priority issues that you can help with.

@@ -82,14 +55,15 @@ def forward(self, x):
 test_data = torch.randn(2, 32, dtype=input_dtype, device=device)

 # Step 1: Setup quantized model with observer insertion and calibration
-insert_smooth_quant_observer_(m, alpha, quant_mode)
+config = SmoothQuantConfig(step="prepare", alpha=alpha, quant_mode=quant_mode)
Contributor
If you want to really make it general, the quant_mode has to be changed to base_config, and we'll do a general quantization like this:

base_config_handler = _QUANTIZE_CONFIG_HANDLER[type(config.base_config)]
dummy_mod = DummyModule(observed_linear.weight * equalization_scale)
quant_mod = base_config_handler(dummy_mod, config.base_config)
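
To expand on that suggestion, a self-contained sketch of how the base_config-driven convert path could look; DummyModule is written out here as a stand-in, and the registry import path is an assumption (torchao's own definitions may differ):

import torch
from torch import nn
# Registry import path assumed.
from torchao.quantization.transform_module import _QUANTIZE_CONFIG_HANDLER

class DummyModule(nn.Module):
    """Stand-in wrapper so a bare weight tensor can be passed through a config handler."""
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        self.weight = nn.Parameter(weight, requires_grad=False)

def convert_weight(observed_linear, equalization_scale, config):
    # Look up the handler registered for the inner base config
    # (e.g. an int8-dynamic-activation / int8-weight config).
    base_config_handler = _QUANTIZE_CONFIG_HANDLER[type(config.base_config)]
    # Fold the SmoothQuant equalization scale into the weight, then let the
    # base config decide how that weight is actually quantized.
    scaled_weight = (observed_linear.weight * equalization_scale).detach()
    quant_mod = base_config_handler(DummyModule(scaled_weight), config.base_config)
    return quant_mod.weight

This keeps the SmoothQuant-specific part (the equalization scale) separate from the weight-quantization scheme, which becomes whatever the base config's handler produces.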

Contributor Author

Thanks, could you suggest which API fits here? int8_dynamic_activation_int4_weight and int8_dynamic_activation_int8_weight are among the options, and we can build more config tests.
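
As a quick sketch of how either option could plug in (assuming base_config replaces quant_mode as suggested above; both constructors are existing torchao.quantization APIs):

from torchao.quantization import (
    int8_dynamic_activation_int4_weight,
    int8_dynamic_activation_int8_weight,
)

# base_config is the hypothetical new field on SmoothQuantConfig.
w8a8 = SmoothQuantConfig(step="prepare", alpha=0.5,
                         base_config=int8_dynamic_activation_int8_weight())
w4a8 = SmoothQuantConfig(step="prepare", alpha=0.5,
                         base_config=int8_dynamic_activation_int4_weight())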

 weight = weight.to(observed_linear.weight.dtype)
 block_size = (1, weight.size(1))
-wei_zero_points = torch.zeros_like(w_scales, dtype=torch.int64)
+wei_zero_points = torch.zeros_like(wei_scales, dtype=torch.int64)

 qw = to_affine_quantized_intx_static(
Contributor

Here, this should be updated to

base_config_handler = _QUANTIZE_CONFIG_HANDLER[type(config.base_config)]
dummy_mod = DummyModule(observed_linear.weight * equalization_scale)
quant_mod = base_config_handler(dummy_mod, config.base_config)
to make it truly general

Contributor Author

Oh, I missed it. The Quantization API should be a choice, thanks!

@jerryzh168 jerryzh168 self-requested a review August 15, 2025 17:53
@namgyu-youn (Contributor, Author) commented:

> Thanks @namgyu-youn, this is a step towards that, but not fully general yet. It seems to be a quick change to add it, though; I commented inline.
>
> Also, it seems SmoothQuant is not very popular at the moment (https://huggingface.co/models?search=smoothquant), so I'd like to wait a bit before we invest more effort into it. Let me know if you are interested in contributing more to torchao; I think we have many higher-priority issues that you can help with.

Thanks for the kind info; I truly love your team's work after reviewing TorchAO: CodeML @ ICML 2025.

The recently updated contribution guide could be a great place to find the next contribution, but personally I prefer the sparsity (pruning) module. Unfortunately, I heard the main POC (@jcaip) is on vacation, which makes it hard for me to make progress. The following are my recent activities related to the sparsity module:

  1. Since Wanda was already introduced, I recently introduced Wanda++ in "feat: RGS for wanda++" (#2537).
  2. Computation overhead seemed to be missing from your team's workshop (I am not certain, because of my limited knowledge), so I opened "Missing benchmark for sparse24_sm90_sparsify overhead" (#2612).
  3. I am also interested in activation compression ("Accelerate activation sparsity with activation compression", #1920), but I have to learn more about it.

If there is no major progress on the sparsity module, quantization (new APIs or primitive ops) might be my next step. Let me know if there is a good second issue for it.

P.S. Could you please check #2644? It hasn't been merged yet even though it was approved (no CI is broken). Also, #2660 has been waiting for review (I am fine with closing it since it is low priority).
