Make SmoothQuant more General #2728
base: main
Conversation
Summary:
- Added SmoothQuantConfig as a base config and made corresponding changes in other parts of the flow

Test Plan:
- Qwen 3-8B with `example.py` and unittest
- Additional test plans required

Etc.:
- Fix typo in README.md for SmoothQuant
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2728
Note: Links to docs will display an error until the docs builds have been completed.
❌ 10 New Failures as of commit ccb7b84 with merge base 2eae09b. The following jobs have failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jerryzh168 Could you please look into this PR? It was inspired by #2659 (comment), aiming for a more generalized SmoothQuant API.
Thanks @namgyu-youn, this is a step towards that, but it is not fully general yet; it seems to be a quick change to add, though (commented inline). Also, SmoothQuant is not very popular at the moment (https://huggingface.co/models?search=smoothquant), so I'd like to wait a bit before we invest more effort in it. Let me know if you are interested in contributing more to torchao; we have many higher-priority issues that you could help with, I think.
```diff
@@ -82,14 +55,15 @@ def forward(self, x):
     test_data = torch.randn(2, 32, dtype=input_dtype, device=device)

     # Step 1: Setup quantized model with observer insertion and calibration
-    insert_smooth_quant_observer_(m, alpha, quant_mode)
+    config = SmoothQuantConfig(step="prepare", alpha=alpha, quant_mode=quant_mode)
```
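For context on what the `alpha` parameter in this config controls, the core SmoothQuant transform can be sketched in a few lines. This is a hypothetical helper, not the torchao API: per input channel j, the equalization scale is s_j = max|X_j|^alpha / max|W_j|^(1-alpha), assuming per-channel absmax calibration statistics.

```python
# Minimal sketch of the SmoothQuant smoothing math (hypothetical helper,
# not the torchao API). alpha balances how much quantization difficulty
# is migrated from activations to weights.
def smoothquant_scales(act_absmax, weight_absmax, alpha=0.5):
    eps = 1e-8  # guard against all-zero channels
    return [
        max(a, eps) ** alpha / max(w, eps) ** (1 - alpha)
        for a, w in zip(act_absmax, weight_absmax)
    ]

# Hypothetical calibration statistics for a 3-channel linear layer.
act_absmax = [8.0, 0.5, 2.0]
weight_absmax = [0.5, 2.0, 1.0]
scales = smoothquant_scales(act_absmax, weight_absmax, alpha=0.5)
# Activations are divided by s and weights multiplied by s,
# so the product X @ W is mathematically unchanged.
```

With `alpha=0.5` the outlier-heavy first channel gets a large scale (4.0), shrinking its activation range at the cost of a larger weight range.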
If you want to really make it general, the `quant_mode` has to be changed to `base_config`, and we'll do a general quantization like this:

ao/torchao/prototype/awq/api.py, lines 104 to 106 in 2eae09b:

```python
base_config_handler = _QUANTIZE_CONFIG_HANDLER[type(config.base_config)]
dummy_mod = DummyModule(observed_linear.weight * equalization_scale)
quant_mod = base_config_handler(dummy_mod, config.base_config)
```
Thanks, could you suggest which API fits here? `int8_dynamic_activation_int4_weight` and `int8_dynamic_activation_int8_weight` are among the options, and we can build more config tests.
```diff
     weight = weight.to(observed_linear.weight.dtype)
     block_size = (1, weight.size(1))
-    wei_zero_points = torch.zeros_like(w_scales, dtype=torch.int64)
+    wei_zero_points = torch.zeros_like(wei_scales, dtype=torch.int64)

     qw = to_affine_quantized_intx_static(
```
Here this should be updated to:

ao/torchao/prototype/awq/api.py, lines 104 to 106 in 2eae09b:

```python
base_config_handler = _QUANTIZE_CONFIG_HANDLER[type(config.base_config)]
dummy_mod = DummyModule(observed_linear.weight * equalization_scale)
quant_mod = base_config_handler(dummy_mod, config.base_config)
```
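For reference, the static affine quantization step being replaced here (the kind of mapping `to_affine_quantized_intx_static` performs with precomputed scales and zero points) reduces to a clamp-and-round. This is a hypothetical pure-Python sketch, not the torchao implementation:

```python
# Hypothetical sketch of static per-row int8 affine quantization.
# q = clamp(round(x / scale) + zero_point, -128, 127)
def quantize_static_int8(row, scale, zero_point=0):
    return [
        max(-128, min(127, round(x / scale) + zero_point))
        for x in row
    ]

row = [0.5, -1.0, 0.25]   # hypothetical weight row
scale = 1.0 / 127         # symmetric quantization: zero_point stays 0,
                          # matching the zeros_like call in the diff above
q = quantize_static_int8(row, scale)
```

Routing through a `base_config` handler instead lets the same smoothed weight feed any registered scheme (int4, int8, dynamic or static) without reimplementing this step per mode.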
Oh, I missed it. The quantization API should be user-selectable, thanks!
Thanks for the kind info, and I have truly come to love your team's work after reviewing TorchAO: CodeML @ ICML 2025. The recently updated contribution guide could be a great starting point for my next contribution, but personally I prefer the sparsity (pruning) module. Unfortunately, I heard the main POC (@jcaip) is on vacation, making it hard for me to make progress. The following are my recent activities related to the sparsity module:
If there is no major progress on the sparsity module, quantization (new APIs or primitive ops) might be my next step. Let me know if there is a good-second-issue for it. P.S. Could you please check #2644? It hasn't been merged yet despite being approved (no CI breakage). Also, #2660 has been waiting for review (I am fine with closing it, since it is low-priority).
Summary:
- Added SmoothQuantConfig as a base config and made corresponding changes in other parts of the flow

Test Plan:
- Qwen 3-8B (static/dynamic) with `example.py` and unittest (`test_smoothquant.py`)