[CPU] add Float8OpaqueTensor for dynamic float8 act float8 weight #3075
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3075
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit 7980de8 with merge base 838dceb.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
CC @mingfeima for review. Thanks.
Hi @mingfeima @jerryzh168 @andrewor14 Could you please review this PR? Thanks.
test/quantization/quantize_/workflows/float8/test_float8_opaque_tensor.py
Hi @mingfeima @jerryzh168 @andrewor14 Though this PR depends on #3100, could you please review this PR? Thanks.
    float8_dtype=torch.float8_e4m3fn,
    block_size=block_size,
)
data = _quantize_affine_float8(hp_tensor, scale, torch.float8_e4m3fn)
Do you need to use `_quantize_affine_float8_non_decomposed`? (ao/torchao/quantization/quant_primitives.py, line 2425 in c96f2dd)
Thanks. Since we are not using Inductor for fusion like PT2E, it should be OK here.
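For context, a minimal sketch of what this dynamic float8 quantization computes, assuming per-row granularity; the helper names are illustrative and do not match torchao's `_choose_scale_float8`/`_quantize_affine_float8` signatures:

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

def choose_scale_per_row(hp: torch.Tensor) -> torch.Tensor:
    # One scale per row: map each row's absolute maximum onto the fp8 range.
    amax = hp.abs().amax(dim=-1, keepdim=True).float()
    return amax.clamp(min=1e-12) / FP8_MAX

def quantize_to_float8(hp: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Divide by the scale, clamp to the representable range, cast to float8.
    return (hp.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)

x = torch.randn(4, 64, dtype=torch.bfloat16)
scale = choose_scale_per_row(x)
x_fp8 = quantize_to_float8(x, scale)
x_dq = x_fp8.float() * scale  # dequantized approximation of x
```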
torchao/float8/inference.py
Outdated
    return processed_granularity

def _normalize_granularity_opaque_tensor(
Why can't this reuse the other `normalize_granularity_tensor`?
Thanks. Updated.
torchao/float8/types.py
Outdated
# Define FP8Granularity type alias to break circular import dependencies
FP8Granularity = Union["PerTensor", "PerRow"]
FP8GranularityCPU = Union["PerTensor", "PerRow", "PerGroup"]
I feel we can reuse and extend `FP8Granularity`, and assert that only a subset of the options is supported for GPU right now.
Thanks. Updated.
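A sketch of the reuse-and-extend direction (simplified: the real alias uses string forward references to break circular imports, and the check helper here is hypothetical):

```python
from typing import Union

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

# One shared alias; device support is asserted separately instead of
# being encoded in a second CPU-only alias.
FP8Granularity = Union[PerTensor, PerRow, PerGroup]

def _assert_granularity_supported(granularity: FP8Granularity, device_type: str) -> None:
    # Hypothetical helper: PerGroup is currently only handled by the CPU
    # (Float8OpaqueTensor) path.
    if device_type != "cpu":
        assert isinstance(granularity, (PerTensor, PerRow)), (
            f"{type(granularity).__name__} is not supported on {device_type} yet"
        )
```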
torchao/quantization/quant_api.py
Outdated
    block_size = get_block_size(x.shape, activation_granularity)
else:
    group_size = activation_granularity.group_size
    block_size = (*([1] * (len(x.shape) - 1)), group_size)
Why is this not included in `get_block_size`?
Updated. Thanks.
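An illustrative sketch of how the PerGroup case could fold into a `get_block_size`-style helper; this mirrors the arithmetic in the diff above, though torchao's actual `get_block_size` may differ:

```python
from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

def get_block_size_sketch(shape: tuple[int, ...], granularity) -> tuple[int, ...]:
    if isinstance(granularity, PerTensor):
        # One scale for the whole tensor: the block covers every dimension.
        return tuple(shape)
    if isinstance(granularity, PerRow):
        # One scale per row: the block spans the entire last dimension.
        return (*([1] * (len(shape) - 1)), shape[-1])
    if isinstance(granularity, PerGroup):
        # One scale per group of `group_size` elements along the last dim.
        return (*([1] * (len(shape) - 1)), granularity.group_size)
    raise ValueError(f"Unsupported granularity: {granularity!r}")
```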
torchao/quantization/quant_api.py
Outdated
-    _check_hardware_support(granularity)
+    is_cpu = weight.device.type == "cpu"
+    if not is_cpu:
+        _check_hardware_support(granularity)
Can you move this to version 1? Then version 2 can probably do this check in the tensor itself.
Sure. Thanks.
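A self-contained sketch of the version split being suggested; the config and classes below are demo stand-ins, not torchao's API:

```python
from dataclasses import dataclass, field

import torch

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

@dataclass
class DemoFloat8Config:
    granularity: object = field(default_factory=PerTensor)
    version: int = 2

class DemoFloat8OpaqueWeight:
    # Stand-in for Float8OpaqueTensor: validates in its own constructor,
    # which is what "do this check in the tensor itself" suggests.
    def __init__(self, weight: torch.Tensor, granularity) -> None:
        assert weight.device.type == "cpu", "opaque-tensor path is CPU-only"
        assert isinstance(granularity, (PerTensor, PerRow, PerGroup))
        self.weight, self.granularity = weight, granularity

def transform(weight: torch.Tensor, config: DemoFloat8Config):
    if config.version == 1:
        # version 1: validate eagerly in the workflow function (legacy).
        assert isinstance(config.granularity, (PerTensor, PerRow))
        return weight  # legacy quantization flow elided
    # version 2: the tensor subclass performs the checks itself.
    return DemoFloat8OpaqueWeight(weight, config.granularity)
```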
torchao/quantization/quant_api.py
Outdated
    if not is_cpu and not _fp8_mm_compat(weight):
        # TODO(future PR): this should really throw an exception instead of silently
        # not doing what the user asked
        return weight

-    if isinstance(weight_granularity, PerRow):
+    if not is_cpu and isinstance(weight_granularity, PerRow):
        assert weight.dtype == torch.bfloat16, (
            "PerRow quantization only works for bfloat16 precision input weight"
        )
Also, for these checks: I feel we can move them to the version 1 branch for now and deprecate them later; we can add the checks to the tensors for version 2.
Sure. Thanks.
Moving this to the version=1 branch causes CI failures, so I will keep them as is; maybe it can be improved later. Thanks.
The data type check is kept here, and the `_fp8_mm_compat` check is moved to version=1. Thanks.
@jerryzh168 Could you please review this PR again? Thanks.
-    ]
+    ],
+    supported_granularities: tuple[FP8Granularity] = (PerTensor, PerRow),
+    support_different_granularities: bool = False,
This is weird; I think we should have `normalize_granularity` only do normalization, not validation as well.
I feel the same actually. Where should we put the validation? Thanks.
This seems to be `_normalize_and_validate_granularities`. Can you define separate functions for both the float8 tensor and the float8 opaque tensor in the tensor files themselves, i.e. float8_tensor.py and float8_opaque_tensor.py? It will probably be clearer if you do this in a separate PR: first move the original `_normalize` function to float8_tensor.py and change all the call sites, and then in this PR you only need to add a new one for float8_opaque_tensor.py.
Sounds good. Will do. Thanks.
How about version=1? Call `_validate_granularity` explicitly? In that case, `_validate_granularity` cannot be bound to a specific tensor type, I guess. And `_normalize_granularity` (with checks) is called elsewhere too, e.g. ao/torchao/quantization/quant_api.py, line 1945 in 838dceb:

(act_granularity, weight_granularity) = _normalize_granularity(granularity)

How shall we do validation at these locations?
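One possible shape for that separation, sketched with illustrative names (normalization stays generic; each tensor type owns its validation):

```python
from typing import Optional, Union

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

FP8Granularity = Union[PerTensor, PerRow, PerGroup]

def _normalize_granularity(
    granularity: Optional[Union[FP8Granularity, tuple]],
) -> tuple[FP8Granularity, FP8Granularity]:
    # Pure normalization: expand None or a single value into an
    # (activation, weight) pair; no device or dtype checks here.
    if granularity is None:
        return (PerTensor(), PerTensor())
    if isinstance(granularity, tuple):
        return granularity
    return (granularity, granularity)

def _validate_for_float8_tensor(act, wgt) -> None:
    # Float8Tensor (GPU) path: no PerGroup, granularities must match.
    assert not isinstance(act, PerGroup) and not isinstance(wgt, PerGroup)
    assert type(act) is type(wgt), "different granularities are not supported"

def _validate_for_float8_opaque_tensor(act, wgt) -> None:
    # Float8OpaqueTensor (CPU) path: PerGroup is allowed, and activation
    # and weight granularities may differ.
    assert isinstance(act, (PerTensor, PerRow, PerGroup))
    assert isinstance(wgt, (PerTensor, PerRow, PerGroup))
```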
Summary
We split the original big PR #2505 into the following smaller ones:
Test plan