Conversation

@Xia-Weiwen (Collaborator) commented Sep 26, 2025

Summary
We split the original big PR #2505 into the following smaller ones:

Test plan

pytest -sv test/quantization/quantize_/workflows/float8/test_float8_opaque_tensor.py

pytorch-bot bot commented Sep 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3075

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 7980de8 with merge base 838dceb:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Sep 26, 2025
@Xia-Weiwen added the "topic: new feature" label (use this tag if this PR adds a new feature) Sep 26, 2025
@Xia-Weiwen (Collaborator, Author)

CC @mingfeima for review. Thanks.

@Xia-Weiwen (Collaborator, Author)

Hi @mingfeima @jerryzh168 @andrewor14, could you please review this PR? Thanks.

@Xia-Weiwen Xia-Weiwen marked this pull request as draft September 30, 2025 01:28
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review September 30, 2025 01:35
@Xia-Weiwen (Collaborator, Author)

Hi @mingfeima @jerryzh168 @andrewor14, though this PR depends on #3100, could you please review it? Thanks.

float8_dtype=torch.float8_e4m3fn,
block_size=block_size,
)
data = _quantize_affine_float8(hp_tensor, scale, torch.float8_e4m3fn)
Contributor

Do you need to use _quantize_affine_float8_non_decomposed?

Collaborator Author

Thanks. Since we are not using Inductor for fusion like PT2E, it should be OK here.
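
For context, below is a minimal sketch of what the decomposed float8 affine quantization step conceptually does; the helper name, the default dtype argument, and the divide-by-scale convention are illustrative assumptions, not torchao's actual implementation. As the exchange above suggests, the _non_decomposed variant exists mainly so the op stays intact for Inductor to pattern-match and fuse in PT2E-style flows, which is not needed here.

```python
# Minimal sketch (not the torchao implementation) of an eager, decomposed
# float8 affine quantization step. Helper name and divide-by-scale convention
# are assumptions for illustration only.
import torch

def quantize_affine_float8_sketch(
    hp_tensor: torch.Tensor,
    scale: torch.Tensor,
    float8_dtype: torch.dtype = torch.float8_e4m3fn,
) -> torch.Tensor:
    finfo = torch.finfo(float8_dtype)
    # Scale into the float8 range, clamp to the dtype's finite limits, then cast.
    scaled = hp_tensor.to(torch.float32) / scale
    return scaled.clamp(min=finfo.min, max=finfo.max).to(float8_dtype)

# Example usage with a per-tensor scale:
x = torch.randn(4, 8)
scale = x.abs().max() / torch.finfo(torch.float8_e4m3fn).max
x_fp8 = quantize_affine_float8_sketch(x, scale)
```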

return processed_granularity


def _normalize_granularity_opaque_tensor(
Contributor

Why can't this reuse the other normalize_granularity_tensor?

Collaborator Author

Thanks. Updated.


# Define FP8Granularity type alias to break circular import dependencies
FP8Granularity = Union["PerTensor", "PerRow"]
FP8GranularityCPU = Union["PerTensor", "PerRow", "PerGroup"]
@jerryzh168 (Contributor) commented Oct 6, 2025

I feel we can reuse and extend FP8Granularity and assert that only some of the options are supported on GPU right now.

Collaborator Author

Thanks. Updated.
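
A hedged sketch of the direction suggested above, with stand-in class definitions that are assumptions rather than the code that actually landed: keep a single shared FP8Granularity alias that includes PerGroup, and assert the narrower PerTensor/PerRow subset on non-CPU devices.

```python
# Sketch only; the granularity classes are stand-ins, not torchao's definitions.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class PerTensor: ...

@dataclass(frozen=True)
class PerRow: ...

@dataclass(frozen=True)
class PerGroup:
    group_size: int

# One shared alias; CPU accepts all three, CUDA currently only the first two.
FP8Granularity = Union[PerTensor, PerRow, PerGroup]

def _assert_granularity_supported(granularity: FP8Granularity, device_type: str) -> None:
    if device_type != "cpu":
        assert isinstance(granularity, (PerTensor, PerRow)), (
            f"{type(granularity).__name__} is only supported on CPU for float8 right now"
        )

# Fine on CPU, would raise for a CUDA device:
_assert_granularity_supported(PerGroup(group_size=128), "cpu")
```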

block_size = get_block_size(x.shape, activation_granularity)
else:
group_size = activation_granularity.group_size
block_size = (*([1] * (len(x.shape) - 1)), group_size)
Contributor

Why is this not included in get_block_size?

Collaborator Author

Updated. Thanks.
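
For reference, a sketch of how a single get_block_size-style helper can cover PerGroup alongside PerTensor and PerRow; it reuses the stand-in granularity classes from the sketch above and is not torchao's actual get_block_size.

```python
# Sketch of a unified block-size helper; not torchao's get_block_size.
# PerTensor/PerRow/PerGroup are the stand-in classes from the sketch above.
def get_block_size_sketch(shape, granularity):
    if isinstance(granularity, PerTensor):
        # One block covering the whole tensor.
        return shape
    if isinstance(granularity, PerRow):
        # One scale per row: the block spans only the last dimension.
        return (*([1] * (len(shape) - 1)), shape[-1])
    if isinstance(granularity, PerGroup):
        # One scale per group of `group_size` elements along the last dimension.
        return (*([1] * (len(shape) - 1)), granularity.group_size)
    raise ValueError(f"Unsupported granularity: {granularity}")

# e.g. get_block_size_sketch((16, 128), PerGroup(group_size=32)) == (1, 32)
```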

_check_hardware_support(granularity)
is_cpu = weight.device.type == "cpu"
if not is_cpu:
_check_hardware_support(granularity)
Contributor

Can you move this to version 1? Then version 2 can probably do this check in the tensor itself.

Collaborator Author

Sure. Thanks.

Comment on lines 1848 to 1856
if not is_cpu and not _fp8_mm_compat(weight):
# TODO(future PR): this should really throw an exception instead of silently
# not doing what the user asked
return weight

if isinstance(weight_granularity, PerRow):
if not is_cpu and isinstance(weight_granularity, PerRow):
assert weight.dtype == torch.bfloat16, (
"PerRow quantization only works for bfloat16 precision input weight"
)
@jerryzh168 (Contributor) commented Oct 6, 2025

Also for these checks, I feel we can move them to the version 1 branch for now and deprecate them later; we can add the checks to the tensors for version 2.

Collaborator Author

Sure. Thanks.

Collaborator Author

Moving this to version=1 branch causes CI failures. I will keep them as is. Maybe it can be improved later. Thanks.

Collaborator Author

The data type check is kept here and the _fp8_mm_compat check is moved to version=1. Thanks.
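
To make the resulting layout concrete, here is a structure-only sketch of the version gating described in this thread. Every helper is a stub (the divisibility-by-16 rule in the compatibility check is an assumption, and PerRow is the stand-in class from the earlier sketch), so treat it as an illustration of where the checks live rather than the real quant_api code.

```python
# Structure-only sketch of the version gating; all helpers below are stubs.
import torch

def _fp8_mm_compat_stub(weight: torch.Tensor) -> bool:
    # Stand-in for _fp8_mm_compat; the divisibility-by-16 rule is an assumption.
    return weight.dim() == 2 and weight.shape[0] % 16 == 0 and weight.shape[1] % 16 == 0

def maybe_quantize_weight_sketch(weight, weight_granularity, version, is_cpu):
    if version == 1 and not is_cpu and not _fp8_mm_compat_stub(weight):
        # Legacy (version=1) behavior: silently return the unquantized weight.
        # TODO(future PR): raise instead of returning silently.
        return weight
    # Data-type check kept in the shared path: CUDA PerRow needs bfloat16.
    if not is_cpu and isinstance(weight_granularity, PerRow):
        assert weight.dtype == torch.bfloat16, (
            "PerRow quantization only works for bfloat16 precision input weight"
        )
    return weight  # actual quantization omitted in this sketch
```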

@Xia-Weiwen Xia-Weiwen requested a review from jerryzh168 October 14, 2025 01:59
@Xia-Weiwen (Collaborator, Author)

@jerryzh168 Could you please review this PR again? Thanks.

]
],
supported_granularities: tuple[FP8Granularity] = (PerTensor, PerRow),
support_different_granularities: bool = False,
Contributor

This is weird; I think normalize_granularity should only do normalization, not validation as well.

Collaborator Author

I feel the same actually. Where should we put the validation? Thanks.

@jerryzh168 (Contributor) commented Oct 14, 2025

This seems to be _normalize_and_validate_granularities. Can you define separate functions for the float8 tensor and the float8 opaque tensor in the tensor files themselves, i.e. float8_tensor.py and float8_opaque_tensor.py?

It will probably be clearer if you do this in a separate PR: move the original _normalize function to float8_tensor.py and change all the call sites first, and then in this PR you only need to add a new one for float8_opaque_tensor.py.

Collaborator Author

Sounds good. Will do. Thanks.
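
A rough sketch of the split being agreed on here, with illustrative names only (the granularity classes are the stand-ins from the earlier sketch): normalization stays generic, while validation becomes a small per-tensor-type function that can live next to float8_tensor.py and float8_opaque_tensor.py respectively.

```python
# Sketch of separating normalization from validation; names are illustrative
# and do not match the final torchao functions.
def _normalize_granularity_sketch(granularity):
    # Normalization only: a single granularity becomes an (activation, weight) pair.
    if isinstance(granularity, (tuple, list)):
        act_gran, weight_gran = granularity
    else:
        act_gran = weight_gran = granularity
    return act_gran, weight_gran

def _validate_granularity_float8_opaque_sketch(act_gran, weight_gran) -> None:
    # Validation specific to the CPU opaque tensor: PerGroup is also allowed here.
    allowed = (PerTensor, PerRow, PerGroup)
    for g in (act_gran, weight_gran):
        assert isinstance(g, allowed), (
            f"Unsupported granularity for the float8 opaque tensor: {type(g).__name__}"
        )
```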

Collaborator Author

How about version=1? Should we call _validate_granularity explicitly there? In that case, _validate_granularity cannot be bound to a specific tensor type, I guess. And _normalize_granularity (with checks) is called elsewhere too:

How shall we do validation at these locations?
