
Add Float8QuantizedTensor (AQT subclass) and replace to_affine_quantized_floatx with to_affine_quantized_float8 in quantization APIs #1599


Closed

Conversation

Contributor

@danielvegamyhre danielvegamyhre commented Jan 22, 2025

Context

Currently, AQT has the method from_hp_to_floatx for float8 quantization, and from_hp_to_fpx for low-precision floating point data types like fp6 (technically it can support fp1 through fp7).

from_hp_to_floatx re-uses from_hp_to_intx, which in turn uses the generic quantization primitives.

Overall, the current float8 path is confusing for developers, both because of the naming ("floatx") and because the generic functions carry many parameters that are unrelated to float8 quantization.

Summary of changes

The goal of this PR stack is to refactor this to have a clean separation of concerns, and simpler internal API surfaces for code used in float8 quantization for inference.

Specifically:

  • Separate quantization primitives for float8
  • Integrate those new quant primitives into AQT
  • Integrate new AQT methods into float8 quantization APIs (this PR)

Note: I will add float8 static quantization in a separate set of PRs.
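As background for the separation of concerns above: float8 quantization is scale-only, which is part of why the generic affine parameters feel unrelated to it. A minimal pure-Python sketch of per-tensor float8 (e4m3) scaling, where the function names are illustrative assumptions and not the torchao primitives this stack introduces:

```python
# Minimal sketch of per-tensor float8 (e4m3) scaling; illustration only,
# not the torchao quantization primitives added in this PR stack.
FLOAT8_E4M3_MAX = 448.0  # largest finite value of the e4m3 format


def choose_scale(values):
    """Map the tensor's max magnitude onto the float8 representable range."""
    amax = max(abs(v) for v in values)
    return amax / FLOAT8_E4M3_MAX if amax > 0 else 1.0


def quantize_float8(values):
    scale = choose_scale(values)
    # Divide by the scale and clamp; a real kernel would then cast the
    # result to a float8 dtype such as torch.float8_e4m3fn.
    q = [max(-FLOAT8_E4M3_MAX, min(FLOAT8_E4M3_MAX, v / scale)) for v in values]
    return q, scale


def dequantize_float8(q, scale):
    return [v * scale for v in q]
```

Note there is no zero point or integer dtype range involved, unlike the affine intx path.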


pytorch-bot bot commented Jan 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1599

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 2f15cc1 with merge base 32d9b0b:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 22, 2025
danielvegamyhre added a commit that referenced this pull request Jan 22, 2025
… quantization APIs

ghstack-source-id: 293124b
ghstack-comment-id: 2608105249
Pull Request resolved: #1599
@danielvegamyhre danielvegamyhre added quantize topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) labels Jan 22, 2025
@danielvegamyhre danielvegamyhre removed the request for review from jainapurva January 22, 2025 20:16
@jerryzh168
Contributor

thanks, we also want to split out a Float8 (and floatx) specific AQT implementations as well, I talked to @jainapurva before

@danielvegamyhre
Contributor Author

thanks, we also want to split out a Float8 (and floatx) specific AQT implementations as well, I talked to @jainapurva before

Yep that makes sense, when I talked to her earlier she said she is planning to create these AQT subclasses, so I decided to do this part of the refactor.

@@ -38,6 +34,7 @@
"to_affine_quantized_fpx",
"to_affine_quantized_floatx",
Contributor

Please remove floatx, float8 should replace floatx.

Contributor Author

@danielvegamyhre danielvegamyhre commented Jan 23, 2025

Oh I left it in since it's still in use in other parts of the code base (autoquant, autoquant v2), and I wasn't sure if I should be touching those - is it ok to replace all instances across the whole codebase?

@jainapurva
Contributor

thanks, we also want to split out a Float8 (and floatx) specific AQT implementations as well, I talked to @jainapurva before

Yep that makes sense, when I talked to her earlier she said she is planning to create these AQT subclasses, so I decided to do this part of the refactor.

Yes, we want all the instances replaced. Autoquant is using it for Float8, hence it would be better to rename it to float8.
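A codebase-wide rename like the one requested here can be done mechanically. A hedged sketch (GNU grep/sed/xargs assumed; the paths and the scratch-directory demo are illustrative, not necessarily the workflow actually used in this PR):

```shell
# Hypothetical mechanical rename; illustration only.
# Demo on a scratch directory so the commands are self-contained:
mkdir -p /tmp/floatx_rename_demo
printf 'x = to_affine_quantized_floatx(t)\n' > /tmp/floatx_rename_demo/a.py

# Find every Python file using the old name and rewrite it in place
# (xargs -r skips the sed call entirely if no files match).
grep -rl "to_affine_quantized_floatx" --include="*.py" /tmp/floatx_rename_demo \
  | xargs -r sed -i 's/to_affine_quantized_floatx/to_affine_quantized_float8/g'

cat /tmp/floatx_rename_demo/a.py
# prints: x = to_affine_quantized_float8(t)
```

In a real refactor the backwards-compatibility re-export question (as discussed in the thread) still has to be decided separately.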


@danielvegamyhre
Contributor Author

thanks, we also want to split out a Float8 (and floatx) specific AQT implementations as well, I talked to @jainapurva before

Yep that makes sense, when I talked to her earlier she said she is planning to create these AQT subclasses, so I decided to do this part of the refactor.

Yes, we want all the instances replaced. Autoquant is using it for Float8. Hence would be better to rename it float8

Done!

@danielvegamyhre danielvegamyhre changed the title replace to_affine_quantized_floatx with to_affine_quantized_float8 in quantization APIs Add Float8QuantizedTensor (AQT subclass) and replace to_affine_quantized_floatx with to_affine_quantized_float8 in quantization APIs Jan 23, 2025
@@ -209,19 +215,64 @@ def __repr__(self):
)


class Float8QuantizedTensor(AffineQuantizedTensor):
Contributor

I'm not a fan of this, this introduces one more abstraction (Float8QuantizedTensor), while keeping the complexity of AffineQuantizedTensor. I think either staying with AQT or just writing a float8 tensor without using AQT would seem more attractive.

Contributor Author

@danielvegamyhre danielvegamyhre commented Jan 23, 2025

Interesting - cc @jainapurva @jerryzh168 thoughts on this?

For context, AQT subclassing was part of a BE effort for the week; I'll share the doc with you internally.

Contributor

@jainapurva jainapurva commented Jan 23, 2025

Removing the AQT abstraction is easy; the only reason I felt like keeping it was consistency across all dtypes. Though I do agree that it adds another level of abstraction.
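For readers weighing the options in this thread, the trade-off can be sketched in a few lines. The classes below are simplified stand-ins (plain classes rather than torch.Tensor subclasses, with assumed fields), not the actual implementation:

```python
# Illustration only: simplified stand-ins for the pattern under discussion,
# not the real torchao tensor subclasses.
class AffineQuantizedTensor:
    """Generic affine quantization: dequant = (q - zero_point) * scale."""

    def __init__(self, qdata, scale, zero_point):
        self.qdata = qdata
        self.scale = scale
        self.zero_point = zero_point

    def dequantize(self):
        return [(q - self.zero_point) * self.scale for q in self.qdata]


class Float8QuantizedTensor(AffineQuantizedTensor):
    """Float8 is scale-only, so the subclass can drop the zero point."""

    def __init__(self, qdata, scale):
        super().__init__(qdata, scale, zero_point=0)

    def dequantize(self):
        return [q * self.scale for q in self.qdata]
```

The subclass keeps AQT's interface (the consistency argument) but also inherits all of AQT's complexity (the objection); a standalone float8 tensor would drop the inheritance entirely.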

@danielvegamyhre
Contributor Author

Discussed offline, closing until internal discussions are finalized.
