
Add Float8QuantizedTensor (AQT subclass) and replace to_affine_quantized_floatx with to_affine_quantized_float8 in quantization APIs #1599


Closed

Conversation

Contributor

@danielvegamyhre danielvegamyhre commented Jan 22, 2025

Context

Currently, AQT has the method from_hp_to_floatx for float8 quantization, and from_hp_to_fpx for low-precision floating point data types like fp6 (technically it can support fp1 through fp7).

from_hp_to_floatx re-uses from_hp_to_intx, which in turn uses the generic quantization primitives.

Overall, the current float8 path is confusing for developers, both because of the naming ("floatx") and because the generic functions carry many parameters that are unrelated to float8 quantization.

Summary of changes

The goal of this PR stack is to refactor this to have a clean separation of concerns, and simpler internal API surfaces for code used in float8 quantization for inference.

Specifically:

  • Separate quantization primitives for float8
  • Integrate those new quant primitives into AQT
  • Integrate new AQT methods into float8 quantization APIs (this PR)

Note: I will add float8 static quantization in a separate set of PRs.
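As background for the separation of concerns above: float8 quantization is scale-only, which is part of why the generic affine parameters feel unrelated to it. A minimal pure-Python sketch of per-tensor float8 (e4m3) scaling, where the function names are illustrative assumptions and not the torchao primitives this stack introduces:

```python
# Minimal sketch of per-tensor float8 (e4m3) scaling; illustration only,
# not the torchao quantization primitives added in this PR stack.
FLOAT8_E4M3_MAX = 448.0  # largest finite value of the e4m3 format


def choose_scale(values):
    """Map the tensor's max magnitude onto the float8 representable range."""
    amax = max(abs(v) for v in values)
    return amax / FLOAT8_E4M3_MAX if amax > 0 else 1.0


def quantize_float8(values):
    scale = choose_scale(values)
    # Divide by the scale and clamp; a real kernel would then cast the
    # result to a float8 dtype such as torch.float8_e4m3fn.
    q = [max(-FLOAT8_E4M3_MAX, min(FLOAT8_E4M3_MAX, v / scale)) for v in values]
    return q, scale


def dequantize_float8(q, scale):
    return [v * scale for v in q]
```

Note there is no zero point or integer dtype range involved, unlike the affine intx path.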


pytorch-bot bot commented Jan 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1599

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 2f15cc1 with merge base 32d9b0b:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 22, 2025
danielvegamyhre added a commit that referenced this pull request Jan 22, 2025
… quantization APIs

ghstack-source-id: 293124b
ghstack-comment-id: 2608105249
Pull Request resolved: #1599
@danielvegamyhre danielvegamyhre added quantize topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) labels Jan 22, 2025
@danielvegamyhre danielvegamyhre removed the request for review from jainapurva January 22, 2025 20:16
@jerryzh168
Contributor

thanks, we also want to split out a Float8 (and floatx) specific AQT implementations as well, I talked to @jainapurva before

@danielvegamyhre
Contributor Author

thanks, we also want to split out a Float8 (and floatx) specific AQT implementations as well, I talked to @jainapurva before

Yep that makes sense, when I talked to her earlier she said she is planning to create these AQT subclasses, so I decided to do this part of the refactor.

@@ -38,6 +34,7 @@
"to_affine_quantized_fpx",
"to_affine_quantized_floatx",
Contributor

Please remove floatx, float8 should replace floatx.

Contributor Author

@danielvegamyhre danielvegamyhre commented Jan 23, 2025

Oh I left it in since it's still in use in other parts of the code base (autoquant, autoquant v2), and I wasn't sure if I should be touching those - is it ok to replace all instances across the whole codebase?

@jainapurva
Contributor

thanks, we also want to split out a Float8 (and floatx) specific AQT implementations as well, I talked to @jainapurva before

Yep that makes sense, when I talked to her earlier she said she is planning to create these AQT subclasses, so I decided to do this part of the refactor.

Yes, we want all the instances replaced. Autoquant is using it for Float8, hence it would be better to rename it to float8.
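A codebase-wide rename like the one requested here can be done mechanically. A hedged sketch (GNU grep/sed/xargs assumed; the paths and the scratch-directory demo are illustrative, not necessarily the workflow actually used in this PR):

```shell
# Hypothetical mechanical rename; illustration only.
# Demo on a scratch directory so the commands are self-contained:
mkdir -p /tmp/floatx_rename_demo
printf 'x = to_affine_quantized_floatx(t)\n' > /tmp/floatx_rename_demo/a.py

# Find every Python file using the old name and rewrite it in place
# (xargs -r skips the sed call entirely if no files match).
grep -rl "to_affine_quantized_floatx" --include="*.py" /tmp/floatx_rename_demo \
  | xargs -r sed -i 's/to_affine_quantized_floatx/to_affine_quantized_float8/g'

cat /tmp/floatx_rename_demo/a.py
# prints: x = to_affine_quantized_float8(t)
```

In a real refactor the backwards-compatibility re-export question (as discussed in the thread) still has to be decided separately.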


@danielvegamyhre
Contributor Author

thanks, we also want to split out a Float8 (and floatx) specific AQT implementations as well, I talked to @jainapurva before

Yep that makes sense, when I talked to her earlier she said she is planning to create these AQT subclasses, so I decided to do this part of the refactor.

Yes, we want all the instances replaced. Autoquant is using it for Float8. Hence would be better to rename it float8

Done!

@danielvegamyhre danielvegamyhre changed the title replace to_affine_quantized_floatx with to_affine_quantized_float8 in quantization APIs Add Float8QuantizedTensor (AQT subclass) and replace to_affine_quantized_floatx with to_affine_quantized_float8 in quantization APIs Jan 23, 2025
@@ -209,19 +215,64 @@ def __repr__(self):
)


class Float8QuantizedTensor(AffineQuantizedTensor):
Contributor

I'm not a fan of this, this introduces one more abstraction (Float8QuantizedTensor), while keeping the complexity of AffineQuantizedTensor. I think either staying with AQT or just writing a float8 tensor without using AQT would seem more attractive.

Contributor Author

@danielvegamyhre danielvegamyhre commented Jan 23, 2025

Interesting - cc @jainapurva @jerryzh168 thoughts on this?

For context, AQT subclassing was part of a BE effort for the week; I'll share the doc with you internally.

Contributor

@jainapurva jainapurva commented Jan 23, 2025

Removing the AQT abstraction is easy; the only reason I felt like keeping it was consistency across all dtypes. Though I do agree that it adds another level of abstraction.
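For readers weighing the options in this thread, the trade-off can be sketched in a few lines. The classes below are simplified stand-ins (plain classes rather than torch.Tensor subclasses, with assumed fields), not the actual implementation:

```python
# Illustration only: simplified stand-ins for the pattern under discussion,
# not the real torchao tensor subclasses.
class AffineQuantizedTensor:
    """Generic affine quantization: dequant = (q - zero_point) * scale."""

    def __init__(self, qdata, scale, zero_point):
        self.qdata = qdata
        self.scale = scale
        self.zero_point = zero_point

    def dequantize(self):
        return [(q - self.zero_point) * self.scale for q in self.qdata]


class Float8QuantizedTensor(AffineQuantizedTensor):
    """Float8 is scale-only, so the subclass can drop the zero point."""

    def __init__(self, qdata, scale):
        super().__init__(qdata, scale, zero_point=0)

    def dequantize(self):
        return [q * self.scale for q in self.qdata]
```

The subclass keeps AQT's interface (the consistency argument) but also inherits all of AQT's complexity (the objection); a standalone float8 tensor would drop the inheritance entirely.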

@danielvegamyhre
Contributor Author

Discussed offline, closing until internal discussions are finalized.
