Add support for power of 2 scaling factors in float8 training #1669
Conversation
```diff
@@ -234,6 +248,9 @@ class Float8LinearConfig:
     # tests so that the warning does not spam the CI stdout.
     force_recompute_fp8_weight_in_bwd: bool = False
+
+    # configuration used for calculating the scaling factor used in float8 quantization.
+    scaling_factor_config: Float8ScalingFactorConfig = None
```
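(For context, a minimal sketch of what the new config dataclass might look like; only the `power_of_2_scale` field is referenced in this discussion, and everything else here is an assumption:)

```python
from dataclasses import dataclass

@dataclass
class Float8ScalingFactorConfig:
    # When True, each scaling factor is rounded down to the nearest power
    # of 2 after it is computed.
    #
    # The tri-state lives one level up: Float8LinearConfig.scaling_factor_config
    # defaults to None (unset), so an explicitly constructed config with
    # power_of_2_scale=False is distinguishable from "never set".
    power_of_2_scale: bool = False
```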
Reviewer:
Just make it a boolean, and set it to true for rowwise scaling in the code using `Float8LinearRecipeName`?
Author:
I originally did this (in a different branch), but there didn't seem to be a simple way for the user to use row-wise scaling without power of 2 scale factors.

I am trying to support the behavior of "default on for row-wise scaling, with the option to disable; for other scaling granularities, default off, with the option to enable."

I tried to explain my logic in the PR description, but basically: if this is a boolean, we won't be able to distinguish whether the user explicitly set it to False (for example, to use row-wise scaling without power of 2 scaling, in which case we should not override it to True in the rowwise code), or whether it is False only because that is the unset default (in which case we should set it to True in the rowwise code).
Reviewer:
> there didn't seem to be a nice/simple way for the user to use row-wise scaling without power of 2 scale factors

They could create a `Float8LinearConfig` and configure whatever they want?
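(For illustration, a hypothetical sketch of what the reviewer is suggesting; only `scaling_factor_config` and `Float8ScalingFactorConfig` come from this PR, and the other constructor arguments are elided:)

```python
# Hypothetical: explicitly opt out of power of 2 scaling while still using
# row-wise scaling; other Float8LinearConfig arguments are elided here.
config = Float8LinearConfig(
    # ... row-wise scaling cast configs here ...
    scaling_factor_config=Float8ScalingFactorConfig(power_of_2_scale=False),
)
```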
Reviewer:
> Default on for row-wise scaling, with option to disable. For other scaling granularities, default off, with option to enable.

That should be changed to: default on only for the row-wise scaling recipe created by the recipe-enum-to-recipe-config function.
Author:
> Default on for row-wise scaling, with option to disable. For other scaling granularities, default off, with option to enable.

> that should be changed to default on for row-wise scaling recipe created by the recipe enum to recipe function

Yeah, this is what I did in the other branch; I can create a PR for that one as well to compare. I didn't prefer it because a lot of the rowwise scaling tests do not use the recipe helper function and instead call `hp_tensor_to_float8_dynamic` directly, so that approach would require modifying a bunch of tests, adding new parameterizations, etc., to make sure "default on" doesn't negatively affect numerics. This approach seemed simpler to me, but I'm happy to do it the other way.
Summary
Add support for power of 2 scaling factors in float8 training with dynamic scaling.
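(For reference, a minimal sketch of one common way to round a scale down to the nearest power of 2; this is illustrative and not necessarily the exact implementation in this PR:)

```python
import torch

def round_scale_down_to_power_of_2(scale: torch.Tensor) -> torch.Tensor:
    # Round each positive float32 scale down to the nearest power of 2
    # by flooring its base-2 exponent.
    assert scale.dtype == torch.float32
    return torch.exp2(torch.floor(torch.log2(scale)))
```

Multiplying or dividing by a power of 2 only changes the float exponent, not the mantissa, so the scaling itself introduces no additional rounding error.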
REVIEWER NOTE: To support the behavior of "default on, but can be optionally disabled," we must be able to distinguish between the unset default and an explicit negative setting. This is why I didn't use a simple boolean flag for power of 2 scaling: the unset default and an explicit negative setting would both be `False`. By using a dataclass config, we can distinguish between the unset default (`None`) and an explicit negative setting (the config exists and `scaling_config.power_of_2_scale` is `False`). I'm open to feedback/suggestions if there is a better way to do this; let me know.
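(A hypothetical sketch of the resolution logic this tri-state enables; the function and parameter names here are illustrative, not code from this PR:)

```python
def resolve_power_of_2_scale(
    scaling_factor_config,  # Float8ScalingFactorConfig | None
    is_rowwise_scaling: bool,
) -> bool:
    # Unset default (None): enable power of 2 scaling only for row-wise scaling.
    if scaling_factor_config is None:
        return is_rowwise_scaling
    # Explicitly constructed config: respect the user's setting, even when it
    # is False, so row-wise scaling without power of 2 scales stays possible.
    return scaling_factor_config.power_of_2_scale
```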
Test Plan
All tests (in `test/float8/`) are passing.