
Add support for power of 2 scaling factors in float8 training #1669

Closed
wants to merge 1 commit

Conversation

@danielvegamyhre (Contributor) commented on Feb 5, 2025

Summary

Add support for power of 2 scaling factors in float8 training with dynamic scaling.

  • For row-wise scaling, power of 2 scaling is on by default, but can be optionally disabled.
  • For all other scaling granularities, power of 2 scaling is not used by default, and must be explicitly enabled by the caller.

REVIEWER NOTE: To support "default on, but optionally disabled" behavior, we must be able to distinguish between an unset default and an explicit negative setting.
This is why I didn't use a simple boolean flag for power of 2 scaling: the unset default and the explicit negative setting would both be False. By using a dataclass config, we can distinguish between the unset default (None) and an explicit negative setting (the config exists and scaling_config.power_of_2_scale is False).
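
For concreteness, a minimal sketch of the config shape this describes (simplified; the field name matches the diff further down, and other Float8LinearConfig fields are elided):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Float8ScalingFactorConfig:
    # When True, scaling factors are rounded down to the nearest power of 2.
    power_of_2_scale: bool = False


@dataclass
class Float8LinearConfig:
    # ... other fields elided ...
    # None means "unset": row-wise scaling enables power of 2 scaling by default,
    # while an explicitly provided config (even with power_of_2_scale=False) is
    # respected as-is.
    scaling_factor_config: Optional[Float8ScalingFactorConfig] = None
```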

I'm open to feedback/suggestions; if there is a better way to do this, let me know.

Test Plan

  • All float8 tests (test/float8/) are passing.
  • Since power of 2 scaling is on by default for row-wise scaling, all row-wise tests now exercise this code path, and the quantization error is still within the expected threshold (i.e., tests pass).

pytorch-bot bot commented Feb 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1669

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit ad8061b with merge base 8afd10e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Feb 5, 2025
@danielvegamyhre added the topic: new feature label on Feb 5, 2025
@danielvegamyhre force-pushed the scaleconfig branch 2 times, most recently from 7bf2569 to ecef836, on February 5, 2025 at 19:06
@danielvegamyhre marked this pull request as draft on February 5, 2025 at 19:06
@danielvegamyhre marked this pull request as ready for review on February 5, 2025 at 19:23
@danielvegamyhre requested a review from vkuzo on February 5, 2025 at 19:23
@@ -234,6 +248,9 @@ class Float8LinearConfig:
# tests so that the warning does not spam the CI stdout.
force_recompute_fp8_weight_in_bwd: bool = False

# configuration used for calculating the scaling factor used in float8 quantization.
scaling_factor_config: Float8ScalingFactorConfig = None
Contributor commented on the diff:
just make it a boolean, and set it to true for rowwise scaling in code using Float8LinearRecipeName?

@danielvegamyhre (Contributor, Author) replied on Feb 5, 2025:

I originally did this (in a different branch), but then there didn't seem to be a nice/simple way for the user to use row-wise scaling without power of 2 scale factors.

I am trying to support the behavior of "Default on for row-wise scaling, with option to disable. For other scaling granularities, default off, with option to enable."

I tried to explain my logic in the PR description, but basically: if this is a boolean, we can't distinguish whether the user explicitly set it to False (for example, to use row-wise scaling without power of 2 scaling, in which case we should not override it to True in the row-wise code) or whether it is False only because that is the unset default (in which case we should set it to True in the row-wise code).
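
A minimal sketch of the None-vs-False distinction being described (the helper name and its placement are hypothetical, for illustration only):

```python
from typing import Optional


def should_use_power_of_2_scale(
    scaling_factor_config: Optional["Float8ScalingFactorConfig"],
    is_rowwise: bool,
) -> bool:
    # Hypothetical helper illustrating the decision logic.
    if scaling_factor_config is None:
        # Unset default: row-wise scaling turns power of 2 scaling on,
        # other granularities leave it off.
        return is_rowwise
    # Explicit setting: respect it, even if it is False for row-wise scaling.
    return scaling_factor_config.power_of_2_scale
```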

Contributor replied:
> there didn't seem to be a nice/simple way for the user to use row-wise scaling without power of 2 scale factors

they could create a Float8LinearConfig and configure whatever they want?

Contributor replied:

> Default on for row-wise scaling, with option to disable. For other scaling granularities, default off, with option to enable.

that should be changed to: default on for the row-wise scaling recipe created by the recipe-enum-to-recipe function

@danielvegamyhre (Contributor, Author) replied on Feb 5, 2025:

> Default on for row-wise scaling, with option to disable. For other scaling granularities, default off, with option to enable.
>
> that should be changed to: default on for the row-wise scaling recipe created by the recipe-enum-to-recipe function

Yeah, this is what I did in the other branch; I can create a PR for that one as well to compare. I didn't prefer it because a lot of the row-wise scaling tests do not use the recipe helper function and instead call hp_tensor_to_float8_dynamic directly, so that approach would require modifying a bunch of tests, adding new parameterizations, etc., to make sure "default on" doesn't negatively affect numerics. This approach seemed simpler to me, but I'm happy to do it the other way.
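
For reference, a rough, self-contained sketch of that alternative (a plain boolean flag flipped by the recipe-enum-to-config helper); the class and function bodies below are simplified stand-ins, not the actual torchao definitions:

```python
from dataclasses import dataclass
from enum import Enum


class Float8LinearRecipeName(Enum):
    # Illustrative members; the real enum may differ.
    TENSORWISE = "tensorwise"
    ROWWISE = "rowwise"


@dataclass
class Float8LinearConfig:
    # Plain boolean, default off; no None-vs-False distinction, because the
    # recipe helper below is the only place that flips the default.
    power_of_2_scale: bool = False


def recipe_name_to_linear_config(recipe: Float8LinearRecipeName) -> Float8LinearConfig:
    if recipe is Float8LinearRecipeName.ROWWISE:
        # Row-wise recipe: power of 2 scaling factors on by default.
        return Float8LinearConfig(power_of_2_scale=True)
    # Other recipes: leave power of 2 scaling off unless the caller opts in.
    return Float8LinearConfig()
```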

@danielvegamyhre danielvegamyhre marked this pull request as draft February 5, 2025 20:02