
Commit 67bebe7

add a deprecation warning for float8 delayed and static scaling
Summary: As titled. The complexity tax for these features is high and there are no known real use cases, as the community is overwhelmingly using dynamic scaling. So, IMO we should deprecate them.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 2fc91db
ghstack-comment-id: 2641358141
Pull Request resolved: #1681
1 parent 8afd10e commit 67bebe7

2 files changed: +12 −0 lines changed

torchao/float8/README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -65,6 +65,8 @@ for _ in range(10):
 
 ## float8 linear with delayed scaling
 
+:warning: <em>We plan to deprecate delayed scaling in a future release, see https://github.com/pytorch/ao/issues/1680 for more details.</em>
+
 This is theoretically the most performant recipe as it minimizes memory reads.
 
 ```python
```
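For context on what is being deprecated: the scaling modes differ in where the float8 cast's scale factor comes from. Dynamic scaling recomputes the scale from the current tensor's amax on every cast; delayed scaling reuses a scale derived from amax values observed in earlier iterations, which saves a read of the current tensor but requires extra bookkeeping (the "complexity tax" the commit message refers to). A minimal sketch of the distinction, using plain Python lists rather than torchao's tensor machinery (the function names `dynamic_scale` and `delayed_scale` are illustrative, not torchao APIs):

```python
# Illustrative sketch only -- NOT torchao's implementation. It mirrors the
# idea that dynamic scaling derives the scale from the current tensor,
# while delayed scaling derives it from previously observed amax values.

FP8_E4M3_MAX = 448.0  # largest representable magnitude in float8 e4m3fn


def dynamic_scale(values):
    """Recompute the scale from the current tensor on every cast."""
    amax = max(abs(v) for v in values)
    return FP8_E4M3_MAX / amax


def delayed_scale(amax_history):
    """Reuse a scale derived from amax values of earlier iterations.

    Avoids an extra read of the current tensor, at the cost of
    maintaining the amax history across steps.
    """
    amax = max(amax_history)
    return FP8_E4M3_MAX / amax


current = [0.5, -2.0, 1.25]     # current tensor values
history = [1.0, 4.0, 2.0]       # amax values from previous steps
print(dynamic_scale(current))   # 224.0 (448.0 / 2.0)
print(delayed_scale(history))   # 112.0 (448.0 / 4.0)
```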

torchao/float8/config.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -304,6 +304,16 @@ def __post_init__(self):
                 "When using FSDP, it's recommended to enable config.force_recompute_fp8_weight_in_bwd."
             )
 
+        # Future deprecation warning for delayed scaling
+        if (
+            self.cast_config_input.scaling_type != ScalingType.DYNAMIC
+            or self.cast_config_weight.scaling_type != ScalingType.DYNAMIC
+            or self.cast_config_grad_output.scaling_type != ScalingType.DYNAMIC
+        ):
+            logger.warning(
+                "Note: delayed and static scaling will be deprecated in a future release of torchao. Please see https://github.com/pytorch/ao/issues/1680 for more details."
+            )
+
 
 # Pre-made recipes for common configurations
 # TODO(future PR): go through a round of design on this, and eventually expose
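The check in config.py fires once, at config construction time, whenever any of the three casts (input, weight, or grad_output) is configured with a non-dynamic scaling type. A self-contained sketch of the same pattern, using simplified stand-in classes rather than torchao's actual `Float8LinearConfig` and `CastConfig` (the class and field names here only mirror the diff; they are not the real torchao definitions):

```python
# Stand-in sketch of the deprecation-warning pattern from the diff above.
# These classes are simplified stand-ins, not torchao's real config types.
import logging
from dataclasses import dataclass, field
from enum import Enum

logger = logging.getLogger(__name__)


class ScalingType(Enum):
    DYNAMIC = "dynamic"
    DELAYED = "delayed"
    STATIC = "static"


@dataclass
class CastConfig:
    scaling_type: ScalingType = ScalingType.DYNAMIC


@dataclass
class Float8LinearConfigSketch:
    # Field names mirror the three casts checked in the diff.
    cast_config_input: CastConfig = field(default_factory=CastConfig)
    cast_config_weight: CastConfig = field(default_factory=CastConfig)
    cast_config_grad_output: CastConfig = field(default_factory=CastConfig)

    def __post_init__(self):
        # Warn once at construction time if any cast is non-dynamic.
        if (
            self.cast_config_input.scaling_type != ScalingType.DYNAMIC
            or self.cast_config_weight.scaling_type != ScalingType.DYNAMIC
            or self.cast_config_grad_output.scaling_type != ScalingType.DYNAMIC
        ):
            logger.warning(
                "Note: delayed and static scaling will be deprecated in a "
                "future release of torchao. Please see "
                "https://github.com/pytorch/ao/issues/1680 for more details."
            )


# All-dynamic config: no warning. Delayed weight scaling: warning is logged.
Float8LinearConfigSketch()
Float8LinearConfigSketch(cast_config_weight=CastConfig(ScalingType.DELAYED))
```

Putting the check in `__post_init__` means users see the warning when they build the config, before any training step runs, rather than deep inside the float8 cast path.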

0 commit comments
