Support power of 2 scaling factors in float8 training and use e4m3 everywhere #7943

Annotations

2 errors

test (CUDA 2.3, linux.g5.12xlarge.nvidia.gpu, torch==2.3.0, cuda, 12.1)  /  linux-job

cancelled Feb 7, 2025 in 8m 44s