Support power of 2 scaling factors in float8 training and use e4m3 everywhere #7884

test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch==2.7.0.dev20250122 --index-...  /  linux-job

succeeded Feb 6, 2025 in 47m 35s