Commit 3e26028

Marking the flag as really not the fastest and BETA.
1 parent 7b38ca4 commit 3e26028

1 file changed: +3 -0 lines changed

launcher/src/main.rs

Lines changed: 3 additions & 0 deletions
@@ -47,8 +47,11 @@ enum Quantization {
     /// Bitsandbytes 4bit. nf4 should be preferred in most cases but maybe this one has better
     /// perplexity performance for you model
     BitsandbytesFP4,
+    /// [BETA]
     /// [FP8](https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/) (e4m3) works on H100 and above
     /// This dtype has native ops should be the fastest if available.
+    /// This is currently not the fastest because of local unpacking + padding to satisfy matrix
+    /// multiplication limitations.
     Fp8,
 }
 
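
The added doc comment attributes the slowdown to padding needed to satisfy matrix multiplication limitations. The sketch below illustrates what such padding might look like in general; it is not the code used by the launcher or by the FP8 kernels. The alignment value of 16 and the helper names `round_up` and `pad_rows` are assumptions made for illustration only.

```rust
/// Hypothetical alignment requirement for a matmul dimension.
/// The value 16 is an assumption for illustration, not taken from the commit.
const ASSUMED_ALIGNMENT: usize = 16;

/// Round `len` up to the next multiple of `multiple`.
fn round_up(len: usize, multiple: usize) -> usize {
    assert!(multiple > 0, "multiple must be non-zero");
    (len + multiple - 1) / multiple * multiple
}

/// Pad a row-major `rows x cols` matrix with zero rows so that the row count
/// becomes a multiple of `multiple`. Returns the padded buffer and the new row count.
fn pad_rows(data: &[f32], rows: usize, cols: usize, multiple: usize) -> (Vec<f32>, usize) {
    assert_eq!(data.len(), rows * cols, "buffer does not match the given shape");
    let padded_rows = round_up(rows, multiple);
    let mut out = Vec::with_capacity(padded_rows * cols);
    out.extend_from_slice(data);
    // The extra rows are zero-filled and would be discarded after the matmul.
    out.resize(padded_rows * cols, 0.0);
    (out, padded_rows)
}

fn main() {
    let rows = 3;
    let cols = 4;
    let data = vec![1.0_f32; rows * cols];
    let (padded, padded_rows) = pad_rows(&data, rows, cols, ASSUMED_ALIGNMENT);
    println!(
        "{} rows padded to {} ({} extra elements copied and multiplied)",
        rows,
        padded_rows,
        padded.len() - data.len()
    );
}
```

Under these assumptions, a small input pays for a copy into a larger buffer plus a matmul over rows that carry no data, which is the kind of overhead the new comment describes.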
