Commit d2a8da7: Update quantization.md (#735)

Use comments as suggested by GitHub docs

mikekgfb authored and malfet committed Jul 17, 2024 (1 parent: 5d83b0f)
Showing 1 changed file with 2 additions and 1 deletion: docs/quantization.md
@@ -1,9 +1,10 @@

# Quantization

<!--
[shell default]: HF_TOKEN="${SECRET_HF_TOKEN_PERIODIC}" huggingface-cli login

[shell default]: TORCHCHAT_ROOT=${PWD} ./scripts/install_et.sh
-->

## Introduction
Quantization reduces the precision of model parameters and computations from floating point to lower-bit integers, such as 8-bit integers. This minimizes memory requirements, accelerates inference, and decreases power consumption, making models more feasible to deploy on edge devices with limited computational resources. For high-performance devices such as GPUs, quantization reduces the required memory bandwidth and helps exploit the massive compute capabilities of today's server-based accelerators.
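To make the idea concrete, here is a minimal sketch of per-tensor symmetric 8-bit quantization in plain Python. It is illustrative only: the function names and the simple max-abs scaling rule are assumptions for this example, not torchchat's actual quantization schemes, which are selected via its quantization configuration.

```python
# Illustrative per-tensor symmetric int8 quantization (not torchchat's API).

def quantize_int8(values):
    """Map floats to int8 codes using a single symmetric scale."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0  # symmetric int8 range: [-127, 127]
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 codes and the scale."""
    return [x * scale for x in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# Each reconstructed value is within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, approx))
```

The int8 codes occupy a quarter of the memory of float32 values, at the cost of a bounded rounding error per element; real schemes refine this with per-channel or per-group scales.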
