pytorch · mikekgfb · May 10, 2024 · May 10, 2024
diff --git a/docs/quantization.md b/docs/quantization.md
@@ -1,9 +1,10 @@
 
 # Quantization
 
+<!--
 [shell default]: HF_TOKEN="${SECRET_HF_TOKEN_PERIODIC}" huggingface-cli login
-
 [shell default]: TORCHCHAT_ROOT=${PWD} ./scripts/install_et.sh
+-->
 
 ## Introduction
 Quantization reduces the precision of model parameters and computations from floating point to lower-bit integers, such as 8-bit integers. This lowers memory requirements, speeds up inference, and decreases power consumption, making models more feasible to deploy on edge devices with limited computational resources. On high-performance devices such as GPUs, quantization reduces the required memory bandwidth and makes better use of the massive compute capabilities of today's server-based accelerators.
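
To make the idea in the introduction concrete, here is a minimal sketch of symmetric per-tensor 8-bit quantization in plain Python. It is not taken from this diff or from any library: the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and a real quantizer would typically add grouping, zero points, and packed storage. Each float is mapped to an integer code in [-127, 127] plus one shared scale, so a value that would occupy 4 bytes as float32 can be stored in 1 byte.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (illustrative helper, hypothetical name).

    Maps each float to an integer code in [-127, 127] using a single scale.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes and the shared scale."""
    return [c * scale for c in codes]


# Made-up sample weights, for illustration only.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The round trip is lossy: each restored value is within about scale / 2
# of the original weight.
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
```

The reconstruction error is bounded by about half the scale, which is the precision traded away for the smaller representation.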