Commit 0a9d72b

update docs
1 parent: b1a667d

6 files changed: 12 additions, 6 deletions


README.md

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 ## :fire: Latest News
 
+- **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting more than **20 algorithms** in total, covering both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. Please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).
+
 - **May 12, 2025:** 🔥 We now fully support quantization for the **`Wan2.1`** series of video generation models and provide export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
 
 - **Feb 07, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of truly quantized **INT4/INT8** weights.
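The Feb 07 entry above notes that AWQ and RTN quantization of the 671B DeepSeek models can run on a single 80GB GPU with direct FP8 loading. As a rough illustration of what such a run is configured like, here is a minimal llmc-style YAML sketch; the section layout (`model`/`calib`/`quant`/`save`) mirrors the `calib:`/`eval:` fragments visible in the doc diffs below, but the concrete key names and values are assumptions rather than the verified schema, so consult the example configs shipped in the repository before relying on them.

```yaml
# Illustrative sketch only: key names are assumed, not copied from the repository.
model:
    type: DeepseekV3              # assumed model-type identifier
    path: /path/to/DeepSeek-R1    # FP8 checkpoint, loaded without extra conversion
    torch_dtype: auto
calib:
    name: pileval                 # assumed calibration dataset
    n_samples: 128
    seq_len: 512
quant:
    method: Awq                   # RTN would be the data-free alternative
    weight:
        bit: 4                    # weight-only INT4
        symmetric: False
        granularity: per_group
        group_size: 128
save:
    save_path: ./deepseek_awq_int4  # export location for the truly quantized weights
```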

README_zh.md

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 ## :fire: 最新动态
 
+- **2025年8月13日:** 🚀 我们已开源针对 **视觉语言模型(VLMs)** 的压缩方案,支持共计超过 **20 种算法**,涵盖 **token reduction** 与 **quantization**。此次发布为多模态任务提供了灵活、即插即用的压缩策略。具体请参阅[文档](https://llmc-zhcn.readthedocs.io/en/latest/advanced/token_reduction.html)。
+
 - **2025年5月12日:** 🔥 我们现已全面支持 **`Wan2.1`** 系列视频生成模型的量化,并支持导出真实量化的 **INT8/FP8** 权重,兼容 [lightx2v](https://github.com/ModelTC/lightx2v) 推理框架。详情请参考 [lightx2v 使用文档](https://llmc-zhcn.readthedocs.io/en/latest/backend/lightx2v.html)。
 
 - **2025年2月7日:** 🔥 我们现已全面支持 **`DeepSeekv3`**、**`DeepSeek-R1`**、**`DeepSeek-R1-zero`** 等 671B 大规模 **`MOE`** 模型的量化。您可以直接加载 `FP8` 权重,无需额外转换,使用单张 80G 显存的 GPU 即可运行 `AWQ` 和 `RTN` 量化,同时还支持导出真实量化的 **INT4/INT8** 权重。

docs/en/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ arxiv: https://arxiv.org/abs/2405.06001
 advanced/VLM_quant&img-txt_dataset.md
 advanced/mix_bits.md
 advanced/sparsification.md
+advanced/token_reduction.md
 
 .. toctree::
    :maxdepth: 2
docs/zh_cn/source/advanced/VLM_quant&img-txt_dataset.md

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
-# VLM quant and custom_mm datatsets
+# VLM 量化和 custom_mm 数据集
 
 llmc目前支持对VLM模型使用图像-文本数据集进行校准并量化
 
-## VLM quant
+## VLM 量化
 当前支持的模型如下:
 1. llava
 
@@ -34,7 +34,7 @@ calib:
     padding: True
 ```
 
-## custom_mm datatsets
+## custom_mm 数据集
 custom_mm 数据集格式如下:
 ```
 custom_mm-datasets/

docs/zh_cn/source/advanced/Vit_quant&img_dataset.md

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
-# Vit quant and img datatsets
+# Vit 量化和 img 数据集
 
 llmc目前支持对Vit模型使用图像数据集进行校准并量化
 
-## Vit quant
+## Vit 量化
 
 下面是一个配置的例子
 
@@ -33,7 +33,7 @@ eval:
     eval_token_consist: False
 ```
 
-## img datatsets
+## img 数据集
 img数据集格式要求:img数据集目录下存在图像
 
 img数据集格式示例:

docs/zh_cn/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ arxiv链接: https://arxiv.org/abs/2405.06001
 advanced/VLM_quant&img-txt_dataset.md
 advanced/mix_bits.md
 advanced/sparsification.md
+advanced/token_reduction.md
 
 .. toctree::
    :maxdepth: 2
