Conversation

@chengtao-lv
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @chengtao-lv, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces and documents the new token reduction feature for Vision-Language Models (VLMs) within the LightCompress framework. It expands the compression capabilities to include multimodal tasks, offering flexible and plug-and-play strategies for reducing visual tokens and combining with quantization. The changes include new documentation, configuration examples, and updates to existing configuration guides to support these new VLM compression methods.

Highlights

  • VLM Compression Solution: A new compression solution for Vision-Language Models (VLMs) has been open-sourced, supporting over 20 algorithms that cover both token reduction and quantization. This enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks.
  • Token Reduction Documentation: Comprehensive documentation has been added in both English and Chinese, explaining how to configure and utilize token reduction for VLMs. This includes example configurations and details on model and evaluation parameters.
  • HoliTom Configuration: A new YAML configuration file has been introduced for the HoliTom token reduction method, providing a specific example for VLM compression setup.
  • Updated Configuration Guide: The general configuration documentation has been enhanced in both English and Chinese with a new sparse section, detailing how to specify sparsification and token reduction methods within the configuration files.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a compression solution for Vision Language Models (VLMs), adding the HoliTom token reduction method and corresponding documentation. The changes are extensive and well-structured. My review focuses on improving the documentation for clarity and correctness by fixing typos and broken links. In the code, I've pointed out areas for improvement regarding hyperparameter management, code style, and removal of debugging artifacts to enhance maintainability and reproducibility.


<font color=792ee5> sparse.method </font>

The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.


high

The link to the __init__ file has a typo in the extension. It should be .py, not .pyn.

Suggested change
The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
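As a concrete illustration of the `sparse.method` field described above, a minimal config fragment might look like the following (a sketch: the `TokenReduction` value comes from this PR's docs, the surrounding layout is an assumption):

```yaml
sparse:
    # Either a model-sparsification algorithm name or, for VLMs,
    # TokenReduction to prune visual tokens.
    method: TokenReduction
```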


<font color=792ee5> sparse.method </font>

The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be viewed in those files.


high

The link to the __init__ file has a typo in the extension. It should be .py, not .pyn.

Suggested change
The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be viewed in those files.
The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be viewed in those files.


## :fire: Latest News

- **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over a total of **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).


medium

The phrasing "supporting over a total of" is a bit redundant. For better readability and grammar, I suggest simplifying it. Also, "please" at the beginning of a sentence should be capitalized.

Suggested change
- **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over a total of **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).
- **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. Please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).

@@ -1,8 +1,8 @@
# VLM quant and custom_mm datatsets


medium

There's a typo in the original heading: datatsets should be datasets.

padding: True
```
## custom_mm datatsets


medium

There's a typo in the original heading: datatsets should be datasets.

@@ -1,8 +1,8 @@
# Vit quant and img datatsets


medium

There's a typo in the original heading: datatsets should be datasets.

eval_token_consist: False
```
## img datatsets


medium

There's a typo in the original heading: datatsets should be datasets.

Comment on lines +19 to +22
RETAIN_RATIO: 0.20
T: 0.65
HOLITOM_k: 18
HOLITOM_r: 0.5


medium

The hyperparameter keys use a mix of uppercase and lowercase letters. For consistency with other configurations in the project (e.g., FastV uses lowercase keys like rate), it's recommended to use lowercase snake_case for these keys. This improves readability and maintainability.

        retain_ratio: 0.20
        t: 0.65
        holitom_k: 18
        holitom_r: 0.5
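For context, here is a sketch of how the suggested lowercase keys could sit in a full config (structure inferred from the docs' description of setting `method` to `TokenReduction` and the concrete algorithm under `special`; the exact surrounding key names are assumptions, not the PR's file verbatim):

```yaml
sparse:
    method: TokenReduction
    special:
        method: HoliTom      # hypothetical key naming the concrete algorithm
        retain_ratio: 0.20   # values copied from the PR's HoliTom YAML
        t: 0.65
        holitom_k: 18
        holitom_r: 0.5
```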

Comment on lines +37 to +50
For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see the file. LightCompress will support more models in the future.

2. **`eval`**
For the `eval_pos` parameter:
- `pretrain` denotes the original model that keeps all visual tokens.
- `transformed` denotes the model with token reduction applied.
LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the lmms-eval documentation.

3. **`sparse`**
Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the configuration files for details.

## Combining Quantization

LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.


medium

This documentation file has a few areas for improvement:

  1. On line 37, "see the file" is vague. Please provide a direct markdown link to the file listing supported models.
  2. On line 46, "refer to the configuration files" should also be a link to the relevant directory for easier navigation.
  3. On line 50, there's a typo: fake_qunat should be fake_quant.
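Tying the excerpt above together, an end-to-end token reduction config could look roughly like this (a sketch under assumptions: the `model`/`eval`/`sparse` key names follow the prose in the excerpt, while the model type, path, and dataset name are illustrative placeholders):

```yaml
model:
    type: Llava              # any supported VLM, per the docs
    path: /path/to/model     # placeholder path
eval:
    eval_pos: [pretrain, transformed]  # original vs. token-reduced model
    type: vqa
    name: [mme]              # any dataset name recognized by lmms-eval
sparse:
    method: TokenReduction
    special:
        method: HoliTom      # hypothetical; one concrete algorithm
        retain_ratio: 0.20
```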


## Combining Quantization

LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm and save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.


medium

There is a typo in fake_qunat. It should be fake_quant.

Suggested change
LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm and save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm and save a `fake_quant` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
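As a sketch of the combined scheme described above (assumed key layout: only the `token_reduction` field name under `quant` comes from the docs; the algorithm names and nesting are illustrative):

```yaml
quant:
    method: Awq              # whichever algorithm produced the fake_quant model
    token_reduction:
        method: HoliTom      # hypothetical concrete token reduction algorithm
```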

@helloyongyang helloyongyang merged commit 2064b1f into main Aug 13, 2025
2 checks passed
@helloyongyang helloyongyang deleted the vlm branch August 13, 2025 11:43