Conversation

@chengtao-lv
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @chengtao-lv, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces and documents the new token reduction feature for Vision-Language Models (VLMs) within the LightCompress framework. It expands the compression capabilities to include multimodal tasks, offering flexible and plug-and-play strategies for reducing visual tokens and combining with quantization. The changes include new documentation, configuration examples, and updates to existing configuration guides to support these new VLM compression methods.

Highlights

  • VLM Compression Solution: A new compression solution for Vision-Language Models (VLMs) has been open-sourced, supporting over 20 algorithms that cover both token reduction and quantization. This enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks.
  • Token Reduction Documentation: Comprehensive documentation has been added in both English and Chinese, explaining how to configure and utilize token reduction for VLMs. This includes example configurations and details on model and evaluation parameters.
  • HoliTom Configuration: A new YAML configuration file has been introduced for the HoliTom token reduction method, providing a specific example for VLM compression setup.
  • Updated Configuration Guide: The general configuration documentation has been enhanced in both English and Chinese with a new sparse section, detailing how to specify sparsification and token reduction methods within the configuration files.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a compression solution for Vision Language Models (VLMs), adding the HoliTom token reduction method and corresponding documentation. The changes are extensive and well-structured. My review focuses on improving the documentation for clarity and correctness by fixing typos and broken links. In the code, I've pointed out areas for improvement regarding hyperparameter management, code style, and removal of debugging artifacts to enhance maintainability and reproducibility.


<font color=792ee5> sparse.method </font>

The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.


high

The link to the __init__ file has a typo in the extension. It should be .py, not .pyn.

Suggested change
The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
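As a concrete illustration of the `sparse.method` field described above, a minimal config fragment might look like the following (a sketch: the `TokenReduction` value comes from this PR's docs, the surrounding layout is an assumption):

```yaml
sparse:
    # Either a model-sparsification algorithm name or, for VLMs,
    # TokenReduction to prune visual tokens.
    method: TokenReduction
```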


<font color=792ee5> sparse.method </font>

The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be viewed in those files.


high

The link to the __init__ file has a typo in the extension. It should be .py, not .pyn.

Suggested change
The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be viewed in those files.
The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be viewed in those files.


## :fire: Latest News

- **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over a total of **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).


medium

The phrasing "supporting over a total of" is a bit redundant. For better readability and grammar, I suggest simplifying it. Also, "please" at the beginning of a sentence should be capitalized.

Suggested change
- **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over a total of **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).
- **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. Please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).

@@ -1,8 +1,8 @@
# VLM quant and custom_mm datatsets


medium

There's a typo in the original heading: datatsets should be datasets.

padding: True
```
## custom_mm datatsets


medium

There's a typo in the original heading: datatsets should be datasets.

@@ -1,8 +1,8 @@
# Vit quant and img datatsets


medium

There's a typo in the original heading: datatsets should be datasets.

eval_token_consist: False
```
## img datatsets


medium

There's a typo in the original heading: datatsets should be datasets.

Comment on lines +19 to +22
RETAIN_RATIO: 0.20
T: 0.65
HOLITOM_k: 18
HOLITOM_r: 0.5


medium

The hyperparameter keys use a mix of uppercase and lowercase letters. For consistency with other configurations in the project (e.g., FastV uses lowercase keys like rate), it's recommended to use lowercase snake_case for these keys. This improves readability and maintainability.

        retain_ratio: 0.20
        t: 0.65
        holitom_k: 18
        holitom_r: 0.5
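For context, here is a sketch of how the suggested lowercase keys could sit in a full config (structure inferred from the docs' description of setting `method` to `TokenReduction` and the concrete algorithm under `special`; the exact surrounding key names are assumptions, not the PR's file verbatim):

```yaml
sparse:
    method: TokenReduction
    special:
        method: HoliTom      # hypothetical key naming the concrete algorithm
        retain_ratio: 0.20   # values copied from the PR's HoliTom YAML
        t: 0.65
        holitom_k: 18
        holitom_r: 0.5
```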

Comment on lines +37 to +50
For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see the file. LightCompress will support more models in the future.

2. **`eval`**
For the `eval_pos` parameter:
- `pretrain` denotes the original model that keeps all visual tokens.
- `transformed` denotes the model with token reduction applied.
LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the lmms-eval documentation.

3. **`sparse`**
Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the configuration files for details.

## Combining Quantization

LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.


medium

This documentation file has a few areas for improvement:

  1. On line 37, "see the file" is vague. Please provide a direct markdown link to the file listing supported models.
  2. On line 46, "refer to the configuration files" should also be a link to the relevant directory for easier navigation.
  3. On line 50, there's a typo: fake_qunat should be fake_quant.
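Tying the excerpt above together, an end-to-end token reduction config could look roughly like this (a sketch under assumptions: the `model`/`eval`/`sparse` key names follow the prose in the excerpt, while the model type, path, and dataset name are illustrative placeholders):

```yaml
model:
    type: Llava              # any supported VLM, per the docs
    path: /path/to/model     # placeholder path
eval:
    eval_pos: [pretrain, transformed]  # original vs. token-reduced model
    type: vqa
    name: [mme]              # any dataset name recognized by lmms-eval
sparse:
    method: TokenReduction
    special:
        method: HoliTom      # hypothetical; one concrete algorithm
        retain_ratio: 0.20
```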


## Combining Quantization

LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm and save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.


medium

There is a typo in fake_qunat. It should be fake_quant.

Suggested change
LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm and save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm and save a `fake_quant` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
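As a sketch of the combined scheme described above (assumed key layout: only the `token_reduction` field name under `quant` comes from the docs; the algorithm names and nesting are illustrative):

```yaml
quant:
    method: Awq              # whichever algorithm produced the fake_quant model
    token_reduction:
        method: HoliTom      # hypothetical concrete token reduction algorithm
```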

@helloyongyang helloyongyang merged commit 2064b1f into main Aug 13, 2025
2 checks passed
@helloyongyang helloyongyang deleted the vlm branch August 13, 2025 11:43