Vlm #436
Conversation
Summary of Changes
Hello @chengtao-lv, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces and documents the new token reduction feature for Vision-Language Models (VLMs) within the LightCompress framework. It expands the compression capabilities to include multimodal tasks, offering flexible and plug-and-play strategies for reducing visual tokens and combining with quantization. The changes include new documentation, configuration examples, and updates to existing configuration guides to support these new VLM compression methods.
Highlights
- VLM Compression Solution: A new compression solution for Vision-Language Models (VLMs) has been open-sourced, supporting over 20 algorithms that cover both token reduction and quantization. This enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks.
- Token Reduction Documentation: Comprehensive documentation has been added in both English and Chinese, explaining how to configure and utilize token reduction for VLMs. This includes example configurations and details on model and evaluation parameters.
- HoliTom Configuration: A new YAML configuration file has been introduced for the HoliTom token reduction method, providing a specific example for VLM compression setup.
- Updated Configuration Guide: The general configuration documentation has been enhanced in both English and Chinese with a new `sparse` section, detailing how to specify sparsification and token reduction methods within the configuration files.
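As a rough illustration of the new section, a minimal `sparse` entry might look like the sketch below. Only `sparse.method` is documented in this PR; the `special` block and the FastV `rate` key are assumptions based on the review discussion elsewhere in this thread.

```yaml
# Illustrative sketch only: `sparse.method` is the documented key;
# the `special` block is an assumed placeholder for algorithm hyperparameters.
sparse:
  method: TokenReduction   # or a model-sparsification algorithm name
  special:
    method: FastV          # assumed example token-reduction algorithm
    rate: 0.5              # assumed algorithm-specific hyperparameter
```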
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
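For example, a repository-level configuration might look roughly like the following sketch. The key names here are assumptions; the authoritative schema is in the Gemini Code Assist documentation linked above.

```yaml
# .gemini/config.yaml — illustrative only; check the official
# Gemini Code Assist docs for the real schema and supported keys.
have_fun: false                         # assumed option
code_review:
  comment_severity_threshold: MEDIUM    # assumed: filter low-severity comments
```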
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces a compression solution for Vision Language Models (VLMs), adding the HoliTom token reduction method and corresponding documentation. The changes are extensive and well-structured. My review focuses on improving the documentation for clarity and correctness by fixing typos and broken links. In the code, I've pointed out areas for improvement regarding hyperparameter management, code style, and removal of debugging artifacts to enhance maintainability and reproducibility.
> | <font color=792ee5> sparse.method </font> |
> | The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files. |
The link to the `__init__` file has a typo in the extension. It should be `.py`, not `.pyn`.
- | The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files. |
+ | The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files. |
> | <font color=792ee5> sparse.method </font> |
> | The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be found in those files. |
The link to the `__init__` file has a typo in the extension. It should be `.py`, not `.pyn`.
- | The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be found in those files. |
+ | The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens; all supported algorithms can be found in those files. |
> ## :fire: Latest News
> - **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over a total of **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).
The phrasing "supporting over a total of" is a bit redundant. For better readability and grammar, I suggest simplifying it. Also, "please" at the beginning of a sentence should be capitalized.
Before:
> - **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over a total of **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).

After:
> - **August 13, 2025:** 🚀 We have open-sourced our compression solution for **vision-language models (VLMs)**, supporting over **20 algorithms** that cover both **token reduction** and **quantization**. This release enables flexible, plug-and-play compression strategies for a wide range of multimodal tasks. Please refer to the [documentation](https://llmc-en.readthedocs.io/en/latest/advanced/token_reduction.html).
> @@ -1,8 +1,8 @@
> # VLM quant and custom_mm datatsets
> padding: True
> ## custom_mm datatsets
> @@ -1,8 +1,8 @@
> # Vit quant and img datatsets
> eval_token_consist: False
> ## img datatsets
> RETAIN_RATIO: 0.20
> T: 0.65
> HOLITOM_k: 18
> HOLITOM_r: 0.5
The hyperparameter keys use a mix of uppercase and lowercase letters. For consistency with other configurations in the project (e.g., FastV uses lowercase keys like `rate`), it is recommended to use lowercase snake_case for these keys. This improves readability and maintainability.
> retain_ratio: 0.20
> t: 0.65
> holitom_k: 18
> holitom_r: 0.5
> For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see the file. LightCompress will support more models in the future.
>
> 2. **`eval`**
>    For the `eval_pos` parameter:
>    - `pretrain` denotes the original model that keeps all visual tokens.
>    - `transformed` denotes the model with token reduction applied.
>    LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the lmms-eval documentation.
>
> 3. **`sparse`**
>    Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the configuration files for details.
>
> ## Combining Quantization
>
> LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
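The combination described in that last step could look roughly like the following sketch. Only the placement of a `token_reduction` field under `quant` is stated in the quoted docs; the quantization method name and the nested keys are illustrative assumptions.

```yaml
# Hypothetical sketch: loading a fake-quantized model and enabling
# token reduction on top of it. Apart from `quant.token_reduction`,
# key names are assumptions to verify against the LightCompress configs.
quant:
  method: RTN              # assumed: whichever quantization algorithm produced the fake_quant model
  token_reduction:         # the new field introduced in this PR
    method: HoliTom        # assumed concrete token-reduction algorithm
```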
This documentation file has a few areas for improvement:
- On line 37, "see the file" is vague. Please provide a direct markdown link to the file listing supported models.
- On line 46, "refer to the configuration files" should also be a link to the relevant directory for easier navigation.
- On line 50, there's a typo: `fake_qunat` should be `fake_quant`.
> ## Combining Quantization
>
> LightCompress also supports an extreme compression scheme that combines token reduction and quantization. First, choose a quantization algorithm to save a `fake_qunat` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`.
There is a typo in `fake_qunat`. It should be `fake_quant`.
- | LightCompress also supports an extreme compression scheme that combines token reduction and quantization. First, choose a quantization algorithm to save a `fake_qunat` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`. |
+ | LightCompress also supports an extreme compression scheme that combines token reduction and quantization. First, choose a quantization algorithm to save a `fake_quant` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`. |