update vlm docs #435
Conversation
Summary of Changes
Hello @chengtao-lv, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on enhancing the documentation for Visual Language Models (VLM) within the LightCompress project. It introduces new documentation for "Token Reduction" in both English and Chinese, provides a new configuration example for the HoliTom method, and updates the general configuration documentation to include details on sparse settings.
Highlights
- New Token Reduction Documentation: Comprehensive guides for configuring and using token reduction techniques for VLMs have been added in both English and Chinese.
- Updated Configuration Guide: The main configuration documentation now includes a dedicated section explaining the sparse parameter, detailing how to specify sparsification methods, including TokenReduction.
- New HoliTom Configuration Example: A new YAML configuration file for the HoliTom token reduction method has been added, providing a concrete example for users.
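The HoliTom YAML file itself is not quoted in this thread, so below is a minimal sketch of the shape such a config takes, assembled from the parameters described in the review excerpts further down (`eval_pos`, `type: vqa`, `name`, and a `sparse` block with `method: TokenReduction` plus a `special` section). Every concrete key and value here is an illustrative assumption, not the contents of the PR's actual file.

```yaml
# Hypothetical token-reduction config sketch -- NOT the actual file from this PR.
# Field names follow the parameters discussed in the review excerpts below;
# model paths and dataset names are placeholders.
model:
    type: Llava                # assumed model identifier
    path: /path/to/llava-hf    # placeholder checkpoint path
    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]  # original model vs. token-reduced model
    type: vqa                          # evaluate via the integrated lmms-eval
    name: [mme]                        # dataset names per lmms-eval conventions
sparse:
    method: TokenReduction             # top-level switch for token reduction
    special:
        method: HoliTom                # concrete algorithm; its hyperparameters
                                       # would also go under `special`
```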
Code Review
This pull request updates the documentation for Vision Language Models (VLMs), adding new pages for token reduction and updating existing configuration documentation in both English and Chinese. The changes are good, but I've found a few typos and missing hyperlinks in the documentation that should be addressed to improve clarity and user experience. I've provided suggestions for these minor fixes.
> 1. **`model`**
>    For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see the file. LightCompress will support more models in the future.
> 2. **`eval`**
>    For the `eval_pos` parameter:
>    - `pretrain` denotes the original model that keeps all visual tokens.
>    - `transformed` denotes the model with token reduction applied.
>
>    LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the lmms-eval documentation.
> 3. **`sparse`**
>    Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the configuration files for details.
This section is missing a few helpful hyperlinks that are present in the Chinese version of the documentation. Adding them would improve the user experience by making it easier to navigate to related resources.
Suggested change:

```diff
 1. **`model`**
-   For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see the file. LightCompress will support more models in the future.
+   For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see [the file](https://github.com/ModelTC/LightCompress/blob/main/llmc/models/__init__.py). LightCompress will support more models in the future.
 2. **`eval`**
    For the `eval_pos` parameter:
    - `pretrain` denotes the original model that keeps all visual tokens.
    - `transformed` denotes the model with token reduction applied.
-   LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the lmms-eval documentation.
+   LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the [lmms-eval documentation](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md).
 3. **`sparse`**
-   Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the configuration files for details.
+   Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the [configuration files](https://github.com/ModelTC/LightCompress/tree/main/configs/sparsification/methods) for details.
```
> ## Combining Quantization
>
> LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
There's a typo in `fake_qunat`. It should be `fake_quant`.
Suggested change:

```diff
-LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
+LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_quant` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
```
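As a rough illustration of that two-step recipe, here is a hedged sketch. Only the `quant` section and its `token_reduction` field come from the docs excerpt above; the algorithm names, bit width, and paths are assumptions.

```yaml
# Hypothetical sketch of token reduction combined with quantization.
# Step 1 (not shown): run a quantization config and save a fake_quant model.
# Step 2: load that model and add `token_reduction` under `quant`.
model:
    type: Llava
    path: /path/to/saved_fake_quant_model   # assumed output of the quantization run
quant:
    method: RTN          # assumed quantization algorithm
    weight:
        bit: 4           # assumed weight bit width
    token_reduction:     # field named in the docs excerpt above
        method: HoliTom  # assumed token-reduction algorithm
```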
> <font color=792ee5> sparse.method </font>
>
> The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
There's a typo in the file extension in the link to `sparsification/__init__.py`. It should be `.py`, not `.pyn`.
Suggested change:

```diff
-The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
+The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
```
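To make the two uses of `sparse.method` concrete, a hedged sketch follows. `TokenReduction` comes from the excerpt above, while `Wanda` and `FastV` are assumed algorithm names that would need to be checked against the two `__init__.py` files linked there.

```yaml
# Two hypothetical uses of sparse.method (multi-document YAML, separated by ---).
# (a) model sparsification
sparse:
    method: Wanda            # assumed weight-sparsification algorithm name
---
# (b) visual-token reduction
sparse:
    method: TokenReduction
    special:
        method: FastV        # assumed token-reduction algorithm name
```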
*(From the Chinese version of the docs, translated:)*

> ## Combining Quantization
>
> LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm to save a `fake_qunat` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`.
The same typo appears in the Chinese version: `fake_qunat` should be `fake_quant`.
Suggested change (translated from the Chinese file):

```diff
-LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm to save a `fake_qunat` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`.
+LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm to save a `fake_quant` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`.
```
*(From the Chinese version of the docs, translated:)*

> <font color=792ee5> sparse.method </font>
>
> The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in those files.
The same `.pyn` typo appears in the Chinese version's link to `sparsification/__init__.py`; the extension should be `.py`.
Suggested change (translated from the Chinese file):

```diff
-The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in those files.
+The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in those files.
```