update vlm docs #435
Conversation
Summary of Changes
Hello @chengtao-lv, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request primarily focuses on enhancing the documentation for Visual Language Models (VLM) within the LightCompress project. It introduces new documentation for "Token Reduction" in both English and Chinese, provides a new configuration example for the HoliTom method, and updates the general configuration documentation to include details on sparse settings.
Highlights
- New Token Reduction Documentation: Comprehensive guides for configuring and using token reduction techniques for VLMs have been added in both English and Chinese.
- Updated Configuration Guide: The main configuration documentation now includes a dedicated section explaining the sparse parameter, detailing how to specify sparsification methods, including TokenReduction.
- New HoliTom Configuration Example: A new YAML configuration file for the HoliTom token reduction method has been added, providing a concrete example for users.
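The HoliTom YAML file itself is not quoted in this thread, so below is a minimal sketch of the shape such a config takes, assembled from the parameters described in the review excerpts further down (`eval_pos`, `type: vqa`, `name`, and a `sparse` block with `method: TokenReduction` plus a `special` section). Every concrete key and value here is an illustrative assumption, not the contents of the PR's actual file.

```yaml
# Hypothetical token-reduction config sketch -- NOT the actual file from this PR.
# Field names follow the parameters discussed in the review excerpts below;
# model paths and dataset names are placeholders.
model:
    type: Llava                # assumed model identifier
    path: /path/to/llava-hf    # placeholder checkpoint path
    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]  # original model vs. token-reduced model
    type: vqa                          # evaluate via the integrated lmms-eval
    name: [mme]                        # dataset names per lmms-eval conventions
sparse:
    method: TokenReduction             # top-level switch for token reduction
    special:
        method: HoliTom                # concrete algorithm; its hyperparameters
                                       # would also go under `special`
```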
Code Review
This pull request updates the documentation for Vision Language Models (VLMs), adding new pages for token reduction and updating existing configuration documentation in both English and Chinese. The changes are good, but I've found a few typos and missing hyperlinks in the documentation that should be addressed to improve clarity and user experience. I've provided suggestions for these minor fixes.
> 1. **`model`**
>    For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see the file. LightCompress will support more models in the future.
> 2. **`eval`**
>    For the `eval_pos` parameter:
>    - `pretrain` denotes the original model that keeps all visual tokens.
>    - `transformed` denotes the model with token reduction applied.
>
>    LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the lmms-eval documentation.
> 3. **`sparse`**
>    Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the configuration files for details.
This section is missing a few helpful hyperlinks that are present in the Chinese version of the documentation. Adding them would improve the user experience by making it easier to navigate to related resources.
Suggested change:

```diff
 1. **`model`**
-   For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see the file. LightCompress will support more models in the future.
+   For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, and LLaVA OneVision, etc. These models cover both image and video tasks. For the detailed list of supported models, see [the file](https://github.com/ModelTC/LightCompress/blob/main/llmc/models/__init__.py). LightCompress will support more models in the future.
 2. **`eval`**
    For the `eval_pos` parameter:
    - `pretrain` denotes the original model that keeps all visual tokens.
    - `transformed` denotes the model with token reduction applied.
-   LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the lmms-eval documentation.
+   LightCompress integrates lmms-eval to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the [lmms-eval documentation](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md).
 3. **`sparse`**
-   Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the configuration files for details.
+   Set `method` to `TokenReduction` first, and then specify the concrete algorithm and related hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the [configuration files](https://github.com/ModelTC/LightCompress/tree/main/configs/sparsification/methods) for details.
```
> ## Combining Quantization
>
> LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
There's a typo in `fake_qunat`. It should be `fake_quant`.
Suggested change:

```diff
-LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_qunat` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
+LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm to save a `fake_quant` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.
```
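As a rough illustration of that two-step recipe, here is a hedged sketch. Only the `quant` section and its `token_reduction` field come from the docs excerpt above; the algorithm names, bit width, and paths are assumptions.

```yaml
# Hypothetical sketch of token reduction combined with quantization.
# Step 1 (not shown): run a quantization config and save a fake_quant model.
# Step 2: load that model and add `token_reduction` under `quant`.
model:
    type: Llava
    path: /path/to/saved_fake_quant_model   # assumed output of the quantization run
quant:
    method: RTN          # assumed quantization algorithm
    weight:
        bit: 4           # assumed weight bit width
    token_reduction:     # field named in the docs excerpt above
        method: HoliTom  # assumed token-reduction algorithm
```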
> <font color=792ee5> sparse.method </font>
>
> The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
There's a typo in the file extension in the link to `sparsification/__init__.py`. It should be `.py`, not `.pyn`.
Suggested change:

```diff
-The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
+The name of the sparsification algorithm used. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.
```
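To make the two uses of `sparse.method` concrete, a hedged sketch follows. `TokenReduction` comes from the excerpt above, while `Wanda` and `FastV` are assumed algorithm names that would need to be checked against the two `__init__.py` files linked there.

```yaml
# Two hypothetical uses of sparse.method (multi-document YAML, separated by ---).
# (a) model sparsification
sparse:
    method: Wanda            # assumed weight-sparsification algorithm name
---
# (b) visual-token reduction
sparse:
    method: TokenReduction
    special:
        method: FastV        # assumed token-reduction algorithm name
```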
*(From the Chinese version of the docs, translated:)*

> ## Combining Quantization
>
> LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm to save a `fake_qunat` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`.
The same typo appears in the Chinese version: `fake_qunat` should be `fake_quant`.
Suggested change (translated from the Chinese file):

```diff
-LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm to save a `fake_qunat` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`.
+LightCompress also supports an extreme compression scheme that uses token reduction and quantization together. First, choose a quantization algorithm to save a `fake_quant` model; see the quantization section of the docs. Then load this model and add the `token_reduction` field under `quant`.
```
*(From the Chinese version of the docs, translated:)*

> <font color=792ee5> sparse.method </font>
>
> The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in those files.
The same `.pyn` typo appears in the Chinese version's link to `sparsification/__init__.py`; the extension should be `.py`.
Suggested change (translated from the Chinese file):

```diff
-The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.pyn) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in those files.
+The name of the sparsification algorithm used. This covers both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in those files.
```