Commit f387fbc

update vlm docs (#435)

1 parent e219a71 commit f387fbc

5 files changed: +206 -0 lines changed
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
```yaml
base:
    seed: &seed 42
model:
    type: Llava OneVision
    path: model path
    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]
    type: vqa
    name: [mme]
    download: False
    path: MME dataset path
    bs: 1
    inference_per_block: False
sparse:
    method: TokenReduction
    special:
        method: HoliTom
        RETAIN_RATIO: 0.20
        T: 0.65
        HOLITOM_k: 18
        HOLITOM_r: 0.5
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
```
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
# Token Reduction

LightCompress currently supports token reduction for mainstream multimodal large language models. Configuration is simple and plug-and-play.

Here is an example configuration:

```yaml
base:
    seed: &seed 42
model:
    type: Llava
    path: model path
    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]
    type: vqa
    name: [gqa, mmbench_en_dev, mme]
    bs: 1
    inference_per_block: False
sparse:
    method: TokenReduction
    special:
        method: FastV
        pruning_loc: 3
        rate: 0.778
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
```
The configuration file contains three core sections:

1. **`model`**

   For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, LLaVA OneVision, and more; these models cover both image and video tasks. For the detailed list of supported models, see [the model registry file](https://github.com/ModelTC/LightCompress/blob/main/llmc/models/__init__.py). LightCompress will support more models in the future.

2. **`eval`**

   For the `eval_pos` parameter:
   - `pretrain` denotes the original model that keeps all visual tokens.
   - `transformed` denotes the model with token reduction applied.

   LightCompress integrates [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) to evaluate various downstream datasets. Set `type` to `vqa`, and specify the datasets in `name` following the naming conventions in the [lmms-eval documentation](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md); for benchmarks stored locally, see the sketch after this list.

3. **`sparse`**

   Set `method` to `TokenReduction` first, and then specify the concrete algorithm and its hyperparameters under `special`. Since each algorithm has different hyperparameters, refer to the [configuration files](https://github.com/ModelTC/LightCompress/tree/main/configs/sparsification/methods) for details; a HoliTom example follows this list.
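For a benchmark that should be read from disk rather than fetched automatically, the `eval` block can also carry `download` and `path` entries. The sketch below is taken from the HoliTom configuration added in this commit; `MME dataset path` is a placeholder for a local copy of the benchmark:

```yaml
eval:
    eval_pos: [pretrain, transformed]
    type: vqa
    name: [mme]
    download: False          # do not fetch the dataset automatically
    path: MME dataset path   # placeholder for a local dataset directory
    bs: 1
    inference_per_block: False
```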
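Swapping algorithms only changes the `special` block. For instance, the HoliTom configuration added in this commit replaces FastV's two hyperparameters with HoliTom's own:

```yaml
sparse:
    method: TokenReduction
    special:
        method: HoliTom        # algorithm name
        RETAIN_RATIO: 0.20     # method-specific hyperparameters
        T: 0.65
        HOLITOM_k: 18
        HOLITOM_r: 0.5
```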
## Combining Quantization
LightCompress also supports an extreme compression scheme that combines token reduction with quantization. First, choose a quantization algorithm and save a `fake_quant` model (see the quantization section of the docs). Then load this model and add the `token_reduction` field under `quant`.

```yaml
quant:
    method: RTN
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 128
    special:
        actorder: True
        static_groups: True
        percdamp: 0.01
        blocksize: 128
        true_sequential: True
        quant_out: True
    token_reduction:
        method: FastV
        special:
            pruning_loc: 3
            rate: 0.778
```
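One practical detail: the fake_quant model produced in the first step has to be the one that gets loaded. A minimal sketch of the accompanying `model` block, assuming `model.path` is how the saved checkpoint is picked up (the path is a hypothetical placeholder; check the quantization docs for the exact loading procedure):

```yaml
model:
    type: Llava
    path: /path/to/saved/fake_quant_model   # hypothetical placeholder path
    torch_dtype: auto
```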

docs/en/source/configs.md

Lines changed: 20 additions & 0 deletions
@@ -360,6 +360,26 @@ quant:
    static: True
```

## sparse

<font color=792ee5> sparse.method </font>

The name of the sparsification algorithm to use. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.

It's worth noting that for model sparsification you need to specify the exact algorithm name, whereas for token reduction you only need to set it to `TokenReduction` first, and then specify the exact algorithm under `special`.

```yaml
sparse:
    method: Wanda
```

```yaml
sparse:
    method: TokenReduction
    special:
        method: FastV
```

## save

<font color=792ee5> save.save_vllm</font>
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Token Reduction

LightCompress currently supports token reduction for mainstream multimodal large language models. Configuration is simple and plug-and-play.

Here is an example configuration:

```yaml
base:
    seed: &seed 42
model:
    type: Llava
    path: model path
    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]
    type: vqa
    name: [gqa, mmbench_en_dev, mme]
    bs: 1
    inference_per_block: False
sparse:
    method: TokenReduction
    special:
        method: FastV
        pruning_loc: 3
        rate: 0.778
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
```
The configuration file contains three core sections:

1. `model`

   For model selection, you can choose LLaVA, LLaVA-NeXT, Qwen2.5VL, LLaVA OneVision, and more; these models cover both image and video tasks. The detailed list of supported models is in [this file](https://github.com/ModelTC/LightCompress/blob/main/llmc/models/__init__.py), and LightCompress will support more models in the future.

2. `eval`

   For the `eval_pos` parameter, `pretrain` denotes the original model that keeps all visual tokens, while `transformed` denotes the model after the token-reduction algorithm is applied. LightCompress integrates [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) to evaluate various downstream datasets: set `type` to `vqa`, and name the datasets in `name` following the naming conventions in the lmms-eval [documentation](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md).

3. `sparse`

   `method` must first be set to `TokenReduction`; the concrete algorithm and its hyperparameters are then specified under `special`. Since each algorithm has different hyperparameters, see the [configuration files](https://github.com/ModelTC/LightCompress/tree/main/configs/sparsification/methods) for details; a HoliTom example is sketched after this list.
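As an example of such method-specific hyperparameters, the HoliTom configuration added in this commit uses:

```yaml
sparse:
    method: TokenReduction
    special:
        method: HoliTom
        RETAIN_RATIO: 0.20
        T: 0.65
        HOLITOM_k: 18
        HOLITOM_r: 0.5
```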
## Combining Quantization

LightCompress also supports an extreme compression scheme that applies token reduction and quantization together. First, choose a quantization algorithm and save a `fake_quant` model (see the quantization section of the docs). Then load this model and simply add a `token_reduction` field under `quant`.

```yaml
quant:
    method: RTN
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 128
    special:
        actorder: True
        static_groups: True
        percdamp: 0.01
        blocksize: 128
        true_sequential: True
        quant_out: True
    token_reduction:
        method: FastV
        special:
            pruning_loc: 3
            rate: 0.778
```

docs/zh_cn/source/configs.md

Lines changed: 20 additions & 0 deletions
@@ -401,6 +401,26 @@ quant:
    granularity: per_token
```

## sparse

<font color=792ee5> sparse.method </font>

The name of the sparsification algorithm to use. This includes both [model sparsification](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/sparsification/__init__.py) and [reduction](https://github.com/ModelTC/LightCompress/blob/main/llmc/compression/token_reduction/__init__.py) of visual tokens. All supported algorithms can be found in the corresponding files.

It's worth noting that model sparsification requires the exact algorithm name, whereas token reduction only needs `method` set to `TokenReduction` first, with the concrete algorithm then specified under `special`.

```yaml
sparse:
    method: Wanda
```

```yaml
sparse:
    method: TokenReduction
    special:
        method: FastV
```

## save

<font color=792ee5> save.save_vllm </font>
