LongVT-RL has a very low tool_call frequency on videomme #15

The script is below. I evaluated the pretrained longvideotool/LongVT-RL model on videomme and found that it makes almost no tool calls during evaluation.

```
#!/bin/bash

# Environment variables
export OPENAI_API_BASE="http://localhost:8000/v1"
export OPENAI_MODEL_NAME="judge"
export OPENAI_BASE_URL="http://your-judge-server-ip:8000/v1"
export OPENAI_API_KEY="EMPTY"
export USE_LLM_JUDGE=False
export DECORD_EOF_RETRY_MAX=409600

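# Checkpoint under evaluation (assumed from the description above; override via the environment)
CKPT_PATH=${CKPT_PATH:-longvideotool/LongVT-RL}
TASK_NAME=videomme_reward_tool   # Evaluation task name
IS_QWEN3_VL=True                 # Whether using Qwen3-VL model (True/False)
MAX_FRAME_NUM=${4:-768}          # Number of frames (Default: 768)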

# Path to MCP server for tool calling
MCP_PATH="./examples/video_tools/mcp_server.py"

# Activate conda environment
source /opt/conda/etc/profile.d/conda.sh
conda activate eval

# Start vLLM server
# Qwen3 VL does not need additional chat template
if [ "$IS_QWEN3_VL" == "False" ]; then
    vllm serve $CKPT_PATH \
        --chat-template ./examples/eval/tool_call_qwen2_5_vl.jinja \
        --tool-call-parser hermes \
        --enable-auto-tool-choice \
        --data-parallel-size 1 \
        --gpu-memory-utilization 0.8 \
        --trust-remote-code &
else
    vllm serve $CKPT_PATH \
        --tool-call-parser hermes \
        --enable-auto-tool-choice \
        --data-parallel-size 1 \
        --gpu-memory-utilization 0.8 \
        --trust-remote-code &
fi
sleep 240

# Run evaluation
accelerate launch --num_processes=8 --main_process_port 12345 -m lmms_eval \
    --model async_openai \
    --model_args model_version=$CKPT_PATH,mcp_server_path=$MCP_PATH,fps=1,max_frames=$MAX_FRAME_NUM,max_pixels=50176,base_url=$OPENAI_API_BASE,api_key=$OPENAI_API_KEY,num_cpus=1,timeout=12000,is_qwen3_vl=$IS_QWEN3_VL \
    --tasks $TASK_NAME \
    --batch_size 1 \
    --output_path ./eval_logs \
    --log_samples \
    --include_path ./lmms_eval_tasks \
    --limit 100

```
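
A side note for anyone reproducing this: the fixed `sleep 240` could be swapped for a readiness poll, so the evaluation does not start before the server is up. The following is only a minimal sketch. `wait_for_server` is a hypothetical helper, not part of the repo, and it assumes the vLLM OpenAI-compatible server answers `GET /v1/models` once the model is loaded and that `curl` is installed.

```bash
#!/bin/bash
# Hypothetical helper (not part of the LongVT repo): poll an OpenAI-compatible
# endpoint until it responds, instead of sleeping for a fixed time.
# Assumption: the server exposes GET <base_url>/models when it is ready.
wait_for_server() {
    local base_url="$1"        # e.g. http://localhost:8000/v1
    local timeout="${2:-600}"  # give up after this many seconds of waiting
    local interval="${3:-5}"   # seconds between probes
    local waited=0

    # curl -f turns HTTP errors into a non-zero exit code; -s silences progress output
    until curl -sf --max-time 5 -o /dev/null "${base_url}/models"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Server at ${base_url} not ready after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Usage in place of `sleep 240`:
#   wait_for_server "$OPENAI_API_BASE" 600 5 || exit 1
```

This would also rule out a slow server start as a factor, although it does not by itself explain the low tool-call rate.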
Details are in the attached files:
[20260204_234358_results.json](https://github.com/user-attachments/files/25085425/20260204_234358_results.json)

[20260204_234358_samples_videomme_reward_tool.json](https://github.com/user-attachments/files/25085370/20260204_234358_samples_videomme_reward_tool.json)
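
To quantify "almost no tool calls", one rough option is to count tool-call markers in the samples log. This is a sketch under assumptions: `count_tool_calls` is a hypothetical helper, and it assumes the log stores raw model output with Hermes-style `<tool_call>` tags (matching `--tool-call-parser hermes`). If the log stores parsed calls in another form, pass a different marker as the second argument.

```bash
#!/bin/bash
# Hypothetical helper: count how many times a marker string appears in a file.
# Assumption: tool calls appear in the samples log as literal <tool_call> tags.
count_tool_calls() {
    local file="$1"
    local marker="${2:-<tool_call>}"
    # -o prints each match on its own line, -F treats the marker as a fixed string;
    # tr strips the padding some wc implementations add around the number.
    grep -oF -- "$marker" "$file" | wc -l | tr -d '[:space:]'
}

# Usage:
#   count_tool_calls 20260204_234358_samples_videomme_reward_tool.json
```

With `--limit 100`, comparing this count against the 100 evaluated samples gives a rough calls-per-sample figure.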
