-
Notifications
You must be signed in to change notification settings - Fork 13
Open
Labels
documentationImprovements or additions to documentationImprovements or additions to documentationgood first issueGood for newcomersGood for newcomers
Description
脚本如下,使用longvideotool/LongVT-RL预训练模型,评测videomme,发现几乎没有工具调用的过程
#!/bin/bash
# Environment variables
export OPENAI_API_BASE="http://localhost:8000/v1"
export OPENAI_MODEL_NAME="judge"
export OPENAI_BASE_URL="http://your-judge-server-ip:8000/v1"
export OPENAI_API_KEY="EMPTY"
export USE_LLM_JUDGE=False
export DECORD_EOF_RETRY_MAX=409600
TASK_NAME=videomme_reward_tool # Evaluation task name
IS_QWEN3_VL=True # Whether using Qwen3-VL model (True/False)
MAX_FRAME_NUM=${4:-768} # Number of frames (Default:768)
# Path to MCP server for tool calling
MCP_PATH="./examples/video_tools/mcp_server.py"
# Activate conda environment
source /opt/conda/etc/profile.d/conda.sh
conda activate eval
# Start vLLM server
# Qwen3 VL does not need additional chat template
if [ "$IS_QWEN3_VL" == "False" ]; then
vllm serve $CKPT_PATH \
--chat-template ./examples/eval/tool_call_qwen2_5_vl.jinja \
--tool-call-parser hermes \
--enable-auto-tool-choice \
--data-parallel-size 1 \
--gpu-memory-utilization 0.8 \
--trust-remote-code &
else
vllm serve $CKPT_PATH \
--tool-call-parser hermes \
--enable-auto-tool-choice \
--data-parallel-size 1 \
--gpu-memory-utilization 0.8 \
--trust-remote-code &
fi
sleep 240
# Run evaluation
accelerate launch --num_processes=8 --main_process_port 12345 -m lmms_eval \
--model async_openai \
--model_args model_version=$CKPT_PATH,mcp_server_path=$MCP_PATH,fps=1,max_frames=$MAX_FRAME_NUM,max_pixels=50176,base_url=$OPENAI_API_BASE,api_key=$OPENAI_API_KEY,num_cpus=1,timeout=12000,is_qwen3_vl=$IS_QWEN3_VL \
--tasks $TASK_NAME \
--batch_size 1 \
--output_path ./eval_logs \
--log_samples \
--include_path ./lmms_eval_tasks \
--limit 100
详细信息如下
20260204_234358_results.json
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
documentationImprovements or additions to documentationImprovements or additions to documentationgood first issueGood for newcomersGood for newcomers