Skip to content

Interns1支持Function Calling吗? #25

@Peilin-FF

Description

@Peilin-FF

感谢Interns1的杰出工作,我希望测试Interns1在MCP协议下调用工具作为Agent完成科学问题,从模型能力上讲,Interns1是对于科学问题理解能力最先进的模型。
不过在评测过程中我发现Interns1感知到工具的存在对工具进行调用,直接给出了答案。
一个评估的例子为:
"conversations": [
{
"role": "user",
"content": "\nYou are a geoscientist, and you need to use tools to answer multiple-choice questions about Earth observation data analysis. Question: The Chengdu Plain Agricultural Zone in Sichuan Province is a key rice-producing region in southwestern China. On July 12,, 2022, MODIS-derived Land Surface Temperature (LST) and Enhanced Vegetation Index (EVI) data were used to assess drought conditions across the area. First, list the input TIFF files, then calculate the Temperature-Vegetation Dryness Index (TVDI), and finally determine the percentage of the agricultural area where TVDI values exceeded the threshold of 0.75, indicating moderate water stress for rice crops.benchmark/data/question2\nA.12.87%\nB.22.40%\nC.28.07%\nD.36.56%"
},
{
"role": "assistant",
"content": [
{
"type": "text",
"content": "\n\nB"
}
]
}
],
"final_answer": "\n\nB"
而别的模型比如GPT4o,Qwen3都可以完成工具的调用解决问题,如下:
"conversations": [
{
"role": "user",
"content": "\nYou are a geoscientist, and you need to use tools to answer multiple-choice questions about Earth observation data analysis. Note that if a tool returns an error, you can only try again once. Ultimately, you only need to explicitly tell me the correct choice.\nATTENTION:\n1. When a tool returns "Result saved at /path/to/file", you must use the full returned path "/path/to/file" in all subsequent tool calls.\n2. For each question, you must provide the choice you think is most appropriate.Don't gibe me another format. Your final answer format must be:\nYour choice\n\n\nQuestion: The Chengdu Plain Agricultural Zone in Sichuan Province is a key rice-producing region in southwestern China. On July 12,, 2022, MODIS-derived Land Surface Temperature (LST) and Enhanced Vegetation Index (EVI) data were used to assess drought conditions across the area. First, list the input TIFF files, then calculate the Temperature-Vegetation Dryness Index (TVDI), and finally determine the percentage of the agricultural area where TVDI values exceeded the threshold of 0.75, indicating moderate water stress for rice crops.benchmark/data/question2\nA.12.87%\nB.22.40%\nC.28.07%\nD.36.56%"
},
{
"role": "assistant",
"content": [
{
"name": "get_filelist",
"input": {
"dir_path": "benchmark/data/question2"
}
}
]
},
{
"role": "tool",
"name": "get_filelist",
"content": [
{
"output": [
{
"text": "["Sichuan_2021-07-12_LST.tif","Sichuan_2021-07-12_NDVI.tif"]"
}
]
}
]
},
{
"role": "assistant",
"content": [
{
"name": "compute_tvdi",
"input": {
"ndvi_path": "benchmark/data/question2/Sichuan_2021-07-12_NDVI.tif",
"lst_path": "benchmark/data/question2/Sichuan_2021-07-12_LST.tif",
"output_path": "benchmark/data/question2/tvdi_result.tif"
}
}
]
},
最终给出了正确的答案:
"final_answer": "C.28.07%"

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions