
RAGAS evaluation result is empty; what is the correct format for the input txt? #284

Open
xiguahuoshan opened this issue Jan 15, 2025 · 2 comments
Labels
question, rageval

Comments

@xiguahuoshan

Issue Description

When running the RAGEval evaluation with a txt file I converted myself as input, the output is {}.

Tools Used

  • Native framework
  • OpenCompass backend
  • VLMEvalKit backend
  • [x] RAGEval backend
  • Perf (model inference stress-testing tool)
  • Arena mode

Code or Commands Executed

Please provide the main code or commands you executed. For example:

generate_testset_task_cfg = {
    "debug": "true",
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "RAGAS",
        "testset_generation": {
            "docs": ['./data/metrics.txt'],
            "test_size": 10,
            "output_file": "outputs/metrics.json",
            "knowledge_graph": "outputs/knowledge_graph.json",
            "generator_llm": {
                "model_name": "qwen2.5:14B-Instruct",
                "api_base": "http://localhost:11434/v1/",
                "api_key": "ollama",
            },
            "embeddings": {
                "model_name_or_path": "BAAI/bge-large-en-v1.5",
            },
            "language": "english"
        }
    },
}

%%time
import nltk
from evalscope.run import run_task

nltk.download('punkt_tab')
run_task(task_cfg=generate_testset_task_cfg)

Here metrics.txt was converted directly from the following sample taken from the examples:
[
    {
        "user_input": "第一届奥运会是什么时候举行的?",
        "retrieved_contexts": [
            "第一届现代奥运会于1896年4月6日到4月15日在希腊雅典举行。"
        ],
        "response": "第一届现代奥运会于1896年4月6日举行。",
        "reference": "第一届现代奥运会于1896年4月6日在希腊雅典开幕。"
    },
    {
        "user_input": "哪位运动员赢得了最多的奥运金牌?",
        "retrieved_contexts": [
            "迈克尔·菲尔普斯是历史上获得奥运金牌最多的运动员,他共赢得了23枚奥运金牌。"
        ],
        "response": "迈克尔·菲尔普斯赢得了最多的奥运金牌。",
        "reference": "迈克尔·菲尔普斯是获得奥运金牌最多的运动员,共赢得23枚金牌。"
    }
]

The conversion to txt was done with:

import json

# Read the JSON data from the source file
with open("./data/test_data.json", "r") as file:
    data = json.load(file)

# Write it out as a formatted .txt file using json.dump
with open("./data/test_data.txt", "w") as file:
    json.dump(data, file, indent=4)

Error Log

Please paste the full error log or console output. For example:

[nltk_data] Downloading package punkt_tab to /Users/wendy/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
2025-01-14 17:59:21,556 - evalscope - INFO - Args: Task config is provided with dictionary type.
2025-01-14 17:59:21,576 - evalscope - INFO - Dump task config to ./outputs/20250114_175921/configs/task_config_387b02.yaml
2025-01-14 17:59:21,588 - evalscope - INFO - {
    "model": null,
    "model_id": null,
    "model_args": {
        "revision": "master",
        "precision": "torch.float16",
        "device": "auto"
    },
    "template_type": null,
    "chat_template": null,
    "datasets": [],
    "dataset_args": {},
    "dataset_dir": "[/Users/wendy/.cache/modelscope/datasets](http://localhost:8889/Users/wendy/.cache/modelscope/datasets)",
    "dataset_hub": "modelscope",
    "generation_config": {
        "max_length": 2048,
        "max_new_tokens": 512,
        "do_sample": false,
        "top_k": 50,
        "top_p": 1.0,
        "temperature": 1.0
    },
    "eval_type": "checkpoint",
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "RAGAS",
        "testset_generation": {
            "docs": [
                "[./data/test_data.txt](http://localhost:8889/data/test_data.txt)"
            ],
            "test_size": 2,
            "output_file": "outputs[/testset.json](http://localhost:8889/testset.json)",
            "knowledge_graph": "outputs[/knowledge_graph.json](http://localhost:8889/knowledge_graph.json)",
            "generator_llm": {
                "model_name": "qwen2.5:14B-Instruct",
                "api_base": "http://localhost:11434/v1/",
                "api_key": "ollama"
            },
            "embeddings": {
                "model_name_or_path": "AI-ModelScope/bge-large-zh"
            },
            "language": "chinese"
        }
    },
    "stage": "all",
    "limit": null,
    "mem_cache": false,
    "use_cache": null,
    "work_dir": "[./outputs/20250114_175921](http://localhost:8889/outputs/20250114_175921)",
    "outputs": null,
    "debug": false,
    "dry_run": false,
    "seed": 42,
    "api_url": null,
    "api_key": "EMPTY"
}
2025-01-14 17:59:21,593 - evalscope - INFO - Check `ragas` Installed
2025-01-14 17:59:21,597 - unstructured - WARNING - libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
2025-01-14 17:59:21,640 - evalscope - INFO - Loading model AI-ModelScope/bge-large-zh from modelscope
Downloading Model to directory: /Users/wendy/.cache/modelscope/hub/AI-ModelScope/bge-large-zh
2025-01-14 17:59:22,623 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-01-14 17:59:23,084 - sentence_transformers.SentenceTransformer - INFO - Use pytorch device_name: mps
2025-01-14 17:59:23,086 - sentence_transformers.SentenceTransformer - INFO - Load pretrained SentenceTransformer: /Users/wendy/.cache/modelscope/hub/AI-ModelScope/bge-large-zh
2025-01-14 17:59:25,952 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,953 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/HeadlinesExtractor
2025-01-14 17:59:25,954 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,955 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/SummaryExtractor
2025-01-14 17:59:25,956 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,956 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/ThemesExtractor
2025-01-14 17:59:25,957 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,959 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/NERExtractor
2025-01-14 17:59:25,962 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,964 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/CustomNodeFilter
2025-01-14 17:59:25,965 - evalscope - INFO - Translate prompts finished
2025-01-14 17:59:25,966 - evalscope - INFO - Loading knowledge graph from outputs/knowledge_graph.json
Generating personas: 100%
 2/2 [00:10<00:00,  4.56s/it]
2025-01-14 17:59:36,457 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-14 17:59:36,866 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-14 17:59:36,894 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,897 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,899 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/SingleHopSpecificQuerySynthesizer
2025-01-14 17:59:36,909 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,914 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,918 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,919 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/MultiHopAbstractQuerySynthesizer
2025-01-14 17:59:36,927 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,931 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,934 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/MultiHopSpecificQuerySynthesizer
2025-01-14 17:59:36,936 - evalscope - INFO - Translate prompts finished
2025-01-14 17:59:36,939 - ragas.testset.synthesizers.multi_hop.abstract - INFO - found 0 clusters
2025-01-14 17:59:36,939 - ragas.testset.synthesizers.multi_hop.specific - INFO - found 0 clusters
Generating Scenarios: 100%
 1/1 [00:17<00:00, 17.33s/it]
2025-01-14 17:59:46,380 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-14 17:59:54,255 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Generating Samples: 
 0/0 [00:00<?, ?it/s]
Generating Answers: 0it [00:00, ?it/s]
CPU times: user 994 ms, sys: 1.17 s, total: 2.16 s
Wall time: 33.5 s

{}

Runtime Environment

  • Operating System:

    • Windows
    • [x] macOS
    • Ubuntu
  • Python Version:

    • 3.11
    • 3.10
    • 3.9
    • [x] 3.12

Additional Information

Could you provide an example txt file that can actually be run? Thank you.

If there is any other relevant information, please provide it here.

@Yunnglin
Collaborator

The input document needs to be longer; otherwise no question-answer pairs can be generated.
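
For testset generation, the entries in docs are used as source documents for building the knowledge graph, so they should be ordinary prose text rather than a JSON dump of question-answer pairs, and they need enough content for the extractors to find usable material. Below is a minimal sketch of preparing such a file; the path matches the config above, but the sample sentences are placeholders and a real corpus should be considerably longer (several paragraphs at minimum).

import os

# Illustrative only: write a plain-prose source document for testset generation.
# A real corpus should contain far more content than these few sentences.
corpus = (
    "The modern Olympic Games are the leading international multi-sport event, "
    "held every four years.\n"
    "The first modern Olympics took place in Athens, Greece, from 6 to 15 April 1896.\n"
    "Michael Phelps is the most decorated Olympian in history, with 23 gold medals.\n"
)

os.makedirs("./data", exist_ok=True)
with open("./data/metrics.txt", "w", encoding="utf-8") as f:
    f.write(corpus)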

@Yunnglin
Collaborator

Looking at the code you provided, are you trying to generate question-answer pairs, or only run a RAG evaluation?
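
If the goal is evaluation only, the JSON shown above is already in the sample format (user_input / retrieved_contexts / response / reference) that the RAGAS evaluation step consumes, so it can be passed to the evaluation part of the config directly instead of being converted to txt for testset generation. The sketch below is only an assumption based on the EvalScope RAGEval documentation; the exact keys (eval, testset_file, critic_llm, metrics) and metric names may differ between versions, so please check the docs for the installed release.

from evalscope.run import run_task

# Hedged sketch: key and metric names are assumptions and may vary across versions.
eval_task_cfg = {
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "RAGAS",
        "eval": {
            # Use the original JSON samples directly; no txt conversion is needed.
            "testset_file": "./data/test_data.json",
            "critic_llm": {
                "model_name": "qwen2.5:14B-Instruct",
                "api_base": "http://localhost:11434/v1/",
                "api_key": "ollama",
            },
            "embeddings": {
                "model_name_or_path": "AI-ModelScope/bge-large-zh",
            },
            "metrics": [
                "Faithfulness",
                "AnswerRelevancy",
                "ContextPrecision",
                "AnswerCorrectness",
            ],
        },
    },
}

run_task(task_cfg=eval_task_cfg)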

@Yunnglin added the question and rageval labels on Jan 20, 2025