
RAGAS evaluation result is empty; what is the correct format for the input txt? #284

Open
xiguahuoshan opened this issue Jan 15, 2025 · 2 comments
Labels
question, rageval

Comments

@xiguahuoshan

Issue Description

When running the RAGEval evaluation with a txt file I converted myself as input, the output is {}.

Tools Used

  • Native framework
  • OpenCompass backend
  • VLMEvalKit backend
  • [x] RAGEval backend
  • Perf (model inference stress-testing tool)
  • Arena mode

Code or Commands Executed

Please provide the main code or commands you executed. For example:

generate_testset_task_cfg = {
    "debug": "true",
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "RAGAS",
        "testset_generation": {
            "docs": ['./data/metrics.txt'],
            "test_size": 10,
            "output_file": "outputs/metrics.json",
            "knowledge_graph": "outputs/knowledge_graph.json",
            "generator_llm": {
                "model_name": "qwen2.5:14B-Instruct",
                "api_base": "http://localhost:11434/v1/",
                "api_key": "ollama",
            },
            "embeddings": {
                "model_name_or_path": "BAAI/bge-large-en-v1.5",
            },
            "language": "english"
        }
    },
}

%%time
import nltk
from evalscope.run import run_task

nltk.download('punkt_tab')
run_task(task_cfg=generate_testset_task_cfg)

Here metrics.txt was converted directly from the following sample taken from the examples:
[
    {
        "user_input": "第一届奥运会是什么时候举行的?",
        "retrieved_contexts": [
            "第一届现代奥运会于1896年4月6日到4月15日在希腊雅典举行。"
        ],
        "response": "第一届现代奥运会于1896年4月6日举行。",
        "reference": "第一届现代奥运会于1896年4月6日在希腊雅典开幕。"
    },
    {
        "user_input": "哪位运动员赢得了最多的奥运金牌?",
        "retrieved_contexts": [
            "迈克尔·菲尔普斯是历史上获得奥运金牌最多的运动员,他共赢得了23枚奥运金牌。"
        ],
        "response": "迈克尔·菲尔普斯赢得了最多的奥运金牌。",
        "reference": "迈克尔·菲尔普斯是获得奥运金牌最多的运动员,共赢得23枚金牌。"
    }
]

The conversion to txt was done with:

import json

# Read the JSON data from the source file
with open("./data/test_data.json", "r") as file:
    data = json.load(file)

# Write it out as a formatted .txt file using json.dump
with open("./data/test_data.txt", "w") as file:
    json.dump(data, file, indent=4)

Error Log

Please paste the full error log or console output. For example:

[nltk_data] Downloading package punkt_tab to /Users/wendy/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
2025-01-14 17:59:21,556 - evalscope - INFO - Args: Task config is provided with dictionary type.
2025-01-14 17:59:21,576 - evalscope - INFO - Dump task config to ./outputs/20250114_175921/configs/task_config_387b02.yaml
2025-01-14 17:59:21,588 - evalscope - INFO - {
    "model": null,
    "model_id": null,
    "model_args": {
        "revision": "master",
        "precision": "torch.float16",
        "device": "auto"
    },
    "template_type": null,
    "chat_template": null,
    "datasets": [],
    "dataset_args": {},
    "dataset_dir": "[/Users/wendy/.cache/modelscope/datasets](http://localhost:8889/Users/wendy/.cache/modelscope/datasets)",
    "dataset_hub": "modelscope",
    "generation_config": {
        "max_length": 2048,
        "max_new_tokens": 512,
        "do_sample": false,
        "top_k": 50,
        "top_p": 1.0,
        "temperature": 1.0
    },
    "eval_type": "checkpoint",
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "RAGAS",
        "testset_generation": {
            "docs": [
                "[./data/test_data.txt](http://localhost:8889/data/test_data.txt)"
            ],
            "test_size": 2,
            "output_file": "outputs[/testset.json](http://localhost:8889/testset.json)",
            "knowledge_graph": "outputs[/knowledge_graph.json](http://localhost:8889/knowledge_graph.json)",
            "generator_llm": {
                "model_name": "qwen2.5:14B-Instruct",
                "api_base": "http://localhost:11434/v1/",
                "api_key": "ollama"
            },
            "embeddings": {
                "model_name_or_path": "AI-ModelScope/bge-large-zh"
            },
            "language": "chinese"
        }
    },
    "stage": "all",
    "limit": null,
    "mem_cache": false,
    "use_cache": null,
    "work_dir": "[./outputs/20250114_175921](http://localhost:8889/outputs/20250114_175921)",
    "outputs": null,
    "debug": false,
    "dry_run": false,
    "seed": 42,
    "api_url": null,
    "api_key": "EMPTY"
}
2025-01-14 17:59:21,593 - evalscope - INFO - Check `ragas` Installed
2025-01-14 17:59:21,597 - unstructured - WARNING - libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.
2025-01-14 17:59:21,640 - evalscope - INFO - Loading model AI-ModelScope/bge-large-zh from modelscope
Downloading Model to directory: /Users/wendy/.cache/modelscope/hub/AI-ModelScope/bge-large-zh
2025-01-14 17:59:22,623 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-01-14 17:59:23,084 - sentence_transformers.SentenceTransformer - INFO - Use pytorch device_name: mps
2025-01-14 17:59:23,086 - sentence_transformers.SentenceTransformer - INFO - Load pretrained SentenceTransformer: /Users/wendy/.cache/modelscope/hub/AI-ModelScope/bge-large-zh
2025-01-14 17:59:25,952 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,953 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/HeadlinesExtractor
2025-01-14 17:59:25,954 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,955 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/SummaryExtractor
2025-01-14 17:59:25,956 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,956 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/ThemesExtractor
2025-01-14 17:59:25,957 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,959 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/NERExtractor
2025-01-14 17:59:25,962 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:25,964 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/CustomNodeFilter
2025-01-14 17:59:25,965 - evalscope - INFO - Translate prompts finished
2025-01-14 17:59:25,966 - evalscope - INFO - Loading knowledge graph from outputs/knowledge_graph.json
Generating personas: 100%
 2/2 [00:10<00:00,  4.56s/it]
2025-01-14 17:59:36,457 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-14 17:59:36,866 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-14 17:59:36,894 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,897 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,899 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/SingleHopSpecificQuerySynthesizer
2025-01-14 17:59:36,909 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,914 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,918 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,919 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/MultiHopAbstractQuerySynthesizer
2025-01-14 17:59:36,927 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,931 - ragas.prompt.pydantic_prompt - WARNING - Loaded prompt hash does not match the saved hash.
2025-01-14 17:59:36,934 - evalscope - INFO - Load existing prompts from /opt/anaconda3/envs/llm_eval/lib/python3.12/site-packages/evalscope/backend/rag_eval/ragas/prompts/chinese/MultiHopSpecificQuerySynthesizer
2025-01-14 17:59:36,936 - evalscope - INFO - Translate prompts finished
2025-01-14 17:59:36,939 - ragas.testset.synthesizers.multi_hop.abstract - INFO - found 0 clusters
2025-01-14 17:59:36,939 - ragas.testset.synthesizers.multi_hop.specific - INFO - found 0 clusters
Generating Scenarios: 100%
 1/1 [00:17<00:00, 17.33s/it]
2025-01-14 17:59:46,380 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
2025-01-14 17:59:54,255 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Generating Samples: 
 0/0 [00:00<?, ?it/s]
Generating Answers: 0it [00:00, ?it/s]
CPU times: user 994 ms, sys: 1.17 s, total: 2.16 s
Wall time: 33.5 s

{}

Runtime Environment

  • Operating System:

    • Windows
    • [x] macOS
    • Ubuntu
  • Python Version:

    • 3.11
    • 3.10
    • 3.9
    • [x] 3.12

Additional Information

Could you provide an example txt file that can actually be run? Thank you.

If there is any other relevant information, please provide it here.

@Yunnglin
Collaborator

The input document needs to be longer; otherwise no question-answer pairs can be generated.
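
For testset generation, the entries in docs are used as source documents for building the knowledge graph, so they should be ordinary prose text rather than a JSON dump of question-answer pairs, and they need enough content for the extractors to find usable material. Below is a minimal sketch of preparing such a file; the path matches the config above, but the sample sentences are placeholders and a real corpus should be considerably longer (several paragraphs at minimum).

import os

# Illustrative only: write a plain-prose source document for testset generation.
# A real corpus should contain far more content than these few sentences.
corpus = (
    "The modern Olympic Games are the leading international multi-sport event, "
    "held every four years.\n"
    "The first modern Olympics took place in Athens, Greece, from 6 to 15 April 1896.\n"
    "Michael Phelps is the most decorated Olympian in history, with 23 gold medals.\n"
)

os.makedirs("./data", exist_ok=True)
with open("./data/metrics.txt", "w", encoding="utf-8") as f:
    f.write(corpus)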

@Yunnglin
Collaborator

Looking at the code you provided, are you trying to generate question-answer pairs, or only run a RAG evaluation?
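
If the goal is evaluation only, the JSON shown above is already in the sample format (user_input / retrieved_contexts / response / reference) that the RAGAS evaluation step consumes, so it can be passed to the evaluation part of the config directly instead of being converted to txt for testset generation. The sketch below is only an assumption based on the EvalScope RAGEval documentation; the exact keys (eval, testset_file, critic_llm, metrics) and metric names may differ between versions, so please check the docs for the installed release.

from evalscope.run import run_task

# Hedged sketch: key and metric names are assumptions and may vary across versions.
eval_task_cfg = {
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "RAGAS",
        "eval": {
            # Use the original JSON samples directly; no txt conversion is needed.
            "testset_file": "./data/test_data.json",
            "critic_llm": {
                "model_name": "qwen2.5:14B-Instruct",
                "api_base": "http://localhost:11434/v1/",
                "api_key": "ollama",
            },
            "embeddings": {
                "model_name_or_path": "AI-ModelScope/bge-large-zh",
            },
            "metrics": [
                "Faithfulness",
                "AnswerRelevancy",
                "ContextPrecision",
                "AnswerCorrectness",
            ],
        },
    },
}

run_task(task_cfg=eval_task_cfg)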

@Yunnglin added the question and rageval labels on Jan 20, 2025