This repository contains the official implementation of our EMNLP 2025 Main paper:
Evaluating LLM-Generated Diagrams via Graphs
Chumeng Liang, Jiaxuan You
University of Illinois Urbana-Champaign
Diagrams play a central role in research papers for conveying ideas, yet they are often complex and labor-intensive to create. We propose DiagramEval, a novel evaluation metric designed to assess demonstration diagrams generated by LLMs. DiagramEval conceptualizes diagrams as graphs, treating text elements as nodes and their connections as directed edges, and evaluates diagram quality using two new groups of metrics: node alignment and path alignment.
This repository provides:
- Diagram Data Collection: Run one command to collect diagrams and paper context for a list of paper titles.
- Diagram Generation (Demo): A demo LLM workflow that generates diagrams from paper context.
- DiagramEval Metrics: Our proposed metrics for evaluating generated diagrams against a reference, which can be either paper context (.txt) or a diagram image (.png).
 
conda create -n diagram python=3.10
conda activate diagram
pip install -r requirements.txt
Create configs/key.yaml with your API keys:
openai_api_key: "your-openai-key"
google_api_key: "your-google-key"  # For Gemini image generation
claude_api_key: "your-claude-key"
gemini_api_key: "your-gemini-key"
nvidia_api_key: "your-nvidia-key"
Paper list format:
[
  {
    "title": "Taming Transformers for High-Resolution Image Synthesis"
  },
  {
    "title": "Scalable Diffusion Models with Transformers"
  },
  ...
]
Run Our Data Collection Pipeline:
python script/prepare_inputs.py \
  --paper-list [PATH_TO_PAPER_LIST] \
  --crawl-output [OUTPUT_DIR]/crawled \
  --extract-output [OUTPUT_DIR]/extracted \
  --config-path configs/llm_config.yaml
Basic generation:
python script/generate_diagram.py \
  [PATH_TO_PAPER_CONTEXT] \
  --output-dir [OUTPUT_DIR]/generated
With planning (focuses on methodology):
python script/generate_diagram.py \
  [PATH_TO_PAPER_CONTEXT] \
  --output-dir [OUTPUT_DIR]/generated \
  --use-planner
Evaluate against reference:
# The reference file can be paper context (.txt) or a diagram image (.png).
python script/evaluate_diagram.py \
  [PATH_TO_GENERATED_DIAGRAM_FILE] \
  [PATH_TO_REFERENCE_FILE] \
  --output-dir [OUTPUT_DIR]/generated
Output:
- [OUTPUT_DIR]/generated/[ARXIV_ID]_metrics.json: Evaluation metrics
- [OUTPUT_DIR]/generated/[ARXIV_ID]_metrics.md: Human-readable report
Output file structure:
[OUTPUT_DIR]
├── crawled/
│   └── 2212.09748/
│       ├── 2212.09748.pdf: PDF file from arXiv
│       └── 2212.09748.tar.gz: LaTeX source package from arXiv
├── extracted/
│   └── 2212.09748/
│       ├── 2212.09748_text.txt: reference paper context
│       └── 2212.09748_diagrams/
│           └── figure_overview.png: reference paper diagram
└── generated/
    ├── 2212.09748_generated_metrics.json: metric report in .json
    ├── 2212.09748_generated_metrics.md: metric report in .md
    └── 2212.09748_generated.png: generated diagram
Please check 'example/' for an example input and output file structure.
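To evaluate every generated diagram in this layout, the files can be paired by arXiv ID. The following is a minimal sketch, not part of the repository: `build_eval_commands` is a hypothetical helper that assumes the file naming shown in the tree above (`[ARXIV_ID]_generated.png` and `[ARXIV_ID]_text.txt`) and only builds the command lines for script/evaluate_diagram.py without running them.

```python
from pathlib import Path


def build_eval_commands(output_dir):
    """Pair each generated diagram with its reference text and build a CLI call.

    Hypothetical helper: assumes the [OUTPUT_DIR] layout shown above.
    """
    output_dir = Path(output_dir)
    suffix = "_generated.png"
    commands = []
    for generated in sorted((output_dir / "generated").glob(f"*{suffix}")):
        arxiv_id = generated.name[: -len(suffix)]
        reference = output_dir / "extracted" / arxiv_id / f"{arxiv_id}_text.txt"
        if not reference.exists():
            continue  # skip diagrams whose reference context is missing
        commands.append([
            "python", "script/evaluate_diagram.py",
            str(generated), str(reference),
            "--output-dir", str(output_dir / "generated"),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to run the evaluations one by one.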
All LLM/MLLM configurations are managed in configs/llm_config.yaml:
- text_graph_extraction: Extract graph structure from text
- image_graph_extraction: Extract graph structure from diagram images
- node_alignment: Align nodes between generated and reference graphs
- diagram_selection: Select overview diagrams from papers
- layout_planner: Plan diagram layout and components
- diagram_generator: Generate diagrams via Google Gemini API
Modify model names, temperatures, and other parameters as needed.
DiagramEval conceptualizes diagrams as directed graphs:
- Nodes: Text elements or components in the diagram
- Edges: Directional connections (arrows) between components
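As an illustration of this view, a diagram can be held in a small directed-graph container. This is a sketch only: the `DiagramGraph` class and the node labels are hypothetical and are not the data structures used in this repository.

```python
from dataclasses import dataclass, field


@dataclass
class DiagramGraph:
    """Hypothetical container: node labels plus directed edges between them."""
    nodes: set[str] = field(default_factory=set)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_edge(self, source: str, target: str) -> None:
        # An arrow from `source` to `target`; both endpoints become nodes.
        self.nodes.update((source, target))
        self.edges.add((source, target))

    def successors(self, node: str) -> set[str]:
        # Nodes directly pointed to by an arrow leaving `node`.
        return {dst for src, dst in self.edges if src == node}


# A three-arrow pipeline diagram with made-up labels.
graph = DiagramGraph()
graph.add_edge("Input Image", "Encoder")
graph.add_edge("Encoder", "Latent Code")
graph.add_edge("Latent Code", "Decoder")
```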
 
Node alignment measures how well the components in the generated and reference diagrams correspond:
- Precision: Proportion of generated nodes that match reference nodes
- Recall: Proportion of reference nodes covered by generated nodes
- F1 Score: Harmonic mean of precision and recall
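The three scores above can be sketched as follows once a node matching is available. This is an illustrative sketch, not the repository's implementation: `node_alignment_scores` is a hypothetical function, and the `matches` dictionary (generated label to reference label) is assumed to come from a separate alignment step such as the `node_alignment` LLM configuration.

```python
def node_alignment_scores(generated_nodes, reference_nodes, matches):
    """Precision, recall, and F1 from a generated-to-reference node matching.

    `matches` maps a generated node label to the reference label it was
    aligned with (assumed to be computed elsewhere).
    """
    generated = set(generated_nodes)
    reference = set(reference_nodes)
    # Keep only matches whose endpoints exist in both graphs.
    valid = {g: r for g, r in matches.items() if g in generated and r in reference}

    matched_generated = set(valid)           # generated nodes with a counterpart
    covered_reference = set(valid.values())  # reference nodes that were covered

    precision = len(matched_generated) / len(generated) if generated else 0.0
    recall = len(covered_reference) / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels: 3 of 4 generated nodes match, covering 3 of 5 reference nodes.
generated = ["Encoder", "Decoder", "Loss", "Optimizer"]
reference = ["Image Encoder", "Image Decoder", "Reconstruction Loss",
             "Codebook", "Discriminator"]
matches = {
    "Encoder": "Image Encoder",
    "Decoder": "Image Decoder",
    "Loss": "Reconstruction Loss",
}
precision, recall, f1 = node_alignment_scores(generated, reference, matches)
```

In this example precision is 3/4 and recall is 3/5, so the F1 score is 2/3.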
 
Path alignment evaluates the structural relationships between components:
- Precision: Proportion of generated paths that match reference paths
- Recall: Proportion of reference paths covered by generated paths
- F1 Score: Harmonic mean of precision and recall
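One way to make these path scores concrete is sketched below. It rests on an assumption that this README does not confirm: a "path" is taken to be an ordered node pair (u, v) where v is reachable from u along directed edges, and a generated path matches when both endpoints map to reference nodes that are connected in the same direction. The functions `reachable_pairs` and `path_alignment_scores` are hypothetical, and `matches` is assumed to come from a prior node alignment step.

```python
from collections import deque


def reachable_pairs(edges):
    """All ordered pairs (u, v) such that v is reachable from u via directed edges."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    pairs = set()
    for start in adjacency:
        seen = set()
        queue = deque(adjacency[start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(adjacency.get(node, ()))
        pairs.update((start, node) for node in seen)
    return pairs


def path_alignment_scores(generated_edges, reference_edges, matches):
    """Precision, recall, and F1 over reachable pairs (assumed path definition)."""
    generated_paths = reachable_pairs(generated_edges)
    reference_paths = reachable_pairs(reference_edges)
    # Generated paths whose endpoints both map onto a connected reference pair.
    hits = [(u, v) for u, v in generated_paths
            if u in matches and v in matches
            and (matches[u], matches[v]) in reference_paths]
    covered = {(matches[u], matches[v]) for u, v in hits}

    precision = len(hits) / len(generated_paths) if generated_paths else 0.0
    recall = len(covered) / len(reference_paths) if reference_paths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: "Loss" has no counterpart in the reference diagram.
generated_edges = [("Encoder", "Quantizer"), ("Quantizer", "Decoder"), ("Decoder", "Loss")]
reference_edges = [("Image Encoder", "Codebook"), ("Codebook", "Image Decoder"),
                   ("Image Decoder", "Discriminator")]
matches = {"Encoder": "Image Encoder", "Quantizer": "Codebook", "Decoder": "Image Decoder"}
precision, recall, f1 = path_alignment_scores(generated_edges, reference_edges, matches)
```

Each chain here yields six reachable pairs, of which three correspond across the two diagrams, so all three scores are 0.5.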
 
Edit evaluate_figure_captions in utils/extract_text_diagram_from_paper.py to improve the diagram extraction accuracy.
Edit _create_layout_plan in generate/workflow.py to customize the planning prompt.
Edit _request_diagram in generate/workflow.py to customize the diagram generation prompt.
Run utils/crawl_paper.py to crawl only the LaTeX source and PDF of a paper from arXiv.
Run utils/extract_text_diagram_from_paper.py to manually pick diagrams from LaTeX files.
This project is licensed under the MIT License - see the LICENSE file for details.
