A Neo4j GraphRAG Demonstration Project
This project demonstrates Neo4j Graph Database combined with GraphRAG (Graph Retrieval-Augmented Generation) as a linked knowledge base for intelligent legal information retrieval. It showcases how graph technology can structure complex legal documents to serve different user journeys with semantic search capabilities.
Graph Database Architecture:
- ✅ Hierarchical legal document structure in Neo4j
- ✅ 61,945 nodes with 63,722 relationships
- ✅ Complex multi-level connections (documents → structures → norms → chunks)
- ✅ Amendment tracking with temporal relationships
GraphRAG Implementation:
- ✅ Text chunking optimized for legal content
- ✅ 41,781 semantic embeddings for vector search
- ✅ Hybrid retrieval (graph traversal + semantic similarity)
- ✅ Context-aware information retrieval
Real-World Use Cases:
- ✅ 20 documented user journeys
- ✅ All 20 user journeys pass when tested with production queries
- ✅ Multiple user personas (case workers, policy analysts, lawyers)
| Component | Count | Status |
|---|---|---|
| Legal Documents | 13 SGB volumes | ✅ Complete |
| Legal Norms | 4,203 | ✅ Complete |
| Text Chunks | 41,781 | ✅ Complete |
| Vector Embeddings | 41,781 (100%) | ✅ Complete |
| Amendments | 21 with full metadata | ✅ Complete |
| BGBl References | 13 | ✅ Complete |
| User Journeys | 20 (all passing) | ✅ Validated |
Performance: Average query time < 5 ms | Import throughput: 79.59 norms/sec

Hierarchical structure of German Social Law in Neo4j - showing documents, structural units, and legal norms
- Neo4j 5.x (Community or Enterprise)
- Python 3.11+
- 8GB RAM minimum
# Clone and setup
git clone https://github.com/ma3u/Sozialrecht_RAG.git
cd Sozialrecht_RAG
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Configure
cp .env.example .env
# Edit .env with your Neo4j credentials
# Start Neo4j
docker-compose up -d
# or: neo4j start
# Full deployment (all 13 SGB volumes with amendments)
python scripts/deploy_all_sgb_volumes.py
This imports:
- 4,203 legal norms across 13 social law books
- 41,781 text chunks with embeddings
- 21 amendments with historical tracking
- Complete graph structure
# Run validation
python scripts/validate_import.py
# Test use cases (all 20 should pass)
python scripts/evaluate_sachbearbeiter_use_cases.py
Getting Started:
- 📘 Complete Documentation Index - Full guide to all docs
- 📘 GraphRAG Learnings - Implementation insights and best practices
Amendment Tracking:
- 📙 Deployment Report - Full deployment results
- 📙 Amendment Analysis - Coverage analysis
User Journeys:
- 👤 German User Journeys - 20 case worker scenarios (German)
- 👤 Amendment User Journeys - 6 amendment-specific scenarios
- 👤 Use Case Validation - Test results for all 20 journeys
- 📊 Deployment Report - Latest deployment statistics
- 📊 Amendment Analysis - Amendment coverage analysis
Ready-to-use Cypher queries organized by purpose:
- 01_graph_statistics.cypher - Database metrics
- 03_sachbearbeiter_workflows.cypher - Case worker queries
- 05_rag_sachbearbeiter_queries.cypher - RAG-optimized queries
See Cypher Query Guide for usage instructions.
This project demonstrates GraphRAG serving different user personas with tailored information retrieval:
Scenario: Daily benefit administration decisions
Example Journey: "Check eligibility for housing benefits"
MATCH (doc:LegalDocument {sgb_nummer: "II"})
-[:HAS_STRUCTURE|CONTAINS_NORM*1..3]->(norm:LegalNorm)
WHERE norm.paragraph_nummer = "22"
RETURN norm.enbez, norm.titel, norm.content_text
Queries: Basic facts, paragraph lookup, benefit calculations
Documentation: BENUTZER_JOURNEYS_DE.md (German, 20 scenarios)
Scenario: Historical law changes and impact analysis
Example Journey: "When did § 20 SGB II last change?"
MATCH (doc:LegalDocument {sgb_nummer: "II"})
-[:HAS_STRUCTURE|CONTAINS_NORM*1..3]->(norm:LegalNorm {paragraph_nummer: "20"})
-[:HAS_AMENDMENT]->(a:Amendment)
RETURN a.amendment_date, a.raw_text
ORDER BY a.amendment_date DESC
LIMIT 1
Queries: Amendment timelines, law impact analysis, BGBl citations
Documentation: AMENDMENT_USER_JOURNEYS.md (6 scenarios)
Scenario: Cross-law analysis and coverage assessment
Example Journey: "Find all norms affected by a specific law change"
MATCH (a:Amendment {gesetz_ref: "G v. 23.10.2024 I Nr. 323"})
<-[:HAS_AMENDMENT]-(norm:LegalNorm)
RETURN norm.paragraph_nummer, norm.titel, a.amendment_date
Queries: Cross-SGB analysis, change impact, statistical aggregations
Documentation: Query Library (20+ specialized queries)
Scenario: Finding relevant content without knowing exact paragraphs
Example: "regulations about single parents"
CALL db.index.vector.queryNodes('chunk_embeddings', 5, $embedding)
YIELD node, score
MATCH (node)<-[:HAS_CHUNK]-(norm:LegalNorm)
RETURN norm.paragraph_nummer, node.text, score
Features: Vector similarity search, context-aware retrieval
Documentation: GraphRAG Status
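To make the ranking step of the vector query concrete, here is a brute-force sketch of cosine-similarity search over a handful of chunks. It is only a conceptual stand-in: the Neo4j vector index performs an approximate search and may report scores on a different scale. The function names and the three-dimensional toy vectors are assumptions for illustration; the real embeddings have 768 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float], chunks: list[dict], k: int = 5) -> list[tuple[str, float]]:
    """Score every chunk against the query and return the k best (text, score) pairs."""
    scored = [
        (chunk["text"], cosine_similarity(query_vec, chunk["embedding"]))
        for chunk in chunks
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: invented texts and embeddings, not taken from the database.
toy_chunks = [
    {"text": "additional needs for single parents", "embedding": [1.0, 0.0, 0.0]},
    {"text": "housing and heating costs", "embedding": [0.0, 1.0, 0.0]},
    {"text": "standard needs of parents", "embedding": [0.9, 0.1, 0.0]},
]
best = top_k_chunks([1.0, 0.0, 0.0], toy_chunks, k=2)
```

In the Cypher query, the `5` passed to `db.index.vector.queryNodes` plays the role of `k`, and `$embedding` is the query vector.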
LegalDocument (13 SGBs)
├── HAS_STRUCTURE ─────> StructuralUnit (458)
│ └── CONTAINS_NORM ─> LegalNorm (4,203)
│ ├── HAS_CHUNK ────────> Chunk (41,781)
│ │ └── embedding [768-dim vector]
│ ├── HAS_AMENDMENT ────> Amendment (21)
│ │ └── SUPERSEDED_BY ─> Amendment
│ └── HAS_FUSSNOTE ─────> Fussnote (16)
└── PUBLISHED_IN ──────────────> BGBl (13)
| Node Type | Purpose | Count | Example Properties |
|---|---|---|---|
| LegalDocument | SGB volumes | 13 | sgb_nummer, jurabk, lange_titel |
| LegalNorm | Legal paragraphs | 4,203 | paragraph_nummer, titel, content_text |
| Chunk | Text segments for RAG | 41,781 | text, embedding[768], paragraph_context |
| Amendment | Historical changes | 21 | amendment_date, artikel, gesetz_ref |
| BGBl | Official gazette refs | 13 | year, page, full_reference |
| Relationship | Purpose | Count |
|---|---|---|
| CONTAINS_NORM | Document → Norm | 4,203 |
| HAS_CHUNK | Norm → Chunk | 41,781 |
| HAS_AMENDMENT | Norm → Amendment | 21 |
| SUPERSEDED_BY | Amendment timeline | 10 |
| PUBLISHED_IN | Document → BGBl | 13 |
Total: 61,945 nodes | 63,722 relationships
# Unit tests (amendment parser)
python tests/test_amendment_parser.py
# Result: 33/33 passing (100%)
# Integration tests (use cases)
python scripts/evaluate_sachbearbeiter_use_cases.py
# Result: 20/20 passing (100%)
# Data validation
python scripts/validate_import.py
# Checks: data quality, relationships, indexes
- ✅ Unit Tests: 33/33 (100%) - Parser functionality
- ✅ Integration Tests: 20/20 (100%) - End-to-end user journeys
- ✅ Data Quality: 0 orphaned nodes, all relationships valid
- ✅ Performance: < 1s query time (target: < 2s)
Hierarchical Structure:
- Multi-level document organization (Document → Structure → Norm)
- Efficient traversal with relationship types
- Optimized indexes for fast lookup
Temporal Relationships:
- Amendment chains with SUPERSEDED_BY
- Historical state queries (law at specific date)
- Version tracking through Fussnoten
Hybrid Retrieval:
- Graph traversal for structural navigation
- Vector similarity for semantic search
- Combined approach for context-aware results
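One way to picture the combined approach: vector search returns chunk hits, and a graph step maps each chunk to its parent norm and document so that results are grouped by legal paragraph. The sketch below uses in-memory dictionaries in place of HAS_CHUNK and CONTAINS_NORM traversals. The function name, ids, and scores are assumptions, not the project's implementation.

```python
from collections import defaultdict


def hybrid_retrieve(hits: list[tuple[str, float]],
                    chunk_to_norm: dict[str, str],
                    norm_to_document: dict[str, str],
                    top_n: int = 3) -> list[dict]:
    """Group vector hits by parent norm and rank norms by their best chunk score."""
    grouped: dict[str, dict] = defaultdict(lambda: {"score": float("-inf"), "chunks": []})
    for chunk_id, score in hits:
        norm_id = chunk_to_norm[chunk_id]          # graph step: chunk -> norm
        entry = grouped[norm_id]
        entry["score"] = max(entry["score"], score)
        entry["chunks"].append(chunk_id)
    ranked = sorted(grouped.items(), key=lambda item: item[1]["score"], reverse=True)
    return [
        {"norm": norm_id, "document": norm_to_document[norm_id], **entry}  # norm -> document
        for norm_id, entry in ranked[:top_n]
    ]


# Invented sample data for illustration.
hits = [("c1", 0.91), ("c2", 0.88), ("c3", 0.75)]
chunk_to_norm = {"c1": "§ 21", "c2": "§ 21", "c3": "§ 22"}
norm_to_document = {"§ 21": "SGB II", "§ 22": "SGB II"}
results = hybrid_retrieve(hits, chunk_to_norm, norm_to_document)
```

Grouping by norm means that several matching chunks from the same paragraph produce one result with shared context instead of several near-duplicates.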
Chunking Strategy:
- Legal-optimized text segmentation
- Context preservation with paragraph_context
- 768-dimensional embeddings (Azure OpenAI)
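The project's actual segmentation rules are not described here, so the following is only a rough sketch of sentence-based chunking that keeps a `paragraph_context` on every chunk. The splitting regex, the `max_chars` limit, and the `chunk_index` field are assumptions. A real legal chunker would also need to handle abbreviations such as "Abs." that end in a period.

```python
import re


def chunk_norm(text: str, paragraph_context: str, max_chars: int = 500) -> list[dict]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.;:])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [
        {"text": chunk, "paragraph_context": paragraph_context, "chunk_index": i}
        for i, chunk in enumerate(chunks)
    ]


pieces = chunk_norm("Sentence one. Sentence two. Sentence three.",
                    paragraph_context="SGB II § 22", max_chars=30)
```

Carrying the context string on each chunk lets a retrieved text segment be traced back to its paragraph even when it is read without the rest of the graph.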
Performance:
- 7 optimized indexes for query speed
- Batch import (79.59 norms/sec)
- Sub-second query response times
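Batch import usually means sending rows to the database in groups rather than one at a time. The sketch below shows a generic batching helper and the arithmetic behind the throughput figure. The batch size of 500 is an arbitrary assumption, not the value used by the deployment script.

```python
from collections.abc import Iterable, Iterator


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    batch: list = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def estimated_import_seconds(norm_count: int, norms_per_second: float) -> float:
    """Rough duration estimate from a measured throughput."""
    return norm_count / norms_per_second


batches = list(batched(range(4203), 500))
duration = estimated_import_seconds(4203, 79.59)  # roughly 53 seconds
```

Each batch could then be passed as a single list parameter to one write query, for example one that starts with `UNWIND $rows AS row`, which keeps the number of round trips low.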
Data Quality:
- 100% date extraction accuracy
- 0 orphaned nodes
- Comprehensive validation
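The "0 orphaned nodes" check can be illustrated on plain data: a node is orphaned if it appears in no relationship. The sketch below is not the logic of `scripts/validate_import.py`, which is not shown here; the function name and sample ids are assumptions.

```python
def find_orphans(node_ids: set[str], relationships: list[tuple[str, str, str]]) -> set[str]:
    """Return the ids of nodes that are neither start nor end of any relationship."""
    connected: set[str] = set()
    for start, _rel_type, end in relationships:
        connected.add(start)
        connected.add(end)
    return node_ids - connected


# Invented sample graph with one deliberately unconnected node.
nodes = {"doc", "norm", "chunk", "stray"}
rels = [("doc", "CONTAINS_NORM", "norm"), ("norm", "HAS_CHUNK", "chunk")]
orphans = find_orphans(nodes, rels)
```

Against a live database, a comparable check could be written in Cypher as `MATCH (n) WHERE NOT (n)--() RETURN count(n)`.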
Monitoring:
- Automated deployment reports
- Data quality checks
- Performance metrics tracking
Sozialrecht_RAG/
├── README.md # This file
├── PHASE_2_*.md # Phase 2 documentation (4 files)
├── docs/ # Documentation
│ ├── DOCUMENTATION_INDEX.md # Complete guide
│ ├── BENUTZER_JOURNEYS_DE.md # User journeys (German)
│ ├── AMENDMENT_USER_JOURNEYS.md # Amendment scenarios
│ ├── reports/ # Deployment reports
│ └── archive/ # Historical docs
├── src/ # Source code
│ ├── amendment_parser.py # Amendment extraction
│ ├── xml_to_neo4j_enhanced.py # Import with amendments
│ ├── queries/ # Query library
│ └── xml_legal_parser.py # XML parsing
├── scripts/ # Executable scripts
│ ├── deploy_all_sgb_volumes.py # Full deployment
│ ├── validate_import.py # Validation
│ ├── evaluate_sachbearbeiter_use_cases.py # Testing
│ └── archive/ # Historical scripts
├── tests/ # Unit tests
│ └── test_amendment_parser.py # Parser tests
├── cypher/ # Cypher query collections
│ ├── 01_graph_statistics.cypher
│ ├── 03_sachbearbeiter_workflows.cypher
│ └── 05_rag_sachbearbeiter_queries.cypher
└── xml_cache/ # Source XML files
- Start Here: Complete Documentation Index
  - Project overview and all documentation
  - Quick understanding of capabilities
- GraphRAG Concepts: NEO4J_GRAPHRAG_LEARNINGS.md
  - How graph + RAG work together
  - Implementation insights
- User Journeys: BENUTZER_JOURNEYS_DE.md
  - Real-world usage examples (German)
  - Query patterns for different personas
- Amendment Journeys: AMENDMENT_USER_JOURNEYS.md
  - Amendment-specific scenarios
  - Historical tracking use cases
This project serves as a template for building domain-specific GraphRAG systems:
Adapt For Your Domain:
- Replace legal XML with your data source
- Adjust chunking strategy for your content
- Modify schema for your relationships
- Create user journeys for your users
Key Patterns to Reuse:
- Hierarchical document structure
- Chunk-based RAG implementation
- Hybrid retrieval (graph + vector)
- Amendment/version tracking approach
- Automated deployment and validation
- German Social Law (SGB): www.gesetze-im-internet.de
- Official government-provided XML files
- Updated regularly with amendments
- Neo4j 5.x: Graph database
- Python 3.11+: Implementation language
- Azure OpenAI: Embeddings generation
- sentence-transformers: Local embeddings (alternative)
- lxml: XML parsing
- Complete Guide: docs/DOCUMENTATION_INDEX.md
- User Journeys: docs/BENUTZER_JOURNEYS_DE.md
- Query Library: src/queries/amendment_queries.py
# Validate your installation
python scripts/validate_import.py
# Test specific use case
python scripts/evaluate_sachbearbeiter_use_cases.py
- Check docs/DOCUMENTATION_INDEX.md § Configuration
- Review Deployment Report for validation patterns
- See scripts/README.md for script documentation
This project is provided as-is for educational and demonstration purposes.
Data: German social law texts are public domain (official government documents)
Code: Available for study and adaptation
Use Case: Demonstration of Neo4j GraphRAG capabilities
Data Source: www.gesetze-im-internet.de - German Federal Ministry of Justice
Technology: Neo4j Graph Database and Azure OpenAI
Purpose: GraphRAG demonstration and legal knowledge graph research
Version: 2.4 (Amendment Features Complete)
Last Updated: November 3, 2025
Status: ✅ Production Ready - All 20 use cases validated