Sozialrecht RAG - German Social Law Knowledge Graph

A Neo4j GraphRAG Demonstration Project

🎯 Project Purpose

This project demonstrates how a Neo4j graph database combined with GraphRAG (Graph Retrieval-Augmented Generation) can act as a linked knowledge base for legal information retrieval. It shows how a graph can structure complex legal documents, here the 13 volumes of the German Social Code (Sozialgesetzbuch, SGB), so that different user journeys can combine graph traversal with semantic search.

What This Demonstrates

Graph Database Architecture:

✅ Hierarchical legal document structure in Neo4j
✅ 61,945 nodes with 63,722 relationships
✅ Complex multi-level connections (documents → structures → norms → chunks)
✅ Amendment tracking with temporal relationships

GraphRAG Implementation:

✅ Text chunking optimized for legal content
✅ 41,781 semantic embeddings for vector search
✅ Hybrid retrieval (graph traversal + semantic similarity)
✅ Context-aware information retrieval

Real-World Use Cases:

✅ 20 documented user journeys
✅ All 20 journeys validated against production queries (20/20 passing)
✅ Multiple user personas (case workers, policy analysts, lawyers)

📊 Current Status

Component	Count	Status
Legal Documents	13 SGB volumes	✅ Complete
Legal Norms	4,203	✅ Complete
Text Chunks	41,781	✅ Complete
Vector Embeddings	41,781 (100%)	✅ Complete
Amendments	21 with full metadata	✅ Complete
BGBl References	13	✅ Complete
User Journeys	20 (all passing)	✅ Validated

Performance: Average query time < 5ms | Import throughput: 79.59 norms/sec

Graph Visualization

Hierarchical structure of German Social Law in Neo4j - showing documents, structural units, and legal norms

🚀 Quick Start

Prerequisites

Neo4j 5.x (Community or Enterprise)
Python 3.11+
8GB RAM minimum

Installation

# Clone and setup
git clone https://github.com/ma3u/Sozialrecht_RAG.git
cd Sozialrecht_RAG
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your Neo4j credentials

# Start Neo4j
docker-compose up -d
# or: neo4j start

Deploy Data

# Full deployment (all 13 SGB volumes with amendments)
python scripts/deploy_all_sgb_volumes.py

This imports:

4,203 legal norms across 13 social law books
41,781 text chunks with embeddings
21 amendments with historical tracking
Complete graph structure

Verify

# Run validation
python scripts/validate_import.py

# Test use cases (all 20 should pass)
python scripts/evaluate_sachbearbeiter_use_cases.py

📖 Documentation

Core Documentation

Getting Started:

📘 Complete Documentation Index - Full guide to all docs
📘 GraphRAG Learnings - Implementation insights and best practices

Amendment Tracking:

📙 Deployment Report - Full deployment results
📙 Amendment Analysis - Coverage analysis

User Journeys:

👤 German User Journeys - 20 case worker scenarios (German)
👤 Amendment User Journeys - 6 amendment-specific scenarios
👤 Use Case Validation - Test results for all 20 journeys

Reports

📊 Deployment Report - Latest deployment statistics
📊 Amendment Analysis - Amendment coverage analysis

Query Collections

Ready-to-use Cypher queries organized by purpose:

01_graph_statistics.cypher - Database metrics
03_sachbearbeiter_workflows.cypher - Case worker queries
05_rag_sachbearbeiter_queries.cypher - RAG-optimized queries

See Cypher Query Guide for usage instructions.

🎭 User Journeys

This project demonstrates GraphRAG serving different user personas with tailored information retrieval:

1. Case Workers (Sachbearbeiter)

Scenario: Daily benefit administration decisions

Example Journey: "Check eligibility for housing benefits"

MATCH (doc:LegalDocument {sgb_nummer: "II"})
  -[:HAS_STRUCTURE|CONTAINS_NORM*1..3]->(norm:LegalNorm)
WHERE norm.paragraph_nummer = "22"
RETURN norm.enbez, norm.titel, norm.content_text

Queries: Basic facts, paragraph lookup, benefit calculations
Documentation: BENUTZER_JOURNEYS_DE.md (German, 20 scenarios)
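The lookup above can also be run from Python with the volume and paragraph passed as query parameters instead of string literals. The following is a minimal sketch, not the project's own code: the helper name `lookup_paragraph` is hypothetical, and `session` is assumed to be a Neo4j Python driver session (or any object with a compatible `run(query, **params)` method).

```python
# Parameterized version of the paragraph lookup shown above.
LOOKUP_QUERY = """
MATCH (doc:LegalDocument {sgb_nummer: $sgb})
  -[:HAS_STRUCTURE|CONTAINS_NORM*1..3]->(norm:LegalNorm)
WHERE norm.paragraph_nummer = $paragraph
RETURN norm.enbez AS enbez, norm.titel AS titel, norm.content_text AS content_text
"""


def lookup_paragraph(session, sgb, paragraph):
    """Return matching norms as plain dicts (hypothetical helper).

    `session` only needs a neo4j-style run(query, **params) method that
    yields mapping-like records.
    """
    result = session.run(LOOKUP_QUERY, sgb=sgb, paragraph=paragraph)
    return [dict(record) for record in result]
```

Calling `lookup_paragraph(session, "II", "22")` corresponds to the housing benefits example. Using parameters lets the same query text serve every volume and paragraph.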

2. Legal Researchers

Scenario: Historical law changes and impact analysis

Example Journey: "When did § 20 SGB II last change?"

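// Restrict the match to SGB II and use the LegalNorm label from the schema
MATCH (doc:LegalDocument {sgb_nummer: "II"})
  -[:HAS_STRUCTURE|CONTAINS_NORM*1..3]->(norm:LegalNorm {paragraph_nummer: "20"})
  -[:HAS_AMENDMENT]->(a:Amendment)
RETURN a.amendment_date, a.raw_text
ORDER BY a.amendment_date DESC
LIMIT 1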

Queries: Amendment timelines, law impact analysis, BGBl citations
Documentation: AMENDMENT_USER_JOURNEYS.md (6 scenarios)

3. Policy Analysts

Scenario: Cross-law analysis and coverage assessment

Example Journey: "Find all norms affected by a specific law change"

MATCH (a:Amendment {gesetz_ref: "G v. 23.10.2024 I Nr. 323"})
  <-[:HAS_AMENDMENT]-(norm:LegalNorm)
RETURN norm.paragraph_nummer, norm.titel, a.amendment_date

Queries: Cross-SGB analysis, change impact, statistical aggregations
Documentation: Query Library (20+ specialized queries)

4. Semantic Search Users

Scenario: Finding relevant content without knowing exact paragraphs

Example: "regulations about single parents"

CALL db.index.vector.queryNodes('chunk_embeddings', 5, $embedding)
YIELD node, score
MATCH (node)<-[:HAS_CHUNK]-(norm:LegalNorm)
RETURN norm.paragraph_nummer, node.text, score

Features: Vector similarity search, context-aware retrieval
Documentation: GraphRAG Status
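The vector query above expects an `$embedding` parameter. The sketch below shows one way it might be supplied from Python. It assumes the query vector has already been produced by the same embedding model as the stored chunks, that the index is named `chunk_embeddings` as in the query above, and that vectors have 768 dimensions as stated in this README. The function name `semantic_search` is hypothetical.

```python
EMBEDDING_DIM = 768  # dimension stated in this README's schema section

VECTOR_QUERY = """
CALL db.index.vector.queryNodes('chunk_embeddings', $k, $embedding)
YIELD node, score
MATCH (node)<-[:HAS_CHUNK]-(norm:LegalNorm)
RETURN norm.paragraph_nummer AS paragraph, node.text AS text, score
"""


def semantic_search(session, embedding, k=5):
    """Return the k nearest chunks with their parent norm (hypothetical helper).

    `session` is assumed to expose a neo4j-style run(query, **params).
    """
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}"
        )
    result = session.run(VECTOR_QUERY, k=k, embedding=list(embedding))
    return [dict(record) for record in result]
```

Checking the dimension on the client side gives a clearer error than a failed index lookup when the query was embedded with a different model.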

🏗️ Graph Architecture

Schema Overview

LegalDocument (13 SGBs)
├── HAS_STRUCTURE ─────> StructuralUnit (458)
│   └── CONTAINS_NORM ─> LegalNorm (4,203)
│       ├── HAS_CHUNK ────────> Chunk (41,781)
│       │   └── embedding [768-dim vector]
│       ├── HAS_AMENDMENT ────> Amendment (21)
│       │   └── SUPERSEDED_BY ─> Amendment
│       └── HAS_FUSSNOTE ─────> Fussnote (16)
└── PUBLISHED_IN ──────────────> BGBl (13)

Key Node Types

Node Type	Purpose	Count	Example Properties
`LegalDocument`	SGB volumes	13	sgb_nummer, jurabk, lange_titel
`LegalNorm`	Legal paragraphs	4,203	paragraph_nummer, titel, content_text
`Chunk`	Text segments for RAG	41,781	text, embedding[768], paragraph_context
`Amendment`	Historical changes	21	amendment_date, artikel, gesetz_ref
`BGBl`	Official gazette refs	13	year, page, full_reference

Relationship Types

Relationship	Purpose	Count
`CONTAINS_NORM`	Document → Norm	4,203
`HAS_CHUNK`	Norm → Chunk	41,781
`HAS_AMENDMENT`	Norm → Amendment	21
`SUPERSEDED_BY`	Amendment timeline	10
`PUBLISHED_IN`	Document → BGBl	13

Total: 61,945 nodes | 63,722 relationships

🧪 Testing & Validation

Run All Tests

# Unit tests (amendment parser)
python tests/test_amendment_parser.py
# Result: 33/33 passing (100%)

# Integration tests (use cases)
python scripts/evaluate_sachbearbeiter_use_cases.py
# Result: 20/20 passing (100%)

# Data validation
python scripts/validate_import.py
# Checks: data quality, relationships, indexes

Test Coverage

✅ Unit Tests: 33/33 (100%) - Parser functionality
✅ Integration Tests: 20/20 (100%) - End-to-end user journeys
✅ Data Quality: 0 orphaned nodes, all relationships valid
✅ Performance: < 1s query time (target: < 2s)
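The orphaned-node figure can be spot-checked with a short query. This is a minimal sketch and not necessarily how `scripts/validate_import.py` performs its check. The helper name `count_orphans` is hypothetical, and `session` is assumed to expose a neo4j-style `run(query)` method.

```python
# Counts nodes that have no relationship in either direction, grouped by label.
ORPHAN_QUERY = """
MATCH (n)
WHERE NOT (n)--()
RETURN labels(n) AS labels, count(n) AS orphans
"""


def count_orphans(session):
    """Map each label combination to its number of orphaned nodes."""
    result = session.run(ORPHAN_QUERY)
    return {":".join(record["labels"]): record["orphans"] for record in result}
```

An empty dictionary means that every node has at least one relationship.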

💡 Key Features Demonstrated

1. Graph Database Capabilities

Hierarchical Structure:

Multi-level document organization (Document → Structure → Norm)
Efficient traversal with relationship types
Optimized indexes for fast lookup

Temporal Relationships:

Amendment chains with SUPERSEDED_BY
Historical state queries (law at specific date)
Version tracking through Fussnoten
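A historical state query reduces to finding the most recent amendment dated on or before the date of interest. The following sketch illustrates that selection in plain Python. It assumes amendments are available as dictionaries whose `amendment_date` is a `datetime.date`; how the property is actually stored in the database is not stated here. The function name `amendment_in_force` is hypothetical.

```python
from datetime import date


def amendment_in_force(amendments, on):
    """Return the latest amendment dated on or before `on`, or None.

    Assumes each item is a dict with a datetime.date under "amendment_date".
    """
    candidates = [a for a in amendments if a["amendment_date"] <= on]
    return max(candidates, key=lambda a: a["amendment_date"], default=None)


# Illustrative data: the first entry is made up, the second reuses the
# law reference from the policy analyst example in this README.
SAMPLE = [
    {"amendment_date": date(2022, 1, 1), "gesetz_ref": "hypothetical earlier law"},
    {"amendment_date": date(2024, 10, 23), "gesetz_ref": "G v. 23.10.2024 I Nr. 323"},
]
```

With this sample, a query date in 2023 selects the earlier entry, and a date before 2022 returns `None`. Whether the amendment date is the enactment date or the effective date depends on the source data and should be checked before relying on the result.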

2. GraphRAG Implementation

Hybrid Retrieval:

Graph traversal for structural navigation
Vector similarity for semantic search
Combined approach for context-aware results
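One simple way to combine the two signals is to roll chunk-level vector hits up to their parent norm, which is the step the `HAS_CHUNK` relationship provides. The sketch below is an assumption about how such a roll-up could look, not the project's ranking logic. It expects hits as dictionaries with `paragraph`, `text`, and `score` keys, and the function name `rank_norms` is hypothetical.

```python
def rank_norms(hits, top_n=3):
    """Keep the best-scoring chunk per norm and return norms by score.

    `hits` are chunk-level results, e.g. rows with "paragraph", "text"
    and "score". Paragraph numbers repeat across SGB volumes, so a real
    key would also need the volume; this sketch keys on paragraph only.
    """
    best = {}
    for hit in hits:
        key = hit["paragraph"]
        if key not in best or hit["score"] > best[key]["score"]:
            best[key] = hit
    ranked = sorted(best.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:top_n]
```

Taking the maximum chunk score per norm is only one choice. Averaging or summing scores would favor norms with many moderately relevant chunks instead.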

Chunking Strategy:

Legal-optimized text segmentation
Context preservation with paragraph_context
768-dimensional embeddings (Azure OpenAI)
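To make context preservation concrete, the sketch below splits a norm's text into size-bounded chunks and attaches the same `paragraph_context` to each one. It is a simplified illustration and not the project's chunker: the sentence splitting is naive, the `max_chars` limit and the `chunk_index` field are assumptions, and the function name `chunk_norm` is hypothetical.

```python
import re


def chunk_norm(text, paragraph_context, max_chars=500):
    """Split text into chunks of roughly max_chars, each tagged with context.

    Splits after ., !, ? or ; followed by whitespace. This is naive for
    legal text: abbreviations such as "Abs." also trigger a split.
    """
    sentences = [s for s in re.split(r"(?<=[.!?;])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [
        {"text": chunk, "paragraph_context": paragraph_context, "chunk_index": i}
        for i, chunk in enumerate(chunks)
    ]
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence, so the limit is a target and not a hard bound.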

3. Production-Ready Features

Performance:

7 optimized indexes for query speed
Batch import (79.59 norms/sec)
Sub-second query response times
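Batch import in Neo4j is commonly done by sending rows in groups and expanding them with `UNWIND`. The following sketch shows that pattern under stated assumptions: the `norm_id` key, the batch size, and the helper names `batched` and `import_norms` are hypothetical and not taken from the project's import scripts. `session` is assumed to expose a neo4j-style `run(query, **params)` method.

```python
from itertools import islice

# One round trip per batch: UNWIND expands the list of row maps server-side.
# The norm_id property is a hypothetical unique key for illustration.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (n:LegalNorm {norm_id: row.norm_id})
SET n.paragraph_nummer = row.paragraph_nummer, n.titel = row.titel
"""


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def import_norms(session, rows, batch_size=500):
    """Send rows to the database in batches and return the row count."""
    total = 0
    for batch in batched(rows, batch_size):
        session.run(IMPORT_QUERY, rows=batch)
        total += len(batch)
    return total
```

Larger batches reduce round trips but increase transaction memory, so the batch size is worth tuning against the actual data.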

Data Quality:

100% date extraction accuracy
0 orphaned nodes
Comprehensive validation

Monitoring:

Automated deployment reports
Data quality checks
Performance metrics tracking

📁 Project Structure

Sozialrecht_RAG/
├── README.md                          # This file
├── PHASE_2_*.md                       # Phase 2 documentation (4 files)
├── docs/                              # Documentation
│   ├── DOCUMENTATION_INDEX.md         # Complete guide
│   ├── BENUTZER_JOURNEYS_DE.md       # User journeys (German)
│   ├── AMENDMENT_USER_JOURNEYS.md    # Amendment scenarios
│   ├── reports/                       # Deployment reports
│   └── archive/                       # Historical docs
├── src/                               # Source code
│   ├── amendment_parser.py            # Amendment extraction
│   ├── xml_to_neo4j_enhanced.py      # Import with amendments
│   ├── queries/                       # Query library
│   └── xml_legal_parser.py           # XML parsing
├── scripts/                           # Executable scripts
│   ├── deploy_all_sgb_volumes.py     # Full deployment
│   ├── validate_import.py            # Validation
│   ├── evaluate_sachbearbeiter_use_cases.py  # Testing
│   └── archive/                       # Historical scripts
├── tests/                             # Unit tests
│   └── test_amendment_parser.py      # Parser tests
├── cypher/                            # Cypher query collections
│   ├── 01_graph_statistics.cypher
│   ├── 03_sachbearbeiter_workflows.cypher
│   └── 05_rag_sachbearbeiter_queries.cypher
└── xml_cache/                         # Source XML files

🎓 Learning Resources

Understanding This Project

Start Here: Complete Documentation Index
- Project overview and all documentation
- Quick understanding of capabilities
GraphRAG Concepts: NEO4J_GRAPHRAG_LEARNINGS.md
- How graph + RAG work together
- Implementation insights
User Journeys: BENUTZER_JOURNEYS_DE.md
- Real-world usage examples (German)
- Query patterns for different personas
Amendment Journeys: AMENDMENT_USER_JOURNEYS.md
- Amendment-specific scenarios
- Historical tracking use cases

Running Your Own GraphRAG

This project serves as a template for building domain-specific GraphRAG systems:

Adapt For Your Domain:

Replace legal XML with your data source
Adjust chunking strategy for your content
Modify schema for your relationships
Create user journeys for your users

Key Patterns to Reuse:

Hierarchical document structure
Chunk-based RAG implementation
Hybrid retrieval (graph + vector)
Amendment/version tracking approach
Automated deployment and validation

🔗 External Resources

Data Source

German Social Law (SGB): www.gesetze-im-internet.de
Official government-provided XML files
Updated regularly with amendments

Technologies Used

Neo4j 5.x: Graph database
Python 3.11+: Implementation language
Azure OpenAI: Embeddings generation
sentence-transformers: Local embeddings (alternative)
lxml: XML parsing

📞 Support

Documentation

Complete Guide: docs/DOCUMENTATION_INDEX.md
User Journeys: docs/BENUTZER_JOURNEYS_DE.md
Query Library: src/queries/amendment_queries.py

Testing

# Validate your installation
python scripts/validate_import.py

# Test specific use case
python scripts/evaluate_sachbearbeiter_use_cases.py

Troubleshooting

Check docs/DOCUMENTATION_INDEX.md § Configuration
Review Deployment Report for validation patterns
See scripts/README.md for script documentation

📄 License

This project is provided as-is for educational and demonstration purposes.

Data: German social law texts are public domain (official government documents)
Code: Available for study and adaptation
Use Case: Demonstration of Neo4j GraphRAG capabilities

🙏 Acknowledgments

Data Source: www.gesetze-im-internet.de - German Federal Ministry of Justice
Technology: Neo4j Graph Database and Azure OpenAI
Purpose: GraphRAG demonstration and legal knowledge graph research

Version: 2.4 (Amendment Features Complete)
Last Updated: November 3, 2025
Status: ✅ Production Ready - All 20 use cases validated
