Summary: This README provides a comprehensive overview of a hybrid ancient document restoration framework. The project emphasizes a low-cost, verifiable approach that combines classical mathematical algorithms with targeted Deep Learning (DL) assistance, avoiding the "black-box" nature of end-to-end AI generation.
This project presents a specialized pipeline for the restoration of degraded and physically distorted ancient documents. Unlike contemporary "End-to-End" AI models that generate images from scratch—often leading to historical hallucinations—this framework utilizes AI strictly for structural feature extraction.
The core philosophy is AI-as-an-Assistant:
- Minimal Computational Cost: Only a lightweight model is used for mask detection.
- Geometric Integrity: Restoration is handled by deterministic mathematical transformations.
- Verifiability: Each stage of the pipeline produces measurable, traceable results similar to a scientific verification study.
Design Philosophy: Why not End-to-End AI?

Most modern restoration projects use generative AI (GANs or diffusion models), which often "hallucinates" details that were never present in the ancient script. This project uses a Geometric Constraint Approach instead:
- AI for Perception: The U-Net only identifies where the text is.
- Math for Transformation: TPS and polynomial fitting only relocate the original pixels to their rectified positions. No pixel content is generated, so the output contains nothing that was not in the source scan.

The restoration process is executed through four distinct, sequential stages:
```mermaid
graph TD
    A[Degraded Image] --> B[Stage 1: Deskewing]
    B --> C[Stage 2: Dewarping]
    subgraph "AI-Guided Geometry"
        C --> C1[UNet Masking]
        C1 --> C2[Skeletonization]
        C2 --> C3[Curve Fitting]
        C3 --> C4[TPS Warp]
    end
    C4 --> D[Stage 3: Forensic Analysis]
    D --> E[Stage 4: Binarization]
    E --> F[Restored Document]
    style B fill:#ffffff,stroke:#333,stroke-width:2px
    style C fill:#ffffff,stroke:#333,stroke-width:2px
    style D fill:#ffffff,stroke:#333,stroke-width:2px
    style E fill:#ffffff,stroke:#333,stroke-width:2px
```
To correct global rotation, the system employs the Probabilistic Hough Transform. By detecting the dominant orientations of text-line segments, the algorithm calculates the precise skew angle and performs a compensatory rotation to align the document to a horizontal baseline.
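
The angle-estimation step can be sketched as follows. This is a minimal illustration, not the code in src/core/deskewer.py: the function name `estimate_skew_angle`, the median aggregation, and the 45-degree cutoff are assumptions. The input is assumed to be a list of line segments in `(x1, y1, x2, y2)` form, which is the layout OpenCV's `cv2.HoughLinesP` returns (typically as an array of shape `(N, 1, 4)` that can be reshaped to `(N, 4)`).

```python
import math
from statistics import median


def estimate_skew_angle(segments, max_abs_deg=45.0):
    """Return the median angle (degrees) of near-horizontal segments.

    `segments` is an iterable of (x1, y1, x2, y2) tuples in image
    coordinates. The cutoff `max_abs_deg` is an illustrative assumption.
    """
    angles = []
    for x1, y1, x2, y2 in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold into (-90, 90] so a segment and its reverse give the same angle.
        if angle > 90.0:
            angle -= 180.0
        elif angle <= -90.0:
            angle += 180.0
        # Skip steep segments (page borders, character stems).
        if abs(angle) <= max_abs_deg:
            angles.append(angle)
    # The median is robust to a few stray segments; 0.0 means "no evidence".
    return median(angles) if angles else 0.0


segments = [(0, 0, 100, 5), (10, 40, 210, 50), (100, 5, 0, 0), (0, 0, 0, 50)]
skew = estimate_skew_angle(segments)
```

Because image coordinates have y growing downward, a positive result here means the text lines descend to the right. The sign to pass to the rotation step depends on the convention of the rotation function in use.
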
This stage rectifies non-linear distortions (e.g., page curls and folds) using a multi-step geometric process:
- Mask Detection: A Deep Learning model (U-Net) identifies the precise pixel-area of text lines.
- Skeletonization: The detected masks are reduced to one-pixel-wide centerlines (skeletons).
- Curve Fitting: Polynomial or spline-based functions are fitted to these skeletons to model the physical warp of the paper.
- TPS Dewarp: Thin Plate Spline (TPS) transformation is applied to warp the entire image back into a flat, rectified plane based on the fitted curves.
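
The curve-fitting step and its hand-off to TPS can be sketched with NumPy. This is an illustration under stated assumptions, not the code in src/core/dewarp.py: the helper names `fit_centerline` and `control_point_pairs`, the cubic default, and the choice to flatten each line to its mean height are all hypothetical.

```python
import numpy as np


def fit_centerline(xs, ys, degree=3):
    """Least-squares polynomial y = p(x) through one skeleton's pixels."""
    return np.poly1d(np.polyfit(xs, ys, degree))


def control_point_pairs(curve, x_min, x_max, n_points=8):
    """Sample the fitted curve and pair each sample with a flattened target.

    Returns (src, dst), each of shape (n_points, 2) as (x, y) rows.
    Assumption: each text line is flattened to its own mean height and
    x is left unchanged, which ignores horizontal foreshortening.
    """
    xs = np.linspace(x_min, x_max, n_points)
    ys = curve(xs)
    target_y = float(ys.mean())
    src = np.stack([xs, ys], axis=1)
    dst = np.stack([xs, np.full_like(xs, target_y)], axis=1)
    return src, dst


# Synthetic skeleton of a text line that sags in the middle.
xs = np.arange(0, 200, dtype=float)
ys = 50.0 + 0.001 * (xs - 100.0) ** 2
curve = fit_centerline(xs, ys, degree=2)
src, dst = control_point_pairs(curve, 0.0, 199.0, n_points=5)
```

Pairs collected from every text line would then serve as the control points for a thin plate spline solver. One option is `scipy.interpolate.RBFInterpolator` with `kernel="thin_plate_spline"`, though whether this project uses it is not stated.
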
To isolate text from stains, aging artifacts, and uneven lighting:
- Division Normalization (DN): Estimating the illumination layer and dividing the original image by it to achieve a uniform background.
- Enhancement: Implementation of CLAHE (Contrast Limited Adaptive Histogram Equalization) or ZCA Whitening (Zero-phase Component Analysis) to sharpen faint ink traces and decorrelate noise.
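
Division Normalization can be sketched with NumPy alone. The document does not say how the illumination layer is estimated, so the large box-mean filter below is an assumption, as are the names `box_mean` and `division_normalize` and the default radius. This is not the code in src/core/forensic.py.

```python
import numpy as np


def box_mean(img, radius):
    """Mean filter with edge padding, computed from an integral image."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    # Prepend a zero row and column so window sums are four lookups.
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def division_normalize(gray, radius=25, eps=1e-6):
    """Divide a grayscale image by its estimated illumination layer.

    Background pixels end up near 255; ink stays proportionally darker.
    """
    gray = gray.astype(np.float64)
    background = box_mean(gray, radius)  # assumed illumination estimate
    ratio = gray / np.maximum(background, eps)
    return np.clip(np.rint(ratio * 255.0), 0, 255).astype(np.uint8)


page = np.full((7, 7), 100, dtype=np.uint8)
page[3, 3] = 10  # a single dark "ink" pixel
normalized = division_normalize(page, radius=2)
```

The filter radius should be large relative to stroke width so that ink does not leak into the background estimate. A median or morphological estimate would be a reasonable alternative.
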
The final stage converts the image to high-contrast black and white:
- The system utilizes the AI-generated Mask from Stage 2 as a spatial filter.
- Content Preservation: Pixels within the mask boundaries undergo adaptive binarization to retain stroke details.
- Background Cleaning: All pixels outside the mask are programmatically set to pure white (255), giving OCR engines an output free of background noise.
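
The mask-gated binarization described above can be sketched as follows. The local-mean threshold, the function name `masked_binarize`, and the `radius` and `offset` defaults are assumptions chosen for illustration. The document does not state which adaptive method the project uses.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def masked_binarize(gray, mask, radius=7, offset=10.0):
    """Binarize inside `mask` with a local-mean threshold; whiten the rest.

    A pixel becomes ink (0) only if it lies inside the mask AND is darker
    than its neighborhood mean by more than `offset`. All else is 255.
    """
    gray = gray.astype(np.float64)
    k = 2 * radius + 1
    padded = np.pad(gray, radius, mode="edge")
    local_mean = sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))
    ink = mask.astype(bool) & (gray < local_mean - offset)
    out = np.full(gray.shape, 255, dtype=np.uint8)
    out[ink] = 0
    return out


gray = np.full((9, 9), 200, dtype=np.uint8)
gray[4, 2:7] = 40          # a text stroke
gray[0:2, 0:2] = 30        # a dark stain outside the text line
mask = np.zeros((9, 9), dtype=bool)
mask[3:6, :] = True        # text-line mask covering rows 3..5
binary = masked_binarize(gray, mask, radius=2, offset=10.0)
```

In this toy input the stain is darker than the stroke, yet it is removed because it falls outside the mask. That is the purpose of using the mask as a spatial filter.
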
The AI component is trained solely to identify text-line masks, significantly reducing the required training data and time compared to generative models.
| Metric | Value |
|---|---|
| Training Loss | 0.0449 |
| Validation Loss | 0.0636 |
| Mask mIoU | 0.8562 |
| Mask F1 Score | 0.9223 |
| Mask Pixel Accuracy (PA) | 0.988 |
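
For reference, the three mask metrics in the table can be computed from a predicted and a ground-truth binary mask as sketched below. The function name `mask_metrics` is hypothetical, and treating mIoU as the mean of the text-class and background-class IoU is an assumption. The project's exact averaging scheme is not stated.

```python
import numpy as np


def _ratio(num, den):
    # Convention (assumption): an empty class counts as a perfect match.
    return float(num) / float(den) if den else 1.0


def mask_metrics(pred, truth):
    """Return mIoU, F1, and pixel accuracy for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    iou_text = _ratio(tp, tp + fp + fn)
    iou_background = _ratio(tn, tn + fp + fn)
    return {
        "miou": (iou_text + iou_background) / 2.0,
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "pa": _ratio(tp + tn, pred.size),
    }


scores = mask_metrics([[1, 1, 0, 0]], [[1, 0, 0, 0]])
```
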
The following section demonstrates the transition from a distorted, low-contrast original to a rectified, binarized output.
- (Recommended: Comparison of Original -> Dewarped -> Forensic -> Binarized)
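We evaluate the quality of the restoration by measuring the accuracy of text extraction using two primary metrics: Character Error Rate (CER) and Word Error Rate (WER). Both are edit distances normalized by the length of the reference text, so values above 100% are possible when the OCR output contains many spurious insertions.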
| Processing Stage | CER (%) | WER (%) |
|---|---|---|
| Original Scan | 247 | 130 |
| Post-Restoration | 110 | 95 |
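
CER and WER are both Levenshtein edit distances divided by the reference length, computed over characters and over whitespace-separated words respectively. The sketch below uses only the standard library. The function names are hypothetical, and it is not known whether src/utils/metrics.py tokenizes or normalizes text the same way.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate in percent."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


noisy_cer = cer("ab", "xxabxxx")  # five insertions against two characters
```

The last line shows how a rate can exceed 100%: every spurious character counts as an error, while the denominator stays fixed at the reference length.
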
- src/core/deskewer.py: Implementation of Probabilistic Hough Transform.
- src/core/dewarp.py: Logic for Skeletonization, Curve Fitting, and TPS.
- src/core/forensic.py: Division Normalization and ZCA modules.
- src/core/ai_model.py: Architecture for the Mask Detection model.
- src/utils/metrics.py: Calculation tools for CER and WER.

By constraining Deep Learning to structural detection and relying on classical mathematics for image transformation, this project provides a robust, low-cost, and transparent solution for document restoration. This hybrid approach ensures that the historical "truth" of the document is preserved without the artifacts typically introduced by purely generative AI.
- Developed by:
- Nguyen Minh Quang - University of Science, VNU. https://github.com/minhquang0407
- Dinh Nhat Tan - University of Science, VNU. https://github.com/Hecquyn175
- Nguyen Quoc Anh Quan - University of Science, VNU. https://github.com/nqaq2005
- Le Nguyen Bao Thi - University of Science, VNU. https://github.com/Wis2411