System Architecture

Project Overview
Description
Production-ready OCR pipeline combining CRAFT (Character Region Awareness for Text) detection with CRNN (Convolutional Recurrent Neural Network) recognition. The system extracts text from natural scene images in several stages: feature extraction with a VGG16-BN backbone; a U-Net-style decoder with skip connections that produces character-level region and affinity maps; word grouping by spatial proximity, with fixed padding around each word; coordinate scaling and reading-order sorting; and cropping of each word for recognition.
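The geometry steps after detection (coordinate scaling, fixed padding, word cropping) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the helper names `scale_factors`, `scale_box` and `pad_and_clip` are hypothetical, boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)`, and padding is assumed to be applied after scaling, in original-image pixels, using the 3 px / 4 px values from the feature list.

```python
import math
from typing import Tuple

BoxF = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), float coords
BoxI = Tuple[int, int, int, int]          # same layout, integer pixel coords

PAD_X = 3  # horizontal padding in pixels (value from the feature list)
PAD_Y = 4  # vertical padding in pixels (value from the feature list)


def scale_factors(orig_w: int, orig_h: int, map_w: int, map_h: int) -> Tuple[float, float]:
    """Per-axis factors that map score-map coordinates to original-image pixels."""
    return orig_w / map_w, orig_h / map_h


def scale_box(box: BoxF, sx: float, sy: float) -> BoxF:
    """Rescale a box from score-map coordinates to original-image coordinates."""
    x0, y0, x1, y1 = box
    return x0 * sx, y0 * sy, x1 * sx, y1 * sy


def pad_and_clip(box: BoxF, img_w: int, img_h: int,
                 pad_x: int = PAD_X, pad_y: int = PAD_Y) -> BoxI:
    """Round the box outward to whole pixels, add fixed padding, clamp to the image."""
    x0, y0, x1, y1 = box
    return (
        max(0, math.floor(x0) - pad_x),
        max(0, math.floor(y0) - pad_y),
        min(img_w, math.ceil(x1) + pad_x),
        min(img_h, math.ceil(y1) + pad_y),
    )


# Example: a 1280x720 input whose score map is 640x360 (CRAFT predicts at half resolution).
sx, sy = scale_factors(1280, 720, 640, 360)
x0, y0, x1, y1 = pad_and_clip(scale_box((10.2, 20.7, 50.0, 40.1), sx, sy), 1280, 720)
print((x0, y0, x1, y1))  # (17, 37, 103, 85)
```

Rounding outward before padding means the padded box never cuts into the detected region. With a NumPy/OpenCV image, the word crop is then `image[y0:y1, x0:x1]`.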
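Reading-order sorting can be sketched as one pass over the word boxes sorted by vertical center, followed by a left-to-right sort inside each line. The clustering rule here is an assumption, since the page gives only the 20 px threshold: a box joins the current line when its vertical center lies within 20 px of the center of that line's first box. `reading_order` and `LINE_THRESHOLD` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)

LINE_THRESHOLD = 20  # px; vertical clustering threshold from the feature list


def reading_order(boxes: List[Box], threshold: float = LINE_THRESHOLD) -> List[Box]:
    """Order word boxes line by line (top to bottom), left to right within a line."""
    if not boxes:
        return []

    def center_y(b: Box) -> float:
        return (b[1] + b[3]) / 2

    # Visit boxes top to bottom so each one either joins the current line or starts a new one.
    by_y = sorted(boxes, key=center_y)
    lines: List[List[Box]] = [[by_y[0]]]
    line_ref = center_y(by_y[0])  # vertical center of the current line's first box
    for box in by_y[1:]:
        if center_y(box) - line_ref <= threshold:
            lines[-1].append(box)
        else:
            lines.append([box])
            line_ref = center_y(box)

    # Within each line, sort by left edge; then flatten the lines in order.
    return [b for line in lines for b in sorted(line, key=lambda w: w[0])]


words = [(200, 12, 260, 32), (10, 10, 80, 30), (15, 60, 90, 80), (100, 14, 180, 34)]
print(reading_order(words))
# [(10, 10, 80, 30), (100, 14, 180, 34), (200, 12, 260, 32), (15, 60, 90, 80)]
```

Anchoring each line to its first box keeps long, slightly tilted lines from drifting into the next line; a running mean of line centers is a common alternative.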
Key Features
✓ VGG16-BN backbone with a U-Net decoder featuring skip connections at multiple scales
✓ Dual output heads: a region score map for character detection and an affinity score map for grouping characters into words
✓ Configurable fixed-padding bounding-box expansion (3 px horizontal, 4 px vertical) for robust word capture
✓ Multi-scale coordinate mapping with precise scaling factors to preserve spatial accuracy
✓ Reading-order sorting: vertical clustering (20 px threshold) followed by horizontal sorting within each line
✓ CRNN recognition: a 7-layer CNN feature extractor followed by bidirectional LSTM sequence modeling
✓ Greedy CTC (Connectionist Temporal Classification) decoding with removal of repeated labels and blanks
✓ Confidence thresholding (0.9) with EasyOCR fallback for low-confidence predictions
✓ Automated result generation: annotated images, coordinate files, binary masks, and HTML visualization tables
✓ Pretrained weight loading (Synth90k initialization) with a custom alphabet encoding
Technology Stack
PyTorch, CRAFT, CRNN, OpenCV, VGG16-BN, Bi-LSTM, CTC Loss, EasyOCR, Python, NumPy
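The greedy CTC decoding listed under Key Features takes the most likely class at each timestep, collapses consecutive repeats, and then drops blanks. This sketch assumes the recognizer's output is one probability distribution per timestep with the blank at index 0 (the default `blank=0` of PyTorch's `CTCLoss`); `greedy_ctc_decode` is a hypothetical name, and the confidence score (mean probability of the emitted characters) is a hypothetical choice, since the page does not say how confidence is computed.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: class 0 is the CTC blank; class i maps to alphabet[i - 1]


def greedy_ctc_decode(probs: Sequence[Sequence[float]], alphabet: str) -> Tuple[str, float]:
    """Best-path CTC decoding; returns (text, confidence)."""
    chars: List[str] = []
    kept: List[float] = []
    prev = BLANK
    for dist in probs:
        # Most likely class at this timestep.
        best = max(range(len(dist)), key=dist.__getitem__)
        # Emit only non-blank labels that differ from the previous timestep's label,
        # so "a a" collapses to "a" while "a - a" (blank between) stays "aa".
        if best != BLANK and best != prev:
            chars.append(alphabet[best - 1])
            kept.append(dist[best])
        prev = best
    # Hypothetical confidence: mean probability of the emitted characters.
    confidence = sum(kept) / len(kept) if kept else 0.0
    return "".join(chars), confidence


probs = [
    [0.10, 0.80, 0.05, 0.05],  # 'a'
    [0.10, 0.90, 0.00, 0.00],  # 'a' again: collapsed as a repeat
    [0.70, 0.10, 0.10, 0.10],  # blank: separates the two a's
    [0.00, 0.60, 0.20, 0.20],  # 'a': kept because a blank came before it
    [0.05, 0.05, 0.05, 0.85],  # 'c'
]
text, conf = greedy_ctc_decode(probs, "abc")
print(text, round(conf, 2))  # aac 0.75
```

Greedy (best-path) decoding is not guaranteed to find the most probable label sequence, but it is fast and usually close; beam search is the usual upgrade.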
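The 0.9 confidence threshold with EasyOCR fallback amounts to a simple routing rule per word crop. In this sketch both recognizers are passed in as callables, so it runs without either model; `recognize_with_fallback` and the `"crnn"`/`"fallback"` source labels are hypothetical, and treating a score of exactly 0.9 as confident is an assumption. With EasyOCR installed, `fallback` could wrap `reader.readtext(crop, detail=0)` on an `easyocr.Reader(['en'])` instance and join the returned strings.

```python
from typing import Any, Callable, Tuple

CONF_THRESHOLD = 0.9  # confidence threshold from the feature list


def recognize_with_fallback(
    crop: Any,
    primary: Callable[[Any], Tuple[str, float]],
    fallback: Callable[[Any], str],
    threshold: float = CONF_THRESHOLD,
) -> Tuple[str, str]:
    """Return (text, source), deferring to the fallback recognizer on low confidence."""
    text, confidence = primary(crop)
    # Assumption: scores at or above the threshold are trusted as-is.
    if confidence >= threshold:
        return text, "crnn"
    # Low-confidence prediction: re-read the same crop with the fallback model.
    return fallback(crop), "fallback"


# Stand-in recognizers so the example runs without any model weights.
print(recognize_with_fallback(None, lambda c: ("OPEN", 0.97), lambda c: "unused"))  # ('OPEN', 'crnn')
print(recognize_with_fallback(None, lambda c: ("0PEN", 0.42), lambda c: "OPEN"))    # ('OPEN', 'fallback')
```

Returning the source label alongside the text makes it easy to report in the generated HTML tables which words needed the fallback.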
© 2025 Charlotte Burns