System Architecture

Architecture Diagram 1

Project Overview

Description

Production-ready OCR pipeline that combines CRAFT (Character Region Awareness For Text) detection with CRNN (Convolutional Recurrent Neural Network) recognition. The system extracts text from natural scene images in stages: a VGG16-BN backbone extracts features; a U-Net style decoder with skip connections generates character-level region and affinity maps; characters are grouped into words by spatial proximity and the resulting boxes are expanded with fixed padding; box coordinates are scaled and sorted into reading order; and each word is cropped individually for recognition.
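The box post-processing stages (fixed padding, coordinate scaling, reading-order sorting) can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the function names, the `(x_min, y_min, x_max, y_max)` box layout, the order of padding and scaling, and the use of vertical box centres for line clustering are all assumptions. The default values (3px, 4px, 20px) are the ones listed under Key Features.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)


def pad_box(box: Box, width: int, height: int,
            pad_x: int = 3, pad_y: int = 4) -> Box:
    """Expand a box by a fixed margin, clamped to the image bounds."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - pad_x), max(0, y0 - pad_y),
            min(width, x1 + pad_x), min(height, y1 + pad_y))


def scale_box(box: Box, scale_x: float, scale_y: float) -> Box:
    """Multiply box coordinates by per-axis scaling factors."""
    x0, y0, x1, y1 = box
    return (round(x0 * scale_x), round(y0 * scale_y),
            round(x1 * scale_x), round(y1 * scale_y))


def sort_reading_order(boxes: List[Box], line_threshold: int = 20) -> List[Box]:
    """Cluster boxes into lines by vertical centre, then sort each line by x."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        centre_y = (box[1] + box[3]) / 2
        if lines:
            current = lines[-1]
            # Mean vertical centre of the line built so far.
            line_y = sum((b[1] + b[3]) / 2 for b in current) / len(current)
            if abs(centre_y - line_y) <= line_threshold:
                current.append(box)
                continue
        lines.append([box])  # start a new text line
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]
```

For example, `sort_reading_order` places two boxes whose vertical centres differ by 2px on the same line and orders them left to right, while a box 50px lower starts a new line.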

Key Features

VGG16-BN backbone with U-Net decoder architecture featuring skip connections at multiple scales
Dual output heads: region score map for character detection and affinity score map for character grouping
Configurable fixed-padding bounding box expansion (3px horizontal, 4px vertical) for robust word capture
Multi-scale coordinate mapping with precise scaling factors to preserve spatial accuracy
Reading-order sorting: vertical clustering (20px threshold) followed by horizontal sorting within lines
CRNN recognition with 7-layer CNN feature extractor + bidirectional LSTM sequence modeling
CTC (Connectionist Temporal Classification) decoding with greedy algorithm and duplicate removal
Confidence thresholding (0.9): predictions that score below the threshold fall back to EasyOCR
Automated result generation: annotated images, coordinate files, binary masks, and HTML visualization tables
Pretrained weight loading with Synth90k dataset initialization and custom alphabet encoding
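The greedy CTC decoding and confidence fallback described in the features above could look roughly like the following. This is a sketch under stated assumptions, not the project's implementation: the blank symbol is assumed to sit at index 0, confidence is assumed to be the mean probability of the emitted characters, and `fallback` is a hypothetical callable standing in for the EasyOCR call.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      alphabet: str) -> Tuple[str, float]:
    """Greedy CTC decoding: best class per step, merge repeats, drop blanks.

    probs[t][k] is the probability of class k at time step t; class k > 0
    maps to alphabet[k - 1]. Returns the decoded text and the mean
    probability of the emitted characters (an assumed confidence measure).
    """
    chars: List[str] = []
    scores: List[float] = []
    previous = BLANK
    for step in probs:
        best = max(range(len(step)), key=step.__getitem__)
        # Emit only when the class is not blank and differs from the last step.
        if best != BLANK and best != previous:
            chars.append(alphabet[best - 1])
            scores.append(step[best])
        previous = best
    confidence = sum(scores) / len(scores) if scores else 0.0
    return "".join(chars), confidence


def recognise(probs: Sequence[Sequence[float]], alphabet: str,
              fallback: Callable[[], str], threshold: float = 0.9) -> str:
    """Return the CRNN text, or the fallback result when confidence is low."""
    text, confidence = ctc_greedy_decode(probs, alphabet)
    return text if confidence >= threshold else fallback()
```

Note that a blank between two identical predictions keeps them as separate characters, which is how CTC represents doubled letters.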

Technology Stack

PyTorch, CRAFT, CRNN, OpenCV, VGG16-BN, Bi-LSTM, CTC Loss, EasyOCR, Python, NumPy

Project Gallery

Text Detection
Text Extraction
© 2025 Charlotte Burns