Charlotte | Portfolio

System Architecture


Project Overview

Description

Production-ready OCR pipeline combining CRAFT (Character Region Awareness For Text) detection with CRNN (Convolutional Recurrent Neural Network) recognition. The system extracts text from natural scene images in five stages: a VGG16-BN backbone extracts features; a U-Net style decoder with skip connections generates character-level region and affinity maps; detected characters are grouped into words by spatial proximity, and each word box is expanded by a fixed padding; box coordinates are scaled and sorted into reading order; each word is cropped individually and passed to the recognizer.
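The padding, scaling, and cropping stages described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's actual code: the function names `pad_and_scale_boxes` and `crop_words`, the `[x_min, y_min, x_max, y_max]` box layout, and the order of operations (pad first, then scale, then clip to the image) are all assumptions. Only the 3px/4px padding values come from the feature list.

```python
import numpy as np

def pad_and_scale_boxes(boxes, scale_x, scale_y, image_w, image_h,
                        pad_x=3, pad_y=4):
    """Expand word boxes by a fixed padding, then map them to image pixels.

    boxes: (N, 4) array-like of [x_min, y_min, x_max, y_max] in
    detection-map coordinates (assumed layout).
    scale_x, scale_y: factors from detection-map to image coordinates.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    # Fixed padding: grow each box outward on all four sides.
    padded = boxes + np.array([-pad_x, -pad_y, pad_x, pad_y])
    # Coordinate scaling: x values use scale_x, y values use scale_y.
    scaled = padded * np.array([scale_x, scale_y, scale_x, scale_y])
    # Keep boxes inside the image so later crops are valid.
    scaled[:, [0, 2]] = np.clip(scaled[:, [0, 2]], 0, image_w)
    scaled[:, [1, 3]] = np.clip(scaled[:, [1, 3]], 0, image_h)
    return np.rint(scaled).astype(int)

def crop_words(image, boxes):
    """Return one image crop per box (image is an H x W x C array)."""
    return [image[y0:y1, x0:x1] for x0, y0, x1, y1 in boxes]

if __name__ == "__main__":
    image = np.zeros((100, 200, 3), dtype=np.uint8)
    boxes = pad_and_scale_boxes([[10, 20, 50, 40]], 2.0, 2.0, 200, 100)
    print(boxes.tolist())                # [[14, 32, 106, 88]]
    print(crop_words(image, boxes)[0].shape)  # (56, 92, 3)
```

Padding before scaling means the padding grows with the scale factor. If the padding is meant to be 3px and 4px in the final image, the two steps would be swapped.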

Key Features

✓ VGG16-BN backbone with U-Net decoder architecture featuring skip connections at multiple scales
✓ Dual output heads: region score map for character detection and affinity score map for character grouping
✓ Configurable fixed-padding bounding box expansion (3px horizontal, 4px vertical) for robust word capture
✓ Multi-scale coordinate mapping with precise scaling factors to preserve spatial accuracy
✓ Reading-order sorting: vertical clustering (20px threshold) followed by horizontal sorting within lines
✓ CRNN recognition with 7-layer CNN feature extractor + bidirectional LSTM sequence modeling
✓ CTC (Connectionist Temporal Classification) decoding with greedy algorithm and duplicate removal
✓ Intelligent confidence thresholding (0.9) with EasyOCR fallback for low-confidence predictions
✓ Automated result generation: annotated images, coordinate files, binary masks, and HTML visualization tables
✓ Pretrained weight loading with Synth90k dataset initialization and custom alphabet encoding
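Three of the features above (reading-order sorting, greedy CTC decoding, and the confidence-gated fallback) are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the project's implementation. The 20px line threshold and 0.9 confidence threshold come from the list. The rest is assumed: the function names, comparing each box's vertical center to the first box of the current line, using class 0 as the CTC blank, and defining confidence as the mean top probability of the emitted characters. `fallback` is a hypothetical callable standing in for the EasyOCR call.

```python
import numpy as np

def sort_reading_order(boxes, line_threshold=20):
    """Sort (x_min, y_min, x_max, y_max) boxes top-to-bottom, left-to-right."""
    if not boxes:
        return []
    center_y = lambda b: (b[1] + b[3]) / 2
    by_y = sorted(boxes, key=center_y)
    lines = [[by_y[0]]]
    for box in by_y[1:]:
        # Vertical clustering: same line if close to the line's first box.
        if abs(center_y(box) - center_y(lines[-1][0])) <= line_threshold:
            lines[-1].append(box)
        else:
            lines.append([box])
    # Horizontal sorting within each line.
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]

def ctc_greedy_decode(logits, alphabet, blank=0):
    """Greedy CTC decode of a (T, C) score matrix.

    Class `blank` is the CTC blank; class i maps to alphabet[i - 1]
    (assumed encoding). Returns (text, confidence).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    best_p = probs.max(axis=1)
    chars, confs, prev = [], [], blank
    for idx, p in zip(best, best_p):
        # Duplicate removal: skip repeats of the previous step, then drop blanks.
        if idx != blank and idx != prev:
            chars.append(alphabet[idx - 1])
            confs.append(p)
        prev = idx
    confidence = float(np.mean(confs)) if confs else 0.0
    return "".join(chars), confidence

def recognize_with_fallback(logits, alphabet, fallback, threshold=0.9):
    """Use the CRNN result unless its confidence falls below the threshold."""
    text, conf = ctc_greedy_decode(logits, alphabet)
    if conf < threshold:
        return fallback(), conf  # hypothetical second recognizer
    return text, conf

if __name__ == "__main__":
    steps = [1, 1, 0, 1, 2, 2, 0, 3]      # per-timestep class indices
    logits = np.eye(4)[steps] * 10.0      # confident one-hot style scores
    print(ctc_greedy_decode(logits, "abc")[0])  # aabc
```

The blank between the first two `a` steps is what allows a genuine double letter to survive duplicate removal.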

Technology Stack

PyTorch, CRAFT, CRNN, OpenCV, VGG16-BN, Bi-LSTM, CTC Loss, EasyOCR, Python, NumPy

Project Links

View Code

Project Gallery
