Multi-stage deep learning pipeline that parses scanned handwritten calendars into structured data — segmenting regions, classifying day/month cells, and recognizing handwritten annotations.
Updated May 7, 2026 - Python
OCR/HTR web service for digitizing Moscow church records (18th-19th century). Extracts structured data (names, dates, addresses) from historical documents with NER and WER quality metrics. Hackathon project for the Moscow Main Archive.
AI-powered document scanner that automatically detects documents in photos, corrects their perspective, and enhances the resulting scans using OpenCV.
Interactive OCR system with real-time correction using Tesseract and confidence-based filtering.