ocr

Here are 3,032 public repositories matching this topic...

PaddlePaddle / PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

ocr pdf-parser kie document-translation rag chineseocr ai4science pp-ocr document-parsing pp-structure pdf-extractor-rag pdf2markdown paddleocr-vl

Updated Dec 16, 2025
Python

opendatalab / MinerU

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

python pdf parser ocr pdf-converter extract-data document-analysis pdf-parser layout-analysis ai4science pdf-extractor-rag pdf-extractor-llm pdf-extractor-pretrain

Updated Dec 16, 2025
Python

hiroi-sora / Umi-OCR

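Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multilingual language libraries.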

screenshot qt ocr qml ocr-python paddleocr umi-ocr

Updated Nov 20, 2025
Python

paperless-ngx / paperless-ngx

A community-supported supercharged document management system: scan, index and archive all your documents

pdf machine-learning django angular ocr archiving dms document-management optical-character-recognition hacktoberfest document-management-system

Updated Dec 16, 2025
Python

ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

python pdf ocr image-processing tesseract

Updated Dec 15, 2025
Python

JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

python machine-learning information-retrieval data-mining ocr deep-learning image-processing cnn pytorch lstm optical-character-recognition crnn scene-text scene-text-recognition easyocr

Updated Dec 5, 2025
Python

lukas-blecher / LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

python machine-learning ocr latex deep-learning image-processing pytorch dataset transformer vit image2text im2text im2latex im2markup math-ocr vision-transformer latex-ocr

Updated Jan 18, 2025
Python

sml2h3 / ddddocr

带带弟弟 (ddddocr): general-purpose CAPTCHA recognition OCR, PyPI version

ocr captcha ddddocr

Updated Jun 9, 2025
Python

zyddnys / manga-image-translator

Translate manga/images: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)

ocr deep-learning neural-network anime machine-translation manga image-processing transformer chinese-translation text-detection auto-translation inpainting text-detection-recognition pytorch-implementation japanese-translations

Updated Dec 1, 2025
Python

pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

python pdf font data-science ocr tesseract epub mupdf text-processing pdf-documents extract-data table-extraction text-shaping xps pymupdf

Updated Dec 14, 2025
Python

YaoFANGUK / video-subtitle-extractor

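A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API registration is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.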

ocr deep-learning extract ripper subtitles srt subrip hardsub