Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 80+ languages.
Updated Sep 26, 2025 - Python
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agent workflows.
MinerU no-install, one-click-launch bundle.
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs into Markdown and JSON.
An interactive command-line tool designed to quickly navigate directories and perform various file operations efficiently. Its simple syntax and intuitive commands make it a favorite among developers for streamlining workflow tasks.