A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
Updated Jul 11, 2025 - Python
A parser turns its input (often text in the form of a file) into a more useful representation (usually a particular data structure in memory) for performing a specific task.
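To make that definition concrete, here is a minimal recursive-descent parser. It is a sketch built on an assumed toy grammar (integers, `+ - * /`, and parentheses) and is not taken from any project listed on this page. The names `tokenize`, `Parser`, and `evaluate` are hypothetical. The parser turns a string into a nested-tuple tree, and `evaluate` then walks that tree to perform the "specific task".

```python
import re
from typing import Optional, Union

# A parse-tree node is either an int leaf or a tuple (operator, left, right).
Node = Union[int, tuple]


def tokenize(text: str) -> list[str]:
    """Split text into integer, operator, and parenthesis tokens.

    Any other non-space character becomes its own token, so the parser
    reports it as unexpected instead of silently skipping it.
    """
    return re.findall(r"\d+|[-+*/()]|\S", text)


class Parser:
    """Recursive-descent parser for the toy grammar:

    expr   := term (("+" | "-") term)*
    term   := factor (("*" | "/") factor)*
    factor := INTEGER | "(" expr ")"
    """

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        # Current token, or None once the input is exhausted.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self) -> Node:
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            # Folding left to right makes the operators left-associative.
            node = (op, node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self) -> Node:
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdecimal():
            return int(tok)
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(node: Node) -> float:
    """Walk the tree to compute its value (the task this parse enables)."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b


tree = Parser("1 + 2 * (3 - 4)").parse()
print(tree)            # ('+', 1, ('*', 2, ('-', 3, 4)))
print(evaluate(tree))  # -1
```

Each grammar rule maps to one method. Operator precedence comes from how the calls nest: `term` groups `*` and `/` before `expr` combines the results with `+` and `-`.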
Common examples include:
File parser optimised for lossless LLM ingestion 🧠. Parses PDFs, DOCX, and PPTX files into a format that is ideal for LLMs.
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
oletools - Python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
A Python module to repair invalid JSON from LLMs
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
Type-safe YAML parser and validator.
Portable KMS (knowledge management system) designed to integrate seamlessly with any Retrieval-Augmented Generation (RAG) system
Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database about movies, people, characters and companies (the project is not affiliated with IMDb in any way).
Wiktionary dump file parser and multilingual data extractor
A simple resume parser used for extracting information from resumes
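The JSON-repair entry in the list above addresses a common problem: LLM replies that are almost, but not quite, valid JSON. The following naive, hypothetical sketch uses only the standard library to perform two typical repairs: it strips a Markdown code fence wrapped around the reply and removes trailing commas. It is not how the listed module works and covers far fewer cases. `strip_code_fence`, `strip_trailing_commas`, and `naive_repair_json` are illustrative names.

```python
import json

# Three backticks, built here so the sketch itself contains no literal fence.
FENCE = "`" * 3


def strip_code_fence(text: str) -> str:
    """Remove a Markdown code fence wrapped around the whole reply, if any."""
    lines = text.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        # Drop the opening line (which may carry a tag such as 'json') and the closing fence.
        return "\n".join(lines[1:-1])
    return text


def strip_trailing_commas(text: str) -> str:
    """Delete commas that directly precede '}' or ']', ignoring string contents."""
    out = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False      # this character was escaped; keep scanning the string
            elif ch == "\\":
                escaped = True       # the next character is escaped
            elif ch == '"':
                in_string = False    # unescaped quote closes the string
            continue
        if ch == '"':
            in_string = True
        elif ch == "," and text[i + 1:].lstrip()[:1] in ("}", "]"):
            continue                 # trailing comma: skip it
        out.append(ch)
    return "".join(out)


def naive_repair_json(text: str):
    """Apply both repairs, then let json.loads validate the result.

    Anything these two fixes do not cover still raises json.JSONDecodeError.
    """
    return json.loads(strip_trailing_commas(strip_code_fence(text)))


reply = FENCE + "json\n" + '{"name": "Ada", "tags": ["x,]", "c",],}' + "\n" + FENCE
print(naive_repair_json(reply))  # {'name': 'Ada', 'tags': ['x,]', 'c']}
```

Tracking whether the scanner is inside a string is what keeps the `,]` inside `"x,]"` intact. A plain regex substitution for trailing commas would corrupt that value.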