Get your documents ready for gen AI
Updated May 2, 2025 - Python
File parser optimised for lossless LLM ingestion 🧠 Parse PDFs, DOCX, and PPTX files into a format that is ideal for LLMs.
Open source Python library for converting PDF to DOCX.
A text extraction library supporting PDFs, images, office documents and more
Enjoy reading with your favorite style.
ChatWeb can crawl web pages and read PDF, DOCX, and TXT files, extract the main content, and then answer your questions about it or summarize the key points.
Python tool and library for decrypting and encrypting MS Office files using passwords or other keys
📚 Process PDFs, Word documents and more with spaCy
Dedoc is a library (and service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and metadata from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser …)
Telegram bot that converts images to PDF, PDF to images, and 45+ file formats to PDF; more features soon.
Best PDF Converter! PDF to any format, pdf2word/excel/xml/html/txt...
ContextGem: Effortless LLM extraction from documents
python: selenium + sqlite3 crawler that scrapes data from the Taobao and 1688 websites (Taobao crawler / 1688 crawler) and saves it to a database
Extracts tables from .docx files and saves them as .csv or .xls files
Docx tracked change redlines for the Python ecosystem.
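The table-extraction tool listed above (.docx tables saved as .csv) relies on the fact that a .docx file is a ZIP archive whose main body usually lives in `word/document.xml`, with each table stored as `w:tbl` > `w:tr` > `w:tc` elements. Below is a minimal sketch of that idea using only the Python standard library. It assumes simple tables: merged cells, nested tables, tabs and line breaks inside cells, and .xls output are not handled. `extract_tables` and `table_to_csv` are hypothetical helper names, not the API of any project listed here.

```python
import csv
import io
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for elements inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def cell_text(cell: ET.Element) -> str:
    """Join the text runs of each paragraph in a cell; one line per paragraph."""
    lines = []
    for p in cell.findall("w:p", NS):
        lines.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return "\n".join(lines)


def extract_tables(source) -> list[list[list[str]]]:
    """Return top-level tables as lists of rows of cell strings.

    `source` is a path or a binary file-like object (anything zipfile accepts).
    Assumption: the main part is at word/document.xml, which is the usual layout.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find("w:body", NS)
    if body is None:
        return []
    tables = []
    for tbl in body.findall("w:tbl", NS):  # direct children only: skips nested tables
        rows = [[cell_text(tc) for tc in tr.findall("w:tc", NS)]
                for tr in tbl.findall("w:tr", NS)]
        tables.append(rows)
    return tables


def table_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text (csv module quotes cells containing commas)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python docx_tables.py input.docx  -> table_1.csv, table_2.csv, ...
    for i, rows in enumerate(extract_tables(sys.argv[1]), start=1):
        with open(f"table_{i}.csv", "w", newline="", encoding="utf-8") as f:
            f.write(table_to_csv(rows))
```

Reading the XML directly keeps the sketch dependency-free. A real tool would more likely use a dedicated .docx library that also resolves merged cells and the document's relationship parts.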