Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Updated Oct 10, 2024 (Python)
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
✅ Parse your browser's exported HTML bookmark file to Markdown.
Python, JavaScript, and Rust libraries for the Spider Cloud API.
A CLI tool to fetch a webpage's main content and print it as Markdown.
A simplified online encyclopedia with Markdown-formatted entries. Powered by Django.
Website text scraper with conversion to Markdown (.md) files and directory structuring.
Tooling for extracting content from the former Geotribu website (web scraping, conversion to Markdown...)
Web scraping from Codewars that fetches all solution code along with the corresponding READMEs in one go.
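As a rough illustration of the kind of conversion the bookmark-file entry above performs, here is a minimal sketch that uses only Python's standard-library `html.parser`. It assumes the common Netscape bookmark export format (`<DT><A HREF="...">Title</A>`). It is not taken from any of the listed projects. The names `BookmarkParser` and `bookmarks_to_markdown` are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, url) pairs from <a href="..."> tags in a bookmark export."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> currently open, if any
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so HREF arrives as href.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Fall back to the URL itself when the bookmark has no title.
            title = "".join(self._text).strip() or self._href
            self.links.append((title, self._href))
            self._href = None


def bookmarks_to_markdown(html: str) -> str:
    """Render each bookmark as a Markdown list item: - [title](url)."""
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return "\n".join(f"- [{title}]({url})" for title, url in parser.links)


if __name__ == "__main__":
    sample = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><A HREF="https://www.python.org/" ADD_DATE="0">Python</A>
    <DT><A HREF="https://commonmark.org/">CommonMark</A>
</DL><p>"""
    print(bookmarks_to_markdown(sample))
```

Folder structure (`<H3>` headings and nested `<DL>` lists) and escaping of brackets in titles are deliberately left out to keep the sketch short. A real converter would need to handle both.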