🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Updated Oct 23, 2025 - TypeScript
Python scraper based on AI
⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
Extract keywords from sentences or replace keywords in sentences.
ContextGem: Effortless LLM extraction from documents
Converts a PDF file into a text file while preserving the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. Implemented as a subclass of the PDFTextStripper class from the Apache PDFBox library.
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.
Lightweight library for scraping websites with LLMs
A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone
A lightweight, self-hosted headless browser automation platform. Designed as an alternative to Browserless, built for speed, privacy, and scalability.
📰 Let ChatGPT Summarize Hacker News for You
Built for enterprise-scale agentic AI — with open deployment, zero lock-in, and full explainability. Run it anywhere: local, cloud, or bare metal. Own your data. Trust your insights.
🚜 Parse text and tables from PDF files.
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
🤖 AI-powered web scraping editor with visual workflow builder. Build, test & deploy web scrapers using natural language. Powered by ScrapeGraphAI & LangGraph.
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
Benchmarking PDF libraries
Undetected web-scraping & seamless HTML parsing in Python!