🚜 Parse text and tables from PDF files.
The "Missing GitHub Status Page" -- a Flat Data attempt at historically documenting GitHub statuses
TelegramBackup is a sophisticated tool designed for extracting, organizing, and archiving messages from your Telegram chats, channels, and groups.
ExifTool is a powerful command-line tool that can be used to extract and edit metadata in a wide range of media files, including images, audio, and video. Metadata is information that is stored within a file that describes the file’s content or other attributes.
Extract HTML elements from the command line using CSS selectors or XPath. Pipe-friendly Python CLI.
Browser productivity operating system with 150+ bookmarklets, AI-assisted discovery, collections, workflow modes, visual maps, and team sharing.
Automates VTU internship diary workflow by extracting entries, normalizing data, and generating structured PDF reports directly in the browser.
Metadatabase of 13,145 allergen records with a tabular view of the data. A web interface eases the use, analysis, and extraction of the data and adds several further functionalities. A tutorial section teaches users the interface design, its features, and the database.
Zer0Snatch – A lightweight, zero-dependency tool to securely extract and archive target data from digital sources. Designed for OSINT, automation, and ethical security research.
In this project, we built a database that tracks how America's fastest-growing private companies change over time. The database is built by ingesting, combining, and restructuring data from three main data sources into a single conformed PostgreSQL database, which is then deployed in a Flask app.
A lightweight data extraction tool that collects, processes, and structures information from web sources for analysis and automation.
An offline document-to-agent-context compiler. Transform unstructured files (PDFs, CSVs, Markdown) into token-efficient, semantic context packs for LLM agents.
Collection of purpose-built MCP servers for AI agent workflows.
An intelligent resume parsing engine built with Python and NLP, aimed at automating the tedious task of sifting through resumes. It accurately extracts vital candidate information such as contact details, employment history, educational qualifications, and technical skills, making it an invaluable asset for recruitment and HR professionals.
Statistics scrapers for MMA.
🌐 Discover web scraping techniques with Python in the 職能發展學院 2025 course for efficient data gathering and analysis.
A Selenium WebDriver project that reads all article and post analytics and stores them in an MS Excel file.
Selenium toolkit (TurnpointPurger) that extracts client and worker data from TurnPoint, archives it under a sequential NexisID scheme, and converts exports into Nexis-ready payloads via a Tkinter GUI or CLI.
Build web scraping, mirroring, and data extraction tools for full-stack web intelligence and monitoring with 110 modules