adrienjoly / npm-pdfreader

🚜 Parse text and tables from PDF files.

javascript parsing tabular-data pdf-converter data-extraction pdf-reader parse-tables rule-based-parsing

Updated Jan 21, 2026
HTML

mrshu / github-statuses

The "Missing GitHub Status Page" -- a Flat Data attempt at historically documenting GitHub statuses

github status open-data uptime data-extraction ner flat-data status-page

Updated Jun 14, 2026
HTML

N4rr34n6 / TelegramBackup

TelegramBackup is a sophisticated tool designed for extracting, organizing, and archiving messages from your Telegram chats, channels, and groups.

python open-source telegram data-extraction media-download automation-script telegram-backup channel-backup message-archiving supergroup-backup entity-processing chat-history-export telegram-data-analysis html-report-generation group-chat-backup media-files-download

Updated May 9, 2026
HTML

AryanVBW / Exif

ExifTool is a powerful command-line tool for extracting and editing metadata in a wide range of media files, including images, audio, and video. Metadata is information stored within a file that describes the file's content or other attributes.

image-processing data-extraction image-metadata hacktoberfest information-gathering vivek powered-by-aryan-technologies aryan-technologies images-hacking aryanshop aryanvbw vivek-w vivek-wagdare

Updated Oct 12, 2025
HTML

aborruso / scrape-cli

Extract HTML elements from the command line using CSS selectors or XPath. Pipe-friendly Python CLI.

python html cli scraping web-scraping xpath data-extraction command-line-tool lxml css-selectors

Updated Apr 30, 2026
HTML

ohsusannamarie / bookmarklet-os

Browser productivity operating system with 150+ bookmarklets, AI-assisted discovery, collections, workflow modes, visual maps, and team sharing.

Updated Jun 14, 2026
HTML

R-Pradhyumna / VTU-Diary-Generator

Automates VTU internship diary workflow by extracting entries, normalizing data, and generating structured PDF reports directly in the browser.

javascript productivity web-scraping data-extraction html-css vtu pdf-generator browser-automation json-processing automation-tools no-backend student-tools internship-diary

Updated Apr 20, 2026
HTML

maitreyeepaliwal / Alleropedia-Database-for-Allergens

Metadatabase of 13,145 allergen records with a tabular view of the data. A web interface eases use, analysis, and extraction of the data and adds several further functions. A tutorial section explains the interface design, its features, and the database.

bioinformatics biology data-extraction bioinformatics-data allergies bioinformatics-databases allergy database-generator bioinfo metadatabase allergic-diseases biological-database biological-databases database-generation-for-allergens allergen-database secondary-database biology-project alleropedia

Updated Jun 5, 2021
HTML

AyushParkara / Zer0Snatch

Zer0Snatch – A lightweight, zero-dependency tool to securely extract and archive target data from digital sources. Designed for OSINT, automation, and ethical security research.

python security automation osint cybersecurity data-extraction ethical-hacking information-gathering password-sniffer security-tools cli-tool ethical-hacking-tools zer0snatch

Updated Jun 3, 2025
HTML

ermiasgelaye / ETL-Project

In this project, we built a database that shows how America's fastest-growing private companies have changed over time. The database is built by ingesting, combining, and restructuring data from three main sources into a single conformed PostgreSQL database, which is then deployed in a Flask app.

python api postgres data-science etl pandas-dataframe extract scraping postgresql pandas flask-application data-extraction load transformation scraping-websites flask-sqlalchemy production-database

Updated Aug 17, 2020
HTML

Shreesh8 / Data-Extractor

A lightweight data extraction tool that collects, processes, and structures information from web sources for analysis and automation.

javascript python html automation web-scraping data-extraction

Updated Aug 14, 2025
HTML

Vedant1202 / agentpack

An offline document-to-agent-context compiler. Transform unstructured files (PDFs, CSVs, Markdown) into token-efficient, semantic context packs for LLM agents.

python developer-tools data-extraction semantic-search ai-agents pdf-parser cli-tool pymupdf rag vector-search ai-engineering llm document-parsing context-window fastembed context-optimization token-optimization

Updated Jun 7, 2026
HTML

adi2355 / MCP-Server-Collection

Collection of purpose-built MCP servers for AI agent workflows.

python typescript mcp web-scraping data-extraction jsonpath ai-agents structured-extraction llm deepseek firecrawl model-context-protocol mcp-server codebase-analysis agent-workflows

Updated Apr 7, 2026
HTML

Anwarsha7 / resumeparser

An intelligent resume parsing engine built with Python and NLP, aimed at automating the tedious task of sifting through resumes. It accurately extracts vital candidate information such as contact details, employment history, educational qualifications, and technical skills, making it an invaluable asset for recruitment and HR professionals.

python natural-language-processing text-mining information-extraction data-extraction recruitment resume-parser npl resume-analysis hr-management hr-tech parsing-data document-parsing candidate-screening

Updated Jun 2, 2025
HTML

cable8mm / mma-scrapers

Statistics scrapers for MMA.

php crawler scraper web-scraping data-extraction dom-crawler ufc mma sports-data sherdog black-combat

Updated Jun 11, 2026
HTML

RudraTyagi1135 / dynamic-web-scraper

infinite-scroll data-extraction selenium-webdriver browser-automation dynamic-web-scraping

Updated May 24, 2026
HTML

hubby32 / __2025_11_01_tvdi_python_crawel__

🌐 Discover web scraping techniques with Python in the 職能發展學院 2025 course for efficient data gathering and analysis.

python api crawler data-mining automation programming spider backend project requests web-scraping data-extraction software-development html-parsing beautifulsoup

Updated Jun 14, 2026
HTML

SamadhanSonwane / LinkedIn-Activity-Stats

A Selenium WebDriver project that reads all article and post analytics and stores them in an MS Excel file.

java automation selenium selenium-java data-extraction selenium-webdriver testng data-extractor automated-testing linkedin-signin apache-poi

Updated Mar 25, 2018
HTML

gUBII / importcsv

Selenium toolkit (TurnpointPurger) that extracts client and worker data from TurnPoint, archives it under a sequential NexisID scheme, and converts exports into Nexis-ready payloads via a Tkinter GUI or CLI.

python automation selenium data-extraction ndis

Updated Mar 25, 2026
HTML

fadh24434 / webarsenal

Build web scraping, mirroring, and data extraction tools for full-stack web intelligence and monitoring with 110 modules

nodejs automation cheerio proxy web-crawler web-scraping developer-tools data-extraction puppeteer playwright

Updated Jun 14, 2026
HTML

data-extraction

Here are 71 public repositories matching this topic...
