Heuristic-based boilerplate removal tool
Updated Feb 25, 2025 - Python
Saves webpages locally to your hard disk with images, CSS, JS, and links as is.
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser
Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual (semantic) structure of the document.
An easy way to parse HTML and build XPath expressions.
Fast, indexed Python HTML parser that builds a DOM node tree and provides the common getElementsBy* functions for scraping, testing, modification, and formatting. Also supports XPath.
This project converts your YouTube watch history HTML file from Google Takeout into a CSV file that can be used with universalscrobbler.com to scrobble manually in bulk.
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.
✅ Parse your browser's exported HTML bookmark file to Markdown.
A Python library for loading data from various formats into PostgreSQL databases.
A work-in-progress Discord bot based on the popular Touhou series by ZUN.
A script to parse the saved Humble Bundle library HTML
This script analyzes the number of Telegram messages over time.
Lightweight HTML/XML parser for quick and dirty web scraping.
Python web scraping module for NCAA Basketball Stats
Collects data from the registry of Russian software on the site https://reestr.minsvyaz.ru
Multipage Streamlit app that brings together several HTML data extraction tools.
Python script to extract college names from the UGC (India) website.
Python scraper for TotalWine.com data 🍷