Module for automatic summarization of text documents and HTML pages.
Updated Aug 11, 2025 - Python
Xtract-html is a tool for extracting the HTML markup of a website, which you can then reuse on your own site.
Xtract-htmlV2, the successor to Xtract-html, retrieves the HTML code of any website you choose.
Article extraction benchmark: dataset and evaluation scripts
Extract embedded metadata from HTML markup
Fast Python port of arc90's readability tool, updated to match the latest readability.js.
Heuristic-based boilerplate removal tool
Extract price amount and currency symbol from a raw text string
Parse numbers written in natural language