Stars
Web Crawling
6 repositories
Python package and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-Python mode.
A Python 3 compatible version of goose: http://goose3.readthedocs.io/en/latest/index.html
A standalone version of the readability library
Python version of the Playwright testing and automation library.