New scrapers and readability fallback extractor

pmyteh released this 08 Aug 16:45

v1.1.0

2b7b1aa

The highlights of this release are several new news source scrapers and, in particular, a fallback headline and text extractor based on the readability library. Even if the site being scraped serves up an unknown page format, we can now attempt to extract the body text and headline using heuristics.
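As a rough illustration of the fallback idea only, the sketch below tries a site-specific extractor first and drops back to a generic heuristic when that fails. It is not this project's code: the function names, the `headline` and `bodytext` keys, and the toy heuristic (first `<h1>` or `<title>`, plus all `<p>` text) are assumptions standing in for the readability-based extractor, which uses considerably more sophisticated scoring.

```python
from html.parser import HTMLParser


class _HeuristicParser(HTMLParser):
    """Toy heuristic: collect the first <h1>, the <title>, and all <p> text."""

    def __init__(self):
        super().__init__()
        self._current = None      # tag of interest we are currently inside
        self._h1_done = False     # only the first <h1> counts as the headline
        self.h1 = ""
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        # Inline tags such as <a> or <em> do not close the enclosing element.
        if tag == self._current:
            if tag == "h1":
                self._h1_done = True
            self._current = None

    def handle_data(self, data):
        if self._current == "h1" and not self._h1_done:
            self.h1 += data
        elif self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def fallback_extract(html):
    """Hypothetical generic extractor used when the page format is unknown."""
    parser = _HeuristicParser()
    parser.feed(html)
    parser.close()
    headline = (parser.h1 or parser.title).strip()
    body = "\n\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"headline": headline, "bodytext": body}


def extract(html, site_extractor=None):
    """Try the site-specific extractor; fall back to heuristics if it fails."""
    if site_extractor is not None:
        try:
            item = site_extractor(html)
            if item.get("headline") and item.get("bodytext"):
                return item
        except Exception:
            # Broad catch is deliberate in this sketch: any failure in the
            # site-specific path should route to the fallback.
            pass
    return fallback_extract(html)
```

A site-specific extractor that raises on an unfamiliar layout, or returns an item with an empty headline or body, is simply bypassed, so the caller still gets a best-effort result.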

A number of bug fixes are also included.
