trafilatura-1.11.0
Breaking change:
- metadata now skipped by default (#613); to include it in all output formats, use with_metadata=True (Python) or --with-metadata (CLI)
Extraction:
- add HTML as output format (#614)
- better and faster baseline extraction (#619)
- better handling of HTML/XML elements (#628)
- XPath rules added with @felipehertzer (#540)
- fix: avoid faulty readability_lxml content (#635)
Evaluation:
- new scripts and data with @LydiaKoerber (#606, #615)
- additional data with @swetepete (#197)
Maintenance: