Add utility to index the content of PDF documents

The content of PDF documents is currently not indexed for full-text search, even though in some ZIMs it is the "core" of the ZIM.

Extracting PDF content would be beneficial to many scrapers and should thus ideally be exposed in scraperlib.
 
See e.g. https://github.com/openzim/warc2zim/issues/289
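
As a rough illustration of what such a utility might do, here is a minimal sketch. It assumes `pypdf` as the extraction library (not something decided in this issue), and the function names `extract_pdf_text` and `collapse_whitespace` are hypothetical, not existing scraperlib API.

```python
import io
import re


def collapse_whitespace(text: str) -> str:
    """Collapse runs of whitespace so the indexed text stays compact."""
    return re.sub(r"\s+", " ", text).strip()


def extract_pdf_text(pdf_bytes: bytes) -> str:
    """Return the text of all pages of a PDF as one string (hypothetical helper)."""
    # Imported lazily: pypdf is an assumed, optional dependency here.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    # extract_text() can yield an empty result for image-only pages.
    pages = [page.extract_text() or "" for page in reader.pages]
    return collapse_whitespace(" ".join(pages))
```

The returned string could then serve as the indexable content for the PDF entry. How this plugs into scraperlib's indexing is left open here, and scanned PDFs without a text layer would yield little or nothing without OCR.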