Add utility to index PDF documents content #167

@benoit74

The content of PDF documents is not indexed for full-text search, even though in some ZIMs the PDFs are the "core" of the content.

Extracting PDF info would benefit many scrapers, so it should ideally be exposed in scraperlib.

See e.g. openzim/warc2zim#289
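
To make the request more concrete, here is a rough sketch of the shape such a utility could take. Nothing here is existing scraperlib API: `PdfIndexData`, `build_pdf_index_data` and the `extract_pages` parameter are hypothetical names. The actual PDF parsing is delegated to a caller-supplied callable, since the choice of PDF library is still open.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PdfIndexData:
    """Hypothetical container for what a full-text indexer would need."""

    title: str
    content: str
    word_count: int


def build_pdf_index_data(
    pdf_bytes: bytes,
    extract_pages: Callable[[bytes], Iterable[str]],
    fallback_title: str = "",
) -> PdfIndexData:
    """Turn a PDF into plain text suitable for full-text indexing.

    `extract_pages` is an assumption: any callable that returns the text of
    each page, whichever PDF library ends up being used behind it.
    """
    # Collapse runs of whitespace so layout line breaks do not pollute the index.
    pages = [re.sub(r"\s+", " ", page).strip() for page in extract_pages(pdf_bytes)]
    # Drop pages without extractable text (e.g. scanned images).
    content = " ".join(page for page in pages if page)
    # Use the provided title, else fall back to the start of the text.
    title = fallback_title or content[:80]
    return PdfIndexData(title=title, content=content, word_count=len(content.split()))
```

A scraper could then pass the resulting title and content to the indexer alongside the PDF entry, instead of leaving the entry unindexed.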

Labels: enhancement