@parvit parvit commented Jul 25, 2022

This change responds to issue #243.

Requires openzim/python-scraperlib#88 to work correctly.

Changes the default behavior so that compression and full-text indexing are no longer enabled by default, since both can be an issue with big sites.

Introduces two new command-line flags to re-enable these features:

  • --make_fulltext_index : boolean, default false; activates Xapian full-text indexing
  • --compression : string naming the requested ZIM compression algorithm (e.g. lzma)
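
For illustration only, the two flags could be declared roughly as follows with Python's standard argparse module. This is a minimal sketch, not the code from this PR: the program name `scraper-sketch` and the helpers `build_parser` and `parse_options` are hypothetical, and only the flag names and the boolean default come from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; "scraper-sketch" is a placeholder program name.
    parser = argparse.ArgumentParser(prog="scraper-sketch")
    # Off unless the flag is passed, matching "boolean default false".
    parser.add_argument(
        "--make_fulltext_index",
        action="store_true",
        default=False,
        help="build the Xapian full-text index",
    )
    # None stands for "no compression requested" (an assumption of this sketch).
    parser.add_argument(
        "--compression",
        type=str,
        default=None,
        help="ZIM compression algorithm to use (e.g. lzma)",
    )
    return parser


def parse_options(argv):
    # Return the two options as a plain dict for easy inspection.
    args = build_parser().parse_args(argv)
    return {
        "make_fulltext_index": args.make_fulltext_index,
        "compression": args.compression,
    }
```

With no arguments both features stay off; passing `--make_fulltext_index --compression lzma` turns both back on.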

@kelson42 kelson42 requested a review from rgaudin July 25, 2022 16:22
@kelson42
Contributor

@parvit Text compression and full-text indexing should be activated by default. This is what users expect and is a "standard" in all our scrapers. I'm not aware of any problems caused by doing so.

@rgaudin
Member

rgaudin commented Jul 25, 2022

Let's first verify (#243 (comment)) that disabling them would significantly improve the memory situation. In that case, we may introduce the opposite option (disable) as a temporary measure until we fix the root cause, as both are definitely wanted features.

@parvit
Author

parvit commented Jul 25, 2022

@kelson42 Sure, if you want you can check the data I've provided in issue #243, which indicates that those two features can cause memory usage problems on big sites. At least allowing them to be disabled should be considered.

@parvit
Author

parvit commented Jul 26, 2022

Since the other PR was closed, this one has no reason to stay open either.

@parvit parvit closed this Jul 26, 2022

3 participants