Use pylibzim to create ZIM #70
Did some simplifications so it's easier to read.
Using `path_fixed` and `fixed_path` for different things in the same method is confusing, so I've renamed those.
Also moved `relative_dots` and `update_root_relative_path` into `rewrite_internal_links`.
Codefactor may complain about complexity (my editor didn't, but its config is different).
In that case we'd move those methods back.
Codefactor doesn't complain, so I kept it there. However, the change broke things: links starting with `path_prefix` were never fixed, because the absolute links had already been fixed before them. That is fixed now. Also made another module
OK
This fixes #53 and uses pylibzim to create ZIMs. It currently relies on openzim/python-scraperlib#34 and thus has a requirement from that branch itself. Also fixes #24 which was necessary to make pylibzim work.
Openedx instances have many root-relative links. We rewrite each of them as a plain relative link if the page it points to is present in the ZIM; otherwise we point it to an external URL by adding the instance netloc.
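A minimal sketch of that decision, for illustration only and not the code in this PR. The helper name `rewrite_root_relative`, its parameters, and the idea of a `zim_paths` set holding the paths already added to the ZIM are all assumptions; query strings and fragments are ignored here for brevity.

```python
import posixpath
from urllib.parse import urljoin


def rewrite_root_relative(href, page_path, zim_paths, instance_url):
    """Hypothetical helper: rewrite one root-relative href found on page_path.

    zim_paths is an assumed set of ZIM-relative paths that were offlined.
    """
    # Only handle root-relative links; "//host/..." is protocol-relative.
    if not href.startswith("/") or href.startswith("//"):
        return href

    target = href.lstrip("/")
    if target in zim_paths:
        # Target is in the ZIM: make the link relative to the current page's folder.
        start = posixpath.dirname(page_path) or "."
        return posixpath.relpath(target, start=start)

    # Target was not offlined: point to the online instance instead.
    return urljoin(instance_url, href)


zim_paths = {"course/about.html"}
print(rewrite_root_relative("/course/about.html", "course/tabs/index.html",
                            zim_paths, "https://example.org"))
print(rewrite_root_relative("/wiki/page", "course/tabs/index.html",
                            zim_paths, "https://example.org"))
```

The first call yields a relative link because the target is in `zim_paths`; the second falls back to an absolute URL on the (placeholder) instance host.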
The following changes are made in scraper.py related to link rewriting:

- `get_course_tabs()` only gets the course tabs, and the new `annex()` method actually downloads the content. `get_course_tabs()` is reused in `rewrite_internal_links()`.
- [...] `get_course_tabs()` as we do not offline all tabs).
- `handle_jump_to_path()` compares the jump_to type URL, finds the xblock with that URL in the list of xblock_extractor objects, checks whether the xblock is a vertical or a course, and returns the modified link. As only course and vertical xblocks have HTML pages, we also look at the descendants for linkable xblocks here.
- `relative_dots()` prepares a path of backward jumps, according to the number of parts in the path.
- `update_root_relative_path()` ensures that no root-relative URLs are left over by putting the `instance_url` in place of the netloc.
- `rewrite_internal_links()` is the main manager method. It calls the other functions. For jump_to links, if the first try does not return a path, we try again with the parent, as the link may point to an xblock that the vertical xblock is made of.

Note: this depends on a future release of zimscraperlib.
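For readers unfamiliar with the two path helpers described above, here is a rough standalone sketch of what they could look like. The function names come from this PR, but the signatures and bodies are assumptions and not the actual implementation; in particular, the sketch assumes the last part of a path is a file name, so it does not count as a level to climb.

```python
from urllib.parse import urlsplit, urlunsplit


def relative_dots(path):
    """Return one '../' per directory level of a ZIM-relative path (assumed behaviour)."""
    parts = [part for part in path.split("/") if part]
    # The last part is assumed to be the file itself, so it needs no backward jump.
    return "../" * max(len(parts) - 1, 0)


def update_root_relative_path(href, instance_url):
    """Turn a root-relative href into an absolute URL on the instance (assumed behaviour)."""
    link = urlsplit(href)
    # Leave alone anything that already has a host or is not root-relative.
    if link.netloc or not link.path.startswith("/"):
        return href
    instance = urlsplit(instance_url)
    return urlunsplit(
        (instance.scheme, instance.netloc, link.path, link.query, link.fragment)
    )


print(relative_dots("course/tabs/index.html"))
print(update_root_relative_path("/courses/x?a=1#top", "https://example.org"))
```

Splitting the URL with `urlsplit` rather than concatenating strings keeps the query string and fragment intact when the netloc is filled in.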