Description
If a URL contains non-ASCII (Unicode) characters, Python raises an encoding error when the crawler requests it.
Debug info:
INFO:root:Crawling #1: https://gvo.wiki/html/NPC掉落書籍.html
DEBUG:root:https://gvo.wiki/html/NPC掉落書籍.html ==> 'ascii' codec can't encode characters in position 13-16: ordinal not in range(128)
Solution:
- Edit crawler.py and add the following imports at the top:
import string
from urllib.parse import quote
- Then search for this line:
current_url = self.urls_to_crawl.pop()
- Add a line below it that percent-encodes the URL, so the result is:
current_url = self.urls_to_crawl.pop()
current_url = quote(current_url, safe=string.printable)
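To check the fix outside the crawler, here is a minimal standalone sketch of the same call. The helper name `encode_url` is hypothetical and not part of crawler.py; only `quote`, `unquote`, and `string.printable` come from the standard library.

```python
import string
from urllib.parse import quote, unquote


def encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters, leaving printable ASCII as-is.

    Hypothetical helper for illustration; crawler.py applies quote() inline.
    """
    # string.printable covers ASCII letters, digits, punctuation and whitespace,
    # so ':', '/', '?', '&' and existing '%XX' escapes pass through unchanged.
    return quote(url, safe=string.printable)


url = "https://gvo.wiki/html/NPC掉落書籍.html"
encoded = encode_url(url)

print(encoded)
# The encoded form is pure ASCII and decodes back to the original URL.
print(encoded.isascii(), unquote(encoded) == url)
```

Because `%` is in the safe set, URLs that are already percent-encoded are not encoded a second time. The trade-off is that spaces and other printable ASCII characters are also left untouched, so a URL containing a literal space would probably still need separate handling.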