URL UnicodeEncodeError #79

Open
@wkingnet

Description

If a URL contains non-ASCII (Unicode) characters, Python raises a UnicodeEncodeError while crawling it.

debug info:

INFO:root:Crawling #1: https://gvo.wiki/html/NPC掉落書籍.html
DEBUG:root:https://gvo.wiki/html/NPC掉落書籍.html ==> 'ascii' codec can't encode characters in position 13-16: ordinal not in range(128)
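
The message above can be reproduced in isolation. This is a minimal sketch that assumes the crawler sends the request through Python's standard HTTP stack, which encodes the request line as ASCII. The `request_line` string is a hypothetical reconstruction, not taken from crawler.py.

```python
# Hypothetical request line built from the URL path in the log above.
# That the crawler produces exactly this string is an assumption.
request_line = "GET /html/NPC掉落書籍.html HTTP/1.1"

try:
    # Encoding non-ASCII characters with the 'ascii' codec fails.
    request_line.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Prints the codec error, including the offending character positions.
print(error)
```

Under this assumption, the four Chinese characters sit at indices 13 to 16 of the request line, which matches the positions reported in the debug output.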

Solution:

  1. Edit crawler.py and add the following imports at the top:

    import string
    from urllib.parse import quote

  2. Then search for this line:

    current_url = self.urls_to_crawl.pop()

  3. Add a line below it that percent-encodes the non-ASCII characters:

    current_url = self.urls_to_crawl.pop()
    current_url = quote(current_url, safe=string.printable)
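
The effect of the fix can be checked on its own, outside the crawler. This is a small sketch that uses the URL from the debug output. The variable names `url` and `encoded` are illustrative only.

```python
import string
from urllib.parse import quote, unquote

# URL taken from the debug output in this issue.
url = "https://gvo.wiki/html/NPC掉落書籍.html"

# Leave every printable ASCII character untouched and percent-encode
# everything else as UTF-8 bytes.
encoded = quote(url, safe=string.printable)

print(encoded)           # ASCII-only URL with %XX escapes
print(unquote(encoded))  # decodes back to the original URL
```

Because `string.printable` marks all printable ASCII as safe, characters such as `:`, `/` and `%` are left alone, so the scheme stays intact and already-encoded URLs are not double-encoded. The trade-off is that spaces and other ASCII characters that are not valid in a URL also pass through unchanged, so a narrower `safe` set may be worth considering if such URLs show up.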
