w3lib.url.safe_url_string incorrectly encodes IDNA domain with port #171

Closed
@heroesm

Steps to reproduce:

>>> from w3lib.url import safe_url_string
>>> safe_url_string('http://新华网.中国')
'http://xn--xkrr14bows.xn--fiqs8s'
>>> safe_url_string('http://新华网.中国:80')
'http://xn--xkrr14bows.xn--:80-u68dy61b'

Expected result of safe_url_string('http://新华网.中国:80'):

'http://xn--xkrr14bows.xn--fiqs8s:80'

Actual result:

'http://xn--xkrr14bows.xn--:80-u68dy61b'

Related code:

netloc = parts.netloc.encode('idna')

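Maybe IDNA encoding should be done on the hostname rather than on the whole netloc. The netloc also carries the port, so the IDNA codec treats 中国:80 as a single label and Punycode-encodes the ":80" along with it.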
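
A minimal sketch of that idea, using only the standard library. It is not w3lib's actual fix: the helper name idna_encode_host is hypothetical, and the sketch assumes a URL without userinfo or an IPv6 literal. It IDNA-encodes only the hostname and re-attaches the port afterwards:

```python
from urllib.parse import urlsplit, urlunsplit


def idna_encode_host(url):
    """Hypothetical helper: IDNA-encode the hostname only, keep the port as-is.

    Assumes no userinfo (user:pass@) and no IPv6 literal in the netloc.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        # Nothing to encode (e.g. a relative URL).
        return url
    # Encode just the host; the ':port' suffix never reaches the IDNA codec.
    host = parts.hostname.encode('idna').decode('ascii')
    netloc = host if parts.port is None else '{}:{}'.format(host, parts.port)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(idna_encode_host('http://新华网.中国:80'))
```

With this approach the port stays outside the encoded labels, so the output should match the expected result given above.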
