
Cannot download URLs with Cyrillic letters over HTTPS #949

@amorgun

My script:

import aiohttp
import asyncio


async def fetch(session, url):
    # aiohttp 0.22-era API: aiohttp.Timeout aborts the request after 10 seconds
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    # Internationalized (Cyrillic) domain name served over HTTPS
    url = u'https://цфоут.мвд.рф/news/item/8065038/'
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, url))
        print(html)

It fails with the following error:

Exception in callback None
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.5/asyncio/selector_events.py", line 671, in _read_ready
    self._protocol.data_received(data)
  File "/usr/lib/python3.5/asyncio/sslproto.py", line 492, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/usr/lib/python3.5/asyncio/sslproto.py", line 200, in feed_ssldata
    self._sslobj.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 633, in do_handshake
    match_hostname(self.getpeercert(), self.server_hostname)
  File "/usr/lib/python3.5/ssl.py", line 296, in match_hostname
    % (hostname, ', '.join(map(repr, dnsnames))))
ssl.CertificateError: hostname 'цфоут.мвд.рф' doesn't match either of '*.xn--b1aew.xn--p1ai', 'xn--b1aew.xn--p1ai'

Interestingly enough, the string 'цфоут.мвд.рф' does match '*.xn--b1aew.xn--p1ai' once it is IDNA-encoded:

>>> 'цфоут.мвд.рф'.encode('idna').decode('utf8').endswith('.xn--b1aew.xn--p1ai')
True
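To illustrate why the check fails, here is a simplified sketch of wildcard hostname matching. It is not the stdlib's `ssl.match_hostname` (which handles more cases); the helpers `idna_host` and `matches_cert_name` are hypothetical. The traceback shows `server_hostname` is the Unicode name, so comparing it against the certificate's ASCII-only names can never succeed, while the IDNA-encoded form matches the wildcard:

```python
def idna_host(hostname: str) -> str:
    """Convert a (possibly Unicode) hostname to its ASCII (IDNA 2003) form."""
    return hostname.encode('idna').decode('ascii')


def matches_cert_name(hostname: str, pattern: str) -> bool:
    """Simplified check: exact match, or a leading '*' label that stands in
    for exactly one left-most hostname label. Case-insensitive."""
    host_labels = hostname.lower().split('.')
    pattern_labels = pattern.lower().split('.')
    if len(host_labels) != len(pattern_labels):
        return False
    first, *rest = pattern_labels
    if first == '*':
        # The wildcard covers only the first label; the rest must be equal
        return host_labels[1:] == rest
    return host_labels == pattern_labels


cert_names = ['*.xn--b1aew.xn--p1ai', 'xn--b1aew.xn--p1ai']
host = 'цфоут.мвд.рф'

# Raw Unicode hostname vs. ASCII certificate names: no match
print(any(matches_cert_name(host, name) for name in cert_names))  # False
# IDNA-encoded hostname: matches '*.xn--b1aew.xn--p1ai'
print(any(matches_cert_name(idna_host(host), name) for name in cert_names))  # True
```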

The same script with requests:

# This works fine
import requests

if __name__ == '__main__':
    url = u'https://цфоут.мвд.рф/news/item/8065038/'
    print(requests.get(url).text)

Versions

$ python -V
Python 3.5.1
$ pip3 freeze
aiohttp==0.22.0a0
chardet==2.3.0
multidict==1.0.3
requests==2.10.0
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:    14.04
Codename:   trusty

What I think

I found a similar question on Stack Overflow, but setting verify_ssl=False looks like a pretty dangerous hack to me, since it disables certificate verification entirely.
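A possible workaround that keeps certificate verification on, assuming aiohttp passes the URL's host through to the TLS layer unchanged (as the traceback suggests): convert the hostname to its IDNA form before calling `session.get`, so the hostname check compares ASCII names with ASCII names. `idna_url` is a hypothetical helper, not part of aiohttp, and this sketch has not been tested against aiohttp 0.22.0a0. It uses Python's built-in `idna` codec, which implements IDNA 2003 rather than IDNA 2008:

```python
from urllib.parse import urlsplit, urlunsplit


def idna_url(url: str) -> str:
    """Return `url` with a non-ASCII hostname converted to IDNA (punycode).

    Hypothetical helper. Userinfo, port, path, query and fragment are
    copied through unchanged.
    """
    parts = urlsplit(url)
    # Split "user@host:port" into its pieces, keeping the separators
    userinfo, at, hostport = parts.netloc.rpartition('@')
    host, colon, port = hostport.partition(':')
    if all(ord(ch) < 128 for ch in host):
        # Already ASCII (this also covers IPv6 literals): leave the URL alone
        return url
    ascii_host = host.encode('idna').decode('ascii')
    netloc = userinfo + at + ascii_host + colon + port
    return urlunsplit(parts._replace(netloc=netloc))
```

With it, the call in the script above would become `fetch(session, idna_url(url))`.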