Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClientResponseError on header values with non-printable characters #5355

Closed
smola opened this issue Dec 17, 2020 · 12 comments
Closed

ClientResponseError on header values with non-printable characters #5355

smola opened this issue Dec 17, 2020 · 12 comments
Labels

Comments

@smola
Copy link

smola commented Dec 17, 2020

🐞 Describe the bug

When a server replies with a header value including non-printable characters, ClientResponseErroris thrown with the message invalid character in header. This is an issue in the http-parser library.

💡 To Reproduce

import aiohttp
async with aiohttp.ClientSession() as sess:
    await sess.get('https://www.dezeen.com/')

💡 Expected behavior

Other HTTP clients work just fine with these responses (e.g. CPython, curl, web browsers).

📋 Logs/tracebacks

See above.

📋 Your version of the Python

$ python --version
Python 3.8.5

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: /venv/lib/python3.8/site-packages
Requires: chardet, multidict, attrs, async-timeout, yarl, typing-extensions
Required-by: mypkg
$ python -m pip show multidict
Name: multidict
Version: 5.1.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /venv/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.6.3
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /venv/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp

📋 Additional context

This may be a duplicate of #5269. It's hard to tell given the context given on that issue.

@smola smola added the bug label Dec 17, 2020
@Dreamsorcerer
Copy link
Member

Your example works when I try it.

@Dreamsorcerer
Copy link
Member

test.py:

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as sess:
        print("A")
        r = await sess.get('https://www.dezeen.com/')
        print("B")
        print(await r.text())

asyncio.run(main())
> python3 test.py
A
B
<!DOCTYPE html><html lang=en><meta charset=utf-8><head><meta name=robots content=NOODP><meta property=fb:pages content=101882448673><meta name=pinterest-rich-pin content=false><meta name=viewport content="width=device-width, initial-scale = 1.0"><meta http-equiv=Content-Security-Policy content=upgrade-insecure-requests> <script async src=https://www.googletagservices.com/tag/js/gpt.js></script> <script>var googletag = googletag || {};
...

@smola
Copy link
Author

smola commented Dec 17, 2020

@Dreamsorcerer Maybe that server is not always generating bad responses. Here's a test case with a local server:

import aiohttp
from aiohttp import web
import asyncio
from multiprocessing import Process
import time
from urllib.request import urlopen

PORT = 10001

async def handle_req(request):
    return aiohttp.web.Response(text="OK", headers={'My-Header': 'foo\1bar'})

async def main():
    async with aiohttp.ClientSession() as sess:
        r = await sess.get(f"http://127.0.0.1:{PORT}/")
        print(await r.text())

def run_server():
    app = web.Application()
    app.add_routes([web.get('/', handle_req)])
    web.run_app(app, host="127.0.0.1", port=PORT)

p = Process(target=run_server)
p.start()

time.sleep(2)

print('urlopen:', urlopen(f"http://127.0.0.1:{PORT}/").headers.items())

asyncio.run(main())

p.join()

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Dec 18, 2020

Thanks. Seems the exception is raised from this cython code:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/_http_parser.pyx#L546
So, I'm assuming the error actually occurs in cparser.http_parser_execute() a few lines earlier.

I'm not at all familiar with that code, I can't even figure out where the code for cparser lives, but hopefully someone else will be able to pick it up from that.

@smola
Copy link
Author

smola commented Dec 18, 2020

@derlih
Copy link
Contributor

derlih commented Dec 22, 2020

@smola looks like you are trying to use non ASCII char in the header.
Take a look on this answer.
I also tried your code with llhttp from #5364, but it raises the same error.

aiohttp.http_exceptions.BadHttpMessage: 400, message='Invalid header value char'

@smola
Copy link
Author

smola commented Dec 23, 2020

@derlih I don't control the server, I'm scraping the web. There are servers in the wild that return non-printable characters in headers. Even if this is not standard, other HTTP implementations (e.g. CPython, curl, web browsers) handle this case gracefully.

@derlih
Copy link
Contributor

derlih commented Dec 23, 2020

@smola
I've just tried to disable a binary parser and to use a pure python one.
AIOHTTP_NO_EXTENSIONS=1 python ./test_5355.py
And it worked. Looks like python implementation is not so strict.

@DurgNomis-drol
Copy link

I also have this problem, and @derlih answer worked. In my case i try to fetch information from Toyota Connected Services Endpoints and in requests it works just fine. But with aiohttp it returns 400 invalid char in header. But when i run my script with AIOHTTP_NO_EXTENSIONS=1 python3 test.py it works.

What are the plans to fix this, if it can be fixed?

@yotamgod
Copy link

Another example here:
#5730

@g66-gopanear
Copy link

I am getting the same issue when I do the get request using below as a header.
This is my header : {'X-Avangate-Authentication': 'code="255435546325215709" date="2021-06-11 09:14:04" hash="f312b75343er41f44dce26a3d4fdb6dc3e5a0"'}

@asvetlov
Copy link
Member

the parser is replaced with llhttp.
If the new parser have issues -- it should be reported separately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants