Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File uploads without a content type are treated as ordinary fields #4089

Closed
sjaensch opened this issue Sep 24, 2019 · 9 comments · Fixed by #5072
Closed

File uploads without a content type are treated as ordinary fields #4089

sjaensch opened this issue Sep 24, 2019 · 9 comments · Fixed by #5072
Labels
pr-rejected regression Something that used to work stopped working "as before" after upgrade server

Comments

@sjaensch
Copy link
Contributor

sjaensch commented Sep 24, 2019

Long story short

Multipart file uploads do not work correctly since aiohttp 3.6.0 unless a content type is provided. This used to work up to aiohttp 3.5.4 and broke due to #3905, this line specifically.

Expected behaviour

File upload data is treated as a binary data stream if no content type is provided.

Actual behaviour

File upload data is treated as an ordinary text field, possibly causing an UnicodeDecodeError.

Steps to reproduce

  1. Send a request to an aiohttp server with a multipart/form-data payload, where one item is a file without a content type. The requests library will not provide a content type unless you specifically set one.
  2. Call await request.post() in your aiohttp server.

Your environment

This issue is causing integration tests for bravado-asyncio to fail - the build before this one passed, the only change is the upgrade from aiohttp 3.5.4 to aiohttp 3.6.0.

@sjaensch
Copy link
Contributor Author

@kornicameister why did you add the additional check for field_ct to the if condition?

@kornicameister
Copy link
Contributor

kornicameister commented Sep 24, 2019

@sjaensch the reason was that FileField expected content type that is set. It actually made a lot of sense to know what was sent to be able to handle that.

It is actually kind of surprising to hear that you have sent files without a content type. Is that actually RFC, or sth, compliant? If we look here there is a note that part may have a content type that, if not set, defaults to text/plain. Effectively that means that you have sent a .txt file if a filename is present IMHO. Or maybe it means that a string has been sent. All in all, I am inclined to say that first thing is correct which means that my change was not that ok.

I am thinking now that my change could be better if I read attached RFC first.

@sjaensch
Copy link
Contributor Author

@kornicameister I'm not arguing that a content type can be beneficial - the more metadata the better (usually). My point is that it should be valid to send a file without a content type, and we should still treat it as a file. In particular that means it should be treated as a stream of bytes that will not be modified; with your change this is not the case anymore, and in fact it's highly unlikely that file uploads without a content type work at all, since it treats them as utf8 text that it will decode.

@kornicameister
Copy link
Contributor

Well...I changed my response at least 3 times already (phone typing) and I think that I agree with you. The way I would fix that to match RFC is to check if filename is set and there do sth like part_ct or 'text/plain' for FileField case.

@sjaensch
Copy link
Contributor Author

This is the code in the requests library that deals with multipart file uploads. The content type for a file is None unless set explicitly.

Also I don't think we should set a content type we didn't receive. For the user it would be impossible to distinguish between a file we received with content type text/plain and one we received without content type. I don't see any advantage to that.

@webknjaz
Copy link
Member

Each part MAY have an (optional) "Content-Type" header field, which defaults to "text/plain".

This explicitly says that if there's no header, the value is text/plain.

However, this point is also valid:

since it treats them as utf8 text that it will decode.

I think, that maybe we should assume text/plain but not attempt to decode it, keeping bytes as-is. But honestly, it smells like a hack...

Let's also ask @asvetlov.

@webknjaz webknjaz added server regression Something that used to work stopped working "as before" after upgrade labels Sep 25, 2019
@webknjaz
Copy link
Member

Meanwhile, it'd be useful if somebody summarised how other web-server implementations handle this case.

@sjaensch
Copy link
Contributor Author

I think the part about the Content-Type header was written with the non-file cases in mind. It does have a sentence about handing files. Unfortunately it is not written as clearly as it could, making it sound like the client has to send it:

If the contents of a file are to be sent, the file data SHOULD be labeled with an appropriate media type, if known, or "application/octet-stream".

It clearly intends files to be treated as a stream of bytes that is not to be modified, and if the media type is unknown we should assume application/octet-stream.

@kornicameister
Copy link
Contributor

@webknjaz I'd like to point out that it might be that there was a mistake in FileField in the first place. My change was dictated by single simple fact: FileField content-type resolved to a str that cannot be Optional.

Back then, when @asvetlov and me were working on pushing the typing improvement, neither noticed that it might be invalid to assume that lack of content type doesn't mean not dealing with files. As linked RFC pointed out, content-type is totally optional and, like @sjaensch , mentioned for files most likely it defaults to application/octet-stream.

@helpr helpr bot added pr-rejected and removed pr-available labels Oct 16, 2019
netbsd-srcmastr referenced this issue in NetBSD/pkgsrc Oct 24, 2020
This fixes py-yarl in pkgsrc being too new for py-aiohttp.


3.7.0 (2020-10-24)
==================

Features
--------

- Response headers are now prepared prior to running ``on_response_prepare`` hooks, directly before headers are sent to the client.
  `#1958 <https://github.com/aio-libs/aiohttp/issues/1958>`_
- Add a ``quote_cookie`` option to ``CookieJar``, a way to skip quotation wrapping of cookies containing special characters.
  `#2571 <https://github.com/aio-libs/aiohttp/issues/2571>`_
- Call ``AccessLogger.log`` with the current exception available from ``sys.exc_info()``.
  `#3557 <https://github.com/aio-libs/aiohttp/issues/3557>`_
- `web.UrlDispatcher.add_routes` and `web.Application.add_routes` return a list
  of registered `AbstractRoute` instances. `AbstractRouteDef.register` (and all
  subclasses) return a list of registered resources registered resource.
  `#3866 <https://github.com/aio-libs/aiohttp/issues/3866>`_
- Added properties of default ClientSession params to ClientSession class so it is available for introspection
  `#3882 <https://github.com/aio-libs/aiohttp/issues/3882>`_
- Don't cancel web handler on peer disconnection, raise `OSError` on reading/writing instead.
  `#4080 <https://github.com/aio-libs/aiohttp/issues/4080>`_
- Implement BaseRequest.get_extra_info() to access a protocol transports' extra info.
  `#4189 <https://github.com/aio-libs/aiohttp/issues/4189>`_
- Added `ClientSession.timeout` property.
  `#4191 <https://github.com/aio-libs/aiohttp/issues/4191>`_
- allow use of SameSite in cookies.
  `#4224 <https://github.com/aio-libs/aiohttp/issues/4224>`_
- Use ``loop.sendfile()`` instead of custom implementation if available.
  `#4269 <https://github.com/aio-libs/aiohttp/issues/4269>`_
- Apply SO_REUSEADDR to test server's socket.
  `#4393 <https://github.com/aio-libs/aiohttp/issues/4393>`_
- Use .raw_host instead of slower .host in client API
  `#4402 <https://github.com/aio-libs/aiohttp/issues/4402>`_
- Allow configuring the buffer size of input stream by passing ``read_bufsize`` argument.
  `#4453 <https://github.com/aio-libs/aiohttp/issues/4453>`_
- Pass tests on Python 3.8 for Windows.
  `#4513 <https://github.com/aio-libs/aiohttp/issues/4513>`_
- Add `method` and `url` attributes to `TraceRequestChunkSentParams` and `TraceResponseChunkReceivedParams`.
  `#4674 <https://github.com/aio-libs/aiohttp/issues/4674>`_
- Add ClientResponse.ok property for checking status code under 400.
  `#4711 <https://github.com/aio-libs/aiohttp/issues/4711>`_
- Don't ceil timeouts that are smaller than 5 seconds.
  `#4850 <https://github.com/aio-libs/aiohttp/issues/4850>`_
- TCPSite now listens by default on all interfaces instead of just IPv4 when `None` is passed in as the host.
  `#4894 <https://github.com/aio-libs/aiohttp/issues/4894>`_
- Bump ``http_parser`` to 2.9.4
  `#5070 <https://github.com/aio-libs/aiohttp/issues/5070>`_


Bugfixes
--------

- Fix keepalive connections not being closed in time
  `#3296 <https://github.com/aio-libs/aiohttp/issues/3296>`_
- Fix failed websocket handshake leaving connection hanging.
  `#3380 <https://github.com/aio-libs/aiohttp/issues/3380>`_
- Fix tasks cancellation order on exit. The run_app task needs to be cancelled first for cleanup hooks to run with all tasks intact.
  `#3805 <https://github.com/aio-libs/aiohttp/issues/3805>`_
- Don't start heartbeat until _writer is set
  `#4062 <https://github.com/aio-libs/aiohttp/issues/4062>`_
- Fix handling of multipart file uploads without a content type.
  `#4089 <https://github.com/aio-libs/aiohttp/issues/4089>`_
- Preserve view handler function attributes across middlewares
  `#4174 <https://github.com/aio-libs/aiohttp/issues/4174>`_
- Fix the string representation of ``ServerDisconnectedError``.
  `#4175 <https://github.com/aio-libs/aiohttp/issues/4175>`_
- Raising RuntimeError when trying to get encoding from not read body
  `#4214 <https://github.com/aio-libs/aiohttp/issues/4214>`_
- Remove warning messages from noop.
  `#4282 <https://github.com/aio-libs/aiohttp/issues/4282>`_
- Raise ClientPayloadError if FormData re-processed.
  `#4345 <https://github.com/aio-libs/aiohttp/issues/4345>`_
- Fix a warning about unfinished task in ``web_protocol.py``
  `#4408 <https://github.com/aio-libs/aiohttp/issues/4408>`_
- Fixed 'deflate' compression. According to RFC 2616 now.
  `#4506 <https://github.com/aio-libs/aiohttp/issues/4506>`_
- Fixed OverflowError on platforms with 32-bit time_t
  `#4515 <https://github.com/aio-libs/aiohttp/issues/4515>`_
- Fixed request.body_exists returns wrong value for methods without body.
  `#4528 <https://github.com/aio-libs/aiohttp/issues/4528>`_
- Fix connecting to link-local IPv6 addresses.
  `#4554 <https://github.com/aio-libs/aiohttp/issues/4554>`_
- Fix a problem with connection waiters that are never awaited.
  `#4562 <https://github.com/aio-libs/aiohttp/issues/4562>`_
- Always make sure transport is not closing before reuse a connection.

  Reuse a protocol based on keepalive in headers is unreliable.
  For example, uWSGI will not support keepalive even it serves a
  HTTP 1.1 request, except explicitly configure uWSGI with a
  ``--http-keepalive`` option.

  Servers designed like uWSGI could cause aiohttp intermittently
  raise a ConnectionResetException when the protocol poll runs
  out and some protocol is reused.
  `#4587 <https://github.com/aio-libs/aiohttp/issues/4587>`_
- Handle the last CRLF correctly even if it is received via separate TCP segment.
  `#4630 <https://github.com/aio-libs/aiohttp/issues/4630>`_
- Fix the register_resource function to validate route name before splitting it so that route name can include python keywords.
  `#4691 <https://github.com/aio-libs/aiohttp/issues/4691>`_
- Improve typing annotations for ``web.Request``, ``aiohttp.ClientResponse`` and
  ``multipart`` module.
  `#4736 <https://github.com/aio-libs/aiohttp/issues/4736>`_
- Fix resolver task is not awaited when connector is cancelled
  `#4795 <https://github.com/aio-libs/aiohttp/issues/4795>`_
- Fix a bug "Aiohttp doesn't return any error on invalid request methods"
  `#4798 <https://github.com/aio-libs/aiohttp/issues/4798>`_
- Fix HEAD requests for static content.
  `#4809 <https://github.com/aio-libs/aiohttp/issues/4809>`_
- Fix incorrect size calculation for memoryview
  `#4890 <https://github.com/aio-libs/aiohttp/issues/4890>`_
- Add HTTPMove to _all__.
  `#4897 <https://github.com/aio-libs/aiohttp/issues/4897>`_
- Fixed the type annotations in the ``tracing`` module.
  `#4912 <https://github.com/aio-libs/aiohttp/issues/4912>`_
- Fix typing for multipart ``__aiter__``.
  `#4931 <https://github.com/aio-libs/aiohttp/issues/4931>`_
- Fix for race condition on connections in BaseConnector that leads to exceeding the connection limit.
  `#4936 <https://github.com/aio-libs/aiohttp/issues/4936>`_
- Add forced UTF-8 encoding for ``application/rdap+json`` responses.
  `#4938 <https://github.com/aio-libs/aiohttp/issues/4938>`_
- Fix inconsistency between Python and C http request parsers in parsing pct-encoded URL.
  `#4972 <https://github.com/aio-libs/aiohttp/issues/4972>`_
- Fix connection closing issue in HEAD request.
  `#5012 <https://github.com/aio-libs/aiohttp/issues/5012>`_
- Fix type hint on BaseRunner.addresses (from ``List[str]`` to ``List[Any]``)
  `#5086 <https://github.com/aio-libs/aiohttp/issues/5086>`_
- Make `web.run_app()` more responsive to Ctrl+C on Windows for Python < 3.8. It slightly
  increases CPU load as a side effect.
  `#5098 <https://github.com/aio-libs/aiohttp/issues/5098>`_


Improved Documentation
----------------------

- Fix example code in client quick-start
  `#3376 <https://github.com/aio-libs/aiohttp/issues/3376>`_
- Updated the docs so there is no contradiction in ``ttl_dns_cache`` default value
  `#3512 <https://github.com/aio-libs/aiohttp/issues/3512>`_
- Add 'Deploy with SSL' to docs.
  `#4201 <https://github.com/aio-libs/aiohttp/issues/4201>`_
- Change typing of the secure argument on StreamResponse.set_cookie from ``Optional[str]`` to ``Optional[bool]``
  `#4204 <https://github.com/aio-libs/aiohttp/issues/4204>`_
- Changes ``ttl_dns_cache`` type from int to Optional[int].
  `#4270 <https://github.com/aio-libs/aiohttp/issues/4270>`_
- Simplify README hello word example and add a documentation page for people coming from requests.
  `#4272 <https://github.com/aio-libs/aiohttp/issues/4272>`_
- Improve some code examples in the documentation involving websockets and starting a simple HTTP site with an AppRunner.
  `#4285 <https://github.com/aio-libs/aiohttp/issues/4285>`_
- Fix typo in code example in Multipart docs
  `#4312 <https://github.com/aio-libs/aiohttp/issues/4312>`_
- Fix code example in Multipart section.
  `#4314 <https://github.com/aio-libs/aiohttp/issues/4314>`_
- Update contributing guide so new contributors read the most recent version of that guide. Update command used to create test coverage reporting.
  `#4810 <https://github.com/aio-libs/aiohttp/issues/4810>`_
- Spelling: Change "canonize" to "canonicalize".
  `#4986 <https://github.com/aio-libs/aiohttp/issues/4986>`_
- Add ``aiohttp-sse-client`` library to third party usage list.
  `#5084 <https://github.com/aio-libs/aiohttp/issues/5084>`_


Misc
----

- `#2856 <https://github.com/aio-libs/aiohttp/issues/2856>`_, `#4218 <https://github.com/aio-libs/aiohttp/issues/4218>`_, `#4250 <https://github.com/aio-libs/aiohttp/issues/4250>`_
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-rejected regression Something that used to work stopped working "as before" after upgrade server
Projects
None yet
3 participants