http uses email header parser but they have slightly different rules about whitespace #105973

Open
@bentasker

Description

Bug report

The function header_source_parse() strips only leading whitespace from header values; any trailing whitespace before the CRLF is left intact:

>>> from email.policy import compat32
>>> b = compat32.header_source_parse(["content-length:15 "])
>>> b[1]
'15 '

This can cause issues further up the call chain (for example, pallets/werkzeug#2734).

Although it is part of the email package, this header-parsing functionality is also used by http (call chain starts here), so the HTTP RFCs are also relevant.
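A minimal sketch of how the behaviour reaches HTTP code, assuming the standard library path http.client.parse_headers() → email.parser.Parser → Compat32.header_source_parse(). The helper name content_length_as_parsed is hypothetical and only used for illustration. If the behaviour described above is present, the printed repr keeps the trailing space.

```python
import http.client
import io


def content_length_as_parsed(raw: bytes) -> str:
    """Hypothetical helper: run a raw header block through http.client and return Content-Length."""
    # parse_headers() reads lines up to the blank line and hands them to the email parser.
    msg = http.client.parse_headers(io.BytesIO(raw))
    return msg["Content-Length"]


# Note the space between "15" and the CRLF.
value = content_length_as_parsed(b"Content-Length: 15 \r\n\r\n")
print(repr(value))
```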

RFC 7230 (section 3.2.4) notes:

A field value might be preceded and/or followed by optional
whitespace (OWS); a single SP preceding the field-value is preferred
for consistent readability by humans. The field value does not
include any leading or trailing whitespace: OWS occurring before the
first non-whitespace octet of the field value or after the last
non-whitespace octet of the field value ought to be excluded by
parsers when extracting the field value from a header field.
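To make the quoted rule concrete, here is a small sketch of extracting a field value the way the HTTP text describes, where OWS is only SP and HTAB. The function parse_field_line is hypothetical; it assumes a single, already-unfolded header line and does not validate the field name or handle obsolete line folding.

```python
OWS = " \t"  # HTTP optional whitespace: OWS = *( SP / HTAB )


def parse_field_line(line: str) -> tuple[str, str]:
    """Hypothetical helper: split one header line into (name, value), dropping OWS around the value."""
    line = line.rstrip("\r\n")  # remove the line terminator first
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a header field: {line!r}")
    # Strip OWS on both sides, unlike the email parser, which only strips the leading side.
    return name, value.strip(OWS)


print(parse_field_line("content-length:15 "))
```

Stripping only SP and HTAB (rather than calling a bare strip()) keeps the sketch aligned with the OWS definition instead of every Unicode whitespace character.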

Unfortunately, the email RFC (RFC 5322) conflicts with this: it expects headers of the form name: value<CRLF>, with any whitespace before the CRLF forming part of the value.

This means that, in some circumstances, the values returned by the email header parser do not comply with the HTTP spec.

Options:

  1. Change header_source_parse() to use strip() instead of lstrip()
  2. Do subsequent processing in http to check for and strip trailing whitespace
  3. Make changes to the entire call chain so that header_source_parse() can be told whether to strip trailing whitespace
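Option 2 could look roughly like the sketch below, assuming the stripping happens after http.client.parse_headers() returns. The function parse_headers_stripped is hypothetical; it rebuilds the message so that header order and repeated headers are preserved.

```python
import http.client
import io


def parse_headers_stripped(fp) -> http.client.HTTPMessage:
    """Hypothetical option-2 helper: parse with the stdlib, then drop trailing OWS."""
    parsed = http.client.parse_headers(fp)
    cleaned = http.client.HTTPMessage()
    for name, value in parsed.items():
        # HTTP's OWS is only SP and HTAB, so other trailing characters are left alone.
        # Message.__setitem__ appends, so duplicate headers survive in their original order.
        cleaned[name] = value.rstrip(" \t")
    return cleaned


raw = b"Content-Length: 15 \r\nSet-Cookie: a=1 \r\nSet-Cookie: b=2\t\r\n\r\n"
msg = parse_headers_stripped(io.BytesIO(raw))
print(msg["Content-Length"], msg.get_all("Set-Cookie"))
```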

Option 1 sounds inherently dangerous: although I can't think of a usage that would require trailing whitespace to be preserved, RFC 5322 permits it, so some workflow somewhere probably depends on it.

Option 2 seems better than option 3, because option 3 would require identifying and updating every single caller.
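One possible shape for the "tell header_source_parse() what to do" idea in option 3 is to carry the choice on the email policy object instead of adding a new argument to every caller. This is only a sketch under that assumption: email.policy.Compat32 and the policy keyword of email.parser.Parser are real, but OWSStrippingCompat32, HTTP_HEADER_POLICY and parse_http_headers are hypothetical names.

```python
import email.parser
import http.client
from email.policy import Compat32


class OWSStrippingCompat32(Compat32):
    """Hypothetical policy: Compat32 parsing, but trailing SP/HTAB is dropped from values."""

    def header_source_parse(self, sourcelines):
        # Reuse the stock behaviour (lstrip + CRLF removal), then strip trailing OWS too.
        name, value = super().header_source_parse(sourcelines)
        return name, value.rstrip(" \t")


HTTP_HEADER_POLICY = OWSStrippingCompat32()


def parse_http_headers(raw: bytes) -> http.client.HTTPMessage:
    """Hypothetical helper: parse a raw header block (ending in a blank line) with the policy above."""
    text = raw.decode("iso-8859-1")  # header bytes decoded as Latin-1, as http.client does
    parser = email.parser.Parser(_class=http.client.HTTPMessage, policy=HTTP_HEADER_POLICY)
    return parser.parsestr(text, headersonly=True)


msg = parse_http_headers(b"Content-Length: 15 \r\nX-Test:\tabc\t\r\n\r\n")
print(repr(msg["Content-Length"]), repr(msg["X-Test"]))
```

Because only callers that opt into the subclass see the extra stripping, the default compat32 behaviour used for email stays unchanged. http.client.parse_headers() does not take a policy argument, so wiring this in would still need a change on the http side.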

Metadata

Assignees: no one assigned
Labels: topic-email, triaged (the issue has been accepted as valid by a triager), type-bug
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests

    Issue actions