Description
Bug report
The function header_source_parse only strips leading whitespace from header values; any trailing whitespace before the CRLF is left intact:
>>> b = header_source_parse(["content-length:15 "])
>>> b[1]
'15 '
This can cause issues further up the call chain (for example: pallets/werkzeug#2734).
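A self-contained reproduction might look like the following. It assumes the method is reached through the `compat32` policy object from `email.policy`, which is the parser's default policy; the sample header line is illustrative only.

```python
from email.policy import compat32

# header_source_parse takes the raw source lines of one header
# and returns a (name, value) tuple.
name, value = compat32.header_source_parse(["content-length:15 \r\n"])

# The CRLF is removed, but the space after "15" survives.
print(repr(name), repr(value))
```

The trailing space stays in the value, so any caller that compares or converts the value strictly has to handle it.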
Although part of email, this header parsing functionality is also used by http (call chain starts here), so the HTTP RFCs are also relevant.
RFC 7230 (section 3.2.4) notes:
A field value might be preceded and/or followed by optional
whitespace (OWS); a single SP preceding the field-value is preferred
for consistent readability by humans. The field value does not
include any leading or trailing whitespace: OWS occurring before the
first non-whitespace octet of the field value or after the last
non-whitespace octet of the field value ought to be excluded by
parsers when extracting the field value from a header field.
Unfortunately, the Email RFC (RFC 5322) conflicts with this, expecting headers of the form name: value<CRLF>.
This means that the Email header parser's response, under some circumstances, is not compatible with the HTTP spec.
Options:
- Change header_source_parse() to use strip() instead of lstrip()
- Do subsequent processing in http to check for and strip trailing whitespace
- Make changes to the entire call chain to allow header_source_parse() to be told whether to strip trailing whitespace
Option 1 sounds inherently dangerous - although I can't think of a usage that would require preservation of trailing whitespace, it's permitted by RFC 5322, so a workflow that depends on it will exist somewhere.
Option 2 seems better than option 3: for option 3, every single caller would need to be identified and updated.
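As a rough illustration of option 2, the HTTP side could strip trailing optional whitespace after the email parser has done its work. This is only a sketch: `strip_trailing_ows` and `parse_http_header_line` are hypothetical names that do not exist in the standard library, and where such a step would sit inside http is left open.

```python
from email.policy import compat32


def strip_trailing_ows(value: str) -> str:
    """Drop trailing SP and HTAB (HTTP optional whitespace) from a value."""
    return value.rstrip(" \t")


def parse_http_header_line(line: str) -> tuple[str, str]:
    """Hypothetical HTTP-side wrapper around the email header parser."""
    # The email parser is left untouched, so email callers keep
    # RFC 5322 behaviour (trailing whitespace preserved).
    name, value = compat32.header_source_parse([line])
    # Only the HTTP path applies the extra strip.
    return name, strip_trailing_ows(value)


http_name, http_value = parse_http_header_line("content-length:15 \r\n")
email_name, email_value = compat32.header_source_parse(["content-length:15 \r\n"])
```

Keeping the strip on the HTTP side means existing email callers see no change in behaviour, which is the main appeal of option 2 over option 1.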