fix: parsing of multiline MIME encoded headers #718
The email parser incorrectly parses multiline MIME-encoded headers. For example, given this header:
It is parsed as
instead of
That is, the lines of the headers are joined with a space, and then the result is decoded. The expected behavior is concatenating the lines, discarding the continuation whitespace characters. This is what, for example, Thunderbird (and other mail clients) does:
This behavior can be verified with, for example, https://dogmamix.com/MimeHeadersDecoder/
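The concatenating behavior can also be checked locally with Python's legacy header API. The snippet below is only a sketch: the folded value is a made-up example (not the header from this report), split into two encoded words across a continuation line.

```python
from email.header import decode_header, make_header

# Hypothetical folded header value: two encoded words on two lines,
# the second line starting with continuation whitespace.
folded = "=?UTF-8?Q?Gr=C3=BC=C3=9Fe_aus_?=\r\n =?UTF-8?Q?M=C3=BCnchen?="

# decode_header() discards the folding whitespace between adjacent
# encoded words, so the chunks are concatenated before charset decoding.
decoded = str(make_header(decode_header(folded)))
print(decoded)
```

The only space in the result is the one encoded as `_` inside the first encoded word; the continuation whitespace does not appear in the output.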
As a side effect, the current behavior results in creation of many unnecessary aliases of the same name, like this:
Unfortunately, the issue is in Python's internals. The `compat32` policy, however, parses the headers correctly (`make_header(decode_header(value))` produces the expected result).

This PR attempts to fix the described issue by using a custom policy derived from `email.policy.default` that implements `header_fetch_parse` the way `email.policy.compat32` does (while maintaining compatibility with `email.policy.default`).

However, I am not a Python developer; there could be a cleaner (or better) way to do this. But it works :-)
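For reviewers who want to see the general shape of the idea, here is a minimal sketch of such a policy. It is not the code from this PR: the class name `CompatFetchPolicy` and the sample message are hypothetical, and the fallback on decoding errors is my own assumption about what would be reasonable.

```python
from email import message_from_string
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.policy import EmailPolicy


class CompatFetchPolicy(EmailPolicy):
    """Hypothetical policy: default behavior, compat32-style header decoding."""

    def header_fetch_parse(self, name, value):
        # Header objects created by the modern API are returned unchanged,
        # as EmailPolicy itself does.
        if hasattr(value, "name"):
            return value
        try:
            # Decode the way compat32 users typically do: adjacent encoded
            # words are concatenated, folding whitespace is discarded.
            decoded = str(make_header(decode_header(value)))
        except (HeaderParseError, LookupError, UnicodeError):
            # Assumption: on malformed input, fall back to the stock parser.
            return super().header_fetch_parse(name, value)
        # Hand the decoded text to the header factory so callers still get
        # the structured header objects of email.policy.default.
        return self.header_factory(name, decoded)


# Hypothetical message with a folded, MIME-encoded Subject header.
raw = (
    "Subject: =?UTF-8?Q?Gr=C3=BC=C3=9Fe_aus_?=\n"
    " =?UTF-8?Q?M=C3=BCnchen?=\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw, policy=CompatFetchPolicy())
subject = str(msg["Subject"])
print(subject)
```

Passing the decoded string through `header_factory` is what keeps the return type the same as with `email.policy.default`; only the decoding step changes.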