fix: parsing of multiline MIME encoded headers #718

sjinks · 2025-05-06T09:40:50Z

The email parser incorrectly parses multiline MIME-encoded headers. For example, given this header:

From: =?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=
 =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5_=D1=81_=D0=BF=D1=80=D0=BE=D1=85=D0=BE?=
 =?UTF-8?Q?=D0=B6=D0=B4=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC_=D0=B7=D0=B0=D0=BA?=
 =?UTF-8?Q?=D0=BE=D0=BD=D0=BE=D0=BF=D1=80=D0=BE=D0=B5=D0=BA=D1=82=D0=BE?=
 =?UTF-8?Q?=D0=B2?= <redacted@example.com>

It is parsed as

Отдел по ра боте с прохо ждением зак онопроекто в <redacted@example.com>

instead of

Отдел по работе с прохождением законопроектов <redacted@example.com>

That is, the lines of the header are joined with a space, and then the result is decoded. The expected behavior is to concatenate the lines, discarding the continuation whitespace between adjacent encoded words. This is what Thunderbird and other mail clients do.

This behavior can be verified with, for example, https://dogmamix.com/MimeHeadersDecoder/

As a side effect, the current behavior creates many unnecessary aliases of the same name.

Unfortunately, the issue is in Python's internals. The compat32 policy, however, parses the headers correctly (make_header(decode_header(value)) produces the expected result).
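For reference, the compat32-style decoding mentioned above can be reproduced with the standard library alone. This is a minimal sketch, not code from the PR; the variable names `raw` and `decoded` are only illustrative.

```python
from email.header import decode_header, make_header

# The folded header value from the example above (continuation lines
# start with a single space).
raw = (
    "=?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=\n"
    " =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5_=D1=81_=D0=BF=D1=80=D0=BE=D1=85=D0=BE?=\n"
    " =?UTF-8?Q?=D0=B6=D0=B4=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC_=D0=B7=D0=B0=D0=BA?=\n"
    " =?UTF-8?Q?=D0=BE=D0=BD=D0=BE=D0=BF=D1=80=D0=BE=D0=B5=D0=BA=D1=82=D0=BE?=\n"
    " =?UTF-8?Q?=D0=B2?= <redacted@example.com>"
)

# decode_header() drops the whitespace between adjacent encoded words,
# and make_header() reassembles the pieces into a single Header object.
decoded = str(make_header(decode_header(raw)))
print(decoded)
```

The printed value should match the expected result shown earlier, with no stray spaces inside the words.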

This PR attempts to fix the described issue by using a custom policy derived from email.policy.default that implements header_fetch_parse the way email.policy.compat32 does, while maintaining compatibility with email.policy.default.
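The general shape of such a policy might look like the sketch below. It is an illustration only and may differ from the code in this PR: the class name `LenientHeaderPolicy` is hypothetical, and decoding with make_header(decode_header(value)) before handing the value to the header factory is an assumption about how the two behaviors could be combined. The sample message uses a shortened version of the header above.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.policy import EmailPolicy


class LenientHeaderPolicy(EmailPolicy):
    """Hypothetical policy: decode RFC 2047 words before header parsing."""

    def header_fetch_parse(self, name, value):
        # Values that are already header objects are returned unchanged,
        # as the stock EmailPolicy does.
        if hasattr(value, "name"):
            return value
        # Decode the raw (possibly folded) value the legacy way, which
        # drops whitespace between adjacent encoded words.
        decoded = str(make_header(decode_header(value)))
        # Build a regular structured header object from the decoded text.
        return self.header_factory(name, decoded)


# Shortened two-line variant of the header from the description.
source = (
    "From: =?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=\n"
    " =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5?= <redacted@example.com>\n"
    "\n"
    "body\n"
)

msg = message_from_string(source, policy=LenientHeaderPolicy())
sender = msg["From"]
print(str(sender))
print(sender.addresses[0].addr_spec)
```

One caveat with this approach: because decoding happens before the address is parsed, a decoded display name that contains special characters such as commas or angle brackets could be split differently than the sender intended, so it is worth testing against such inputs.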

However, I am not a Python developer; there could be a cleaner (or better) way to do this. But it works :-)

sjinks added 2 commits May 6, 2025 12:23

fix: parsing of multiline MIME encoded headers

dd6dab3

style: fix formatting

a1bc166
