fix: parsing of multiline MIME encoded headers #718

Open
wants to merge 2 commits into
base: main

@sjinks sjinks commented May 6, 2025

The email parser incorrectly parses multiline MIME-encoded headers. For example, given this header:

From: =?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=
 =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5_=D1=81_=D0=BF=D1=80=D0=BE=D1=85=D0=BE?=
 =?UTF-8?Q?=D0=B6=D0=B4=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC_=D0=B7=D0=B0=D0=BA?=
 =?UTF-8?Q?=D0=BE=D0=BD=D0=BE=D0=BF=D1=80=D0=BE=D0=B5=D0=BA=D1=82=D0=BE?=
 =?UTF-8?Q?=D0=B2?= <redacted@example.com>

It is parsed as

Отдел по ра боте с прохо ждением зак онопроекто в <redacted@example.com>

instead of

Отдел по работе с прохождением законопроектов <redacted@example.com>

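That is, the folded header lines are joined with a space and the result is then decoded, so the whitespace between adjacent encoded words ends up in the decoded text. The expected behavior is to concatenate the encoded words and discard the continuation whitespace between them (RFC 2047, section 6.2, says that linear whitespace separating adjacent encoded words is ignored on display). This is what, for example, Thunderbird (and other mail clients) does: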

[Image: Screenshot_20250506_123048]

This behavior can be verified with, for example, https://dogmamix.com/MimeHeadersDecoder/
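
The two outputs above can be reproduced in a few lines. This is only an illustration of the symptom, not of how Python's parser works internally: it decodes each encoded word from the example header on its own (`decode_word` is a hypothetical helper, not part of this PR) and then joins the pieces with and without a space.

```python
from email.header import decode_header

# The five encoded words from the example header, one per folded line.
words = [
    "=?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=",
    "=?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5_=D1=81_=D0=BF=D1=80=D0=BE=D1=85=D0=BE?=",
    "=?UTF-8?Q?=D0=B6=D0=B4=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC_=D0=B7=D0=B0=D0=BA?=",
    "=?UTF-8?Q?=D0=BE=D0=BD=D0=BE=D0=BF=D1=80=D0=BE=D0=B5=D0=BA=D1=82=D0=BE?=",
    "=?UTF-8?Q?=D0=B2?=",
]


def decode_word(word: str) -> str:
    """Decode a single RFC 2047 encoded word (hypothetical helper)."""
    [(data, charset)] = decode_header(word)
    return data.decode(charset)


decoded = [decode_word(w) for w in words]

# Joining with a space reproduces the broken display name.
joined_with_space = " ".join(decoded)
# Dropping the whitespace between adjacent encoded words gives the expected one.
concatenated = "".join(decoded)

print(joined_with_space)
print(concatenated)
```

Each encoded word in this header happens to end on a character boundary, which is why decoding them one by one works here; that is not guaranteed for arbitrary input.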

As a side effect, the current behavior results in the creation of many unnecessary aliases for the same name, like this:

[Image: Screenshot_20250506_123325]

Unfortunately, the issue is in Python's internals (the standard library's email package). With the compat32 policy, however, the headers can be decoded correctly (make_header(decode_header(value)) produces the expected result).
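
For reference, here is a minimal sketch of that call applied to the raw, still-folded value of the example header. The variable names are only for illustration.

```python
from email.header import decode_header, make_header

# Raw value of the From header from the example, with the folding preserved.
value = (
    "=?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=\n"
    " =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5_=D1=81_=D0=BF=D1=80=D0=BE=D1=85=D0=BE?=\n"
    " =?UTF-8?Q?=D0=B6=D0=B4=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC_=D0=B7=D0=B0=D0=BA?=\n"
    " =?UTF-8?Q?=D0=BE=D0=BD=D0=BE=D0=BF=D1=80=D0=BE=D0=B5=D0=BA=D1=82=D0=BE?=\n"
    " =?UTF-8?Q?=D0=B2?= <redacted@example.com>"
)

# decode_header() splits the value into (bytes, charset) chunks and drops
# whitespace that sits between two encoded words; make_header() reassembles
# the chunks into a Header object whose str() is the decoded text.
decoded = str(make_header(decode_header(value)))
print(decoded)
```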

This PR attempts to fix the described issue by using a custom policy derived from email.policy.default. The policy implements header_fetch_parse the way email.policy.compat32 does, while maintaining compatibility with email.policy.default.
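
The general shape of such a policy could look like the sketch below. This is not the code from this PR: the class name `EncodedWordPolicy` is hypothetical, and the approach (decode encoded words with `make_header(decode_header(...))` first, then let the regular `EmailPolicy` machinery parse the result) is an assumption about one way to combine the two behaviors.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.policy import EmailPolicy


class EncodedWordPolicy(EmailPolicy):
    """Hypothetical policy: decode RFC 2047 encoded words before parsing."""

    def header_fetch_parse(self, name, value):
        # Header objects (anything with a ``name`` attribute) are returned
        # unchanged, mirroring what EmailPolicy itself does.
        if hasattr(value, "name"):
            return value
        if "=?" in value:
            # Decode the raw value the way compat32-era code usually does.
            value = str(make_header(decode_header(value)))
        # Hand the decoded text to the default header parsing.
        return super().header_fetch_parse(name, value)


raw = (
    "From: =?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=\n"
    " =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5_=D1=81_=D0=BF=D1=80=D0=BE=D1=85=D0=BE?=\n"
    " =?UTF-8?Q?=D0=B6=D0=B4=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC_=D0=B7=D0=B0=D0=BA?=\n"
    " =?UTF-8?Q?=D0=BE=D0=BD=D0=BE=D0=BF=D1=80=D0=BE=D0=B5=D0=BA=D1=82=D0=BE?=\n"
    " =?UTF-8?Q?=D0=B2?= <redacted@example.com>\n"
    "\n"
    "body\n"
)

msg = message_from_string(raw, policy=EncodedWordPolicy())
sender = msg["From"]
print(sender)
print(sender.addresses[0].addr_spec)
```

One caveat with this sketch: because decoding happens before address parsing, a decoded display name that contains special characters such as commas or quotes may be parsed differently than intended, so this is probably not safe for every structured header.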

However, I am not a Python developer; there could be a cleaner (or better) way to do this. But it works :-)
