
gh-128110: Fix rfc2047 handling in email parser address headers #130749


Open: 1 commit to be merged into base: main


@medmunds (Contributor) commented Mar 1, 2025

RFC 2047 Section 6.2 requires that "any 'linear-white-space' that separates a pair of adjacent 'encoded-word's is ignored." The modern header value parser correctly implements that for unstructured headers, but missed a case in structured headers. As a result, a parsed address header could include extraneous spaces in a display-name.
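A minimal sketch of how the symptom can be observed through the public API. The sample header (name and address) is made up for illustration and is not taken from the issue. On a build without the fix, the white space between the two encoded-words would be expected to leak into the display-name.

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample: one display-name split across two adjacent encoded-words.
# The second encoded-word starts with "_", which decodes to a single space.
RAW = (
    "From: =?utf-8?q?Andr=C3=A9?= =?utf-8?q?_Pirard?= <andre@example.com>\n"
    "\n"
    "body\n"
)


def parse_from(raw):
    """Parse a message with the modern policy and return its first From address."""
    msg = message_from_string(raw, policy=default)
    return msg["From"].addresses[0]


address = parse_from(RAW)
# repr() makes any doubled space in the display-name easy to spot.
print(repr(address.display_name))
print(address.addr_spec)
```

Per RFC 2047 Section 6.2, only the space encoded inside the second encoded-word should survive, so a single space between the two name parts is the conforming result.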

Fixed in get_atom() by converting a trailing CFWSList token after an encoded-word to an EWWhiteSpaceTerminal if another encoded-word follows.
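To see which token follows the encoded-word, the atom can be inspected directly. This is only a sketch: `email._header_value_parser` is a private module whose names and behavior may change between Python versions, and the sample value is made up.

```python
# Private module: not a supported API, used here for inspection only.
from email import _header_value_parser as parser

# Hypothetical sample value (the part of an address header after "From: ").
VALUE = "=?utf-8?q?Andr=C3=A9?= =?utf-8?q?_Pirard?= <andre@example.com>"

# get_atom() returns the parsed atom and the unparsed remainder of the value.
atom, rest = parser.get_atom(VALUE)

# The first token is the encoded-word; what follows it is the trailing
# white-space token that this change is concerned with.
token_types = [tok.token_type for tok in atom]
print(token_types)
print(repr(rest))
```

Going by the description above, the token after the encoded-word is the one the fix converts when the remainder begins with another encoded-word, so its reported type should differ between patched and unpatched builds.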

Deliberately left similar code in get_dotatom() unmodified. A dotatom can only appear within an addr-spec. RFC 2047 Section 5 prohibits use of an encoded-word in any portion of an addr-spec, so its appearance in a dotatom is invalid. Adding (and testing) special white-space handling in an invalid dotatom seems an unnecessary complication.

Fixes gh-128110

Suggest label: topic-email

Successfully merging this pull request may close these issues.

email.parser can insert extraneous spaces when parsing rfc2047 headers with policy.default