Bug report
Bug description:
email.message.EmailMessage accepts invalid header field names without raising an error or recording a defect. When the serialized message is parsed again, the parser reports a defect under every policy (and raises under the strict policy), so the resulting email is corrupt.
Case in point (with Python 3.13.1 installed via pyenv; this also occurs in 3.11 and earlier):
delgado@tuxedo-e101776:~> python3.13
Python 3.13.1 (main, Dec 10 2024, 15:13:47) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.message
>>> message = email.message.EmailMessage()
>>> message.add_header('From', 'me@example.example')
None
>>> message.add_header('To', 'me@example.example')
None
>>> message.add_header('Subject', 'Example Subject')
None
>>> message.add_header('Invalid Header', 'Contains a space, which is illegal')
None
>>> message.add_header('X-Valid Header', 'Custom header as recommended')
None
>>> message.set_content('Hello, this is an example!')
None
>>> message.defects
[]
>>> message._headers
[('From', 'me@example.example'),
('To', 'me@example.example'),
('Subject', 'Example Subject'),
('Invalid Header', 'Contains a space, which is illegal'),
('X-Valid Header', 'Custom header as recommended'),
('Content-Type', 'text/plain; charset="utf-8"'),
('Content-Transfer-Encoding', '7bit'),
('MIME-Version', '1.0')]
>>> message.as_string()
('From: me@example.example\n'
'To: me@example.example\n'
'Subject: Example Subject\n'
'Invalid Header: Contains a space, which is illegal\n'
'X-Valid Header: Custom header as recommended\n'
'Content-Type: text/plain; charset="utf-8"\n'
'Content-Transfer-Encoding: 7bit\n'
'MIME-Version: 1.0\n'
'\n'
'Hello, this is an example!\n')
>>> message.policy
EmailPolicy()
>>> msg_string = message.as_string()
>>> msg_string
('From: me@example.example\n'
'To: me@example.example\n'
'Subject: Example Subject\n'
'Invalid Header: Contains a space, which is illegal\n'
'X-Valid Header: Custom header as recommended\n'
'Content-Type: text/plain; charset="utf-8"\n'
'Content-Transfer-Encoding: 7bit\n'
'MIME-Version: 1.0\n'
'\n'
'Hello, this is an example!\n')
>>> import email.parser
>>> parsed_message = email.parser.Parser().parsestr(msg_string)
>>> parsed_message._headers
[('From', 'me@example.example'),
('To', 'me@example.example'),
('Subject', 'Example Subject')]
>>> parsed_message.as_string()
('From: me@example.example\n'
'To: me@example.example\n'
'Subject: Example Subject\n'
'\n'
'Invalid Header: Contains a space, which is illegal\n'
'X-Valid Header: Custom header as recommended\n'
'Content-Type: text/plain; charset="utf-8"\n'
'Content-Transfer-Encoding: 7bit\n'
'MIME-Version: 1.0\n'
'\n'
'Hello, this is an example!\n')
>>> parsed_message.policy
Compat32()
>>> parsed_message.defects
[MissingHeaderBodySeparatorDefect()]
>>> import email.policy
>>> parsed_message_strict = email.parser.Parser(policy=email.policy.strict).parsestr(msg_string)
Traceback (most recent call last):
File "<python-input-19>", line 1, in <module>
parsed_message_strict = email.parser.Parser(policy=email.policy.strict).parsestr(msg_string)
File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/parser.py", line 64, in parsestr
return self.parse(StringIO(text), headersonly=headersonly)
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/parser.py", line 53, in parse
feedparser.feed(data)
~~~~~~~~~~~~~~~^^^^^^
File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/feedparser.py", line 176, in feed
self._call_parse()
~~~~~~~~~~~~~~~~^^
File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/feedparser.py", line 180, in _call_parse
self._parse()
~~~~~~~~~~~^^
File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/feedparser.py", line 234, in _parsegen
self.policy.handle_defect(self._cur, defect)
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
File "/home/delgado/git/pyenv/versions/3.13.1/lib/python3.13/email/_policybase.py", line 193, in handle_defect
raise defect
email.errors.MissingHeaderBodySeparatorDefect
>>> parsed_message_nonstrict = email.parser.Parser(policy=email.policy.default).parsestr(msg_string)
>>> parsed_message_nonstrict.as_string()
('From: me@example.example\n'
'To: me@example.example\n'
'Subject: Example Subject\n'
'\n'
'Invalid Header: Contains a space, which is illegal\n'
'X-Valid Header: Custom header as recommended\n'
'Content-Type: text/plain; charset="utf-8"\n'
'Content-Transfer-Encoding: 7bit\n'
'MIME-Version: 1.0\n'
'\n'
'Hello, this is an example!\n')
>>> parsed_message_nonstrict.defects
[MissingHeaderBodySeparatorDefect()]
EmailMessage accepts the illegal header field name without recording a defect. When the resulting message is parsed, however, it looks to me as if header parsing stops at that line regardless of policy: the line with the invalid header is treated as the first line of the body, which leads to the MissingHeaderBodySeparatorDefect.
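The parsing half of the session above can be condensed into a short script. This is a minimal sketch: it feeds the serialized text to the parser directly (copied from the as_string() output above), so it does not depend on whether the installed Python still accepts such a name in add_header. The variable names are only illustrative.

```python
import email.parser
import email.policy
from email.errors import MissingHeaderBodySeparatorDefect

# Raw message text, copied from the as_string() output in the session above.
RAW = (
    "From: me@example.example\n"
    "To: me@example.example\n"
    "Subject: Example Subject\n"
    "Invalid Header: Contains a space, which is illegal\n"
    "X-Valid Header: Custom header as recommended\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: 7bit\n"
    "MIME-Version: 1.0\n"
    "\n"
    "Hello, this is an example!\n"
)

parsed = email.parser.Parser(policy=email.policy.default).parsestr(RAW)

# Only the headers before the invalid line survive as headers.
header_names = parsed.keys()

# Everything from the invalid line onwards ends up in the body.
body = parsed.get_payload()

has_separator_defect = any(
    isinstance(defect, MissingHeaderBodySeparatorDefect)
    for defect in parsed.defects
)

print(header_names)
print(has_separator_defect)
print(body.splitlines()[0])
```

The first body line printed here is the invalid header line itself, which matches what the parsed as_string() output above shows.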
Interestingly, email.header contains the following:
# Field name regexp, including trailing colon, but not separating whitespace,
# according to RFC 2822. Character range is from tilde to exclamation mark.
# For use with .match()
fcre = re.compile(r'[\041-\176]+:$')
This is the correct regex according to the RFC, including the final colon, but it apparently isn't used anywhere in the code.
An MUA (such as Claws Mail or mutt) displays the resulting email with the remaining headers as part of the body, which breaks any MIME multipart rendering.
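Until the library validates field names itself, a caller can check them before adding a header. The following is only a sketch of that idea, not how the email package does or should do it: add_header_checked and FIELD_NAME_RE are hypothetical names, and the pattern is my reading of the RFC 5322 field-name rule (printable US-ASCII except the colon), written without the trailing colon because add_header takes the bare name.

```python
import re
from email.message import EmailMessage

# Assumption: RFC 5322 "ftext" is printable US-ASCII (33-126) except ':' (58).
FIELD_NAME_RE = re.compile(r"[\x21-\x39\x3b-\x7e]+")


def add_header_checked(message, name, value):
    """Hypothetical helper: validate the field name, then delegate."""
    if not FIELD_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid header field name: {name!r}")
    message.add_header(name, value)


message = EmailMessage()
add_header_checked(message, "X-Valid-Header", "Custom header")

try:
    add_header_checked(message, "Invalid Header", "Contains a space")
except ValueError as exc:
    rejected = str(exc)
else:
    rejected = None

print(message.keys())
print(rejected)
```

With this wrapper the name containing a space is rejected before it reaches the message, so the serialized output can be parsed back without the header block being cut short.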
CPython versions tested on:
3.11, 3.13
Operating systems tested on:
Linux