email.policy.default - gotcha with re-using parsed headers with embedded newlines #121650

Closed
jwhitlock opened this issue Jul 12, 2024 · 8 comments
Labels: topic-email, type-bug (an unexpected behavior, bug, or error), type-security (a security issue)

Comments

jwhitlock commented Jul 12, 2024

Bug report

Bug description:

I'm not sure if this is a bug, a feature request, or user error. I'm happy to re-file once I know which.

If a parsed email header contains a correctly quoted newline, setting an email header to that value will include a newline.

from email import message_from_string
from email.policy import default

email_in = """\
To: incoming+tag@me.example.com
From: External Sender <sender@them.example.com>
Subject: Here's an =?UTF-8?Q?embedded_newline=0A?=
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

<html>
<head><title>An embeded newline</title></head>
<body>
  <p>I sent you an embedded newline in the subject. How do you like that?!</p>
</body>
</html>
"""

msg = message_from_string(email_in, policy=default)
for header, value in msg.items():
    del msg[header]
    msg[header] = value
email_out = str(msg)
print(email_out)

Output is:

To: incoming+tag@me.example.com
From: External Sender <sender@them.example.com>
Subject: Here's an embedded newline

Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

<html>
<head><title>An embeded newline</title></head>
<body>
  <p>I sent you an embedded newline in the subject. How do you like that?!</p>
</body>
</html>

An email parser will interpret the resulting blank line as the end of the headers and the start of the message body. In this case, the Content-Type and other MIME headers will not be processed, and the email will be treated as plain text. In other cases, required headers like To may not be processed, and the email will not be delivered.
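To make the effect on a downstream parser concrete, here is a minimal sketch that parses a message which already contains the stray blank line. The literal message text is a shortened stand-in for the output above, not the exact bytes a mail server would see.

```python
from email import message_from_string
from email.policy import default

# A shortened copy of the broken output: the blank line after Subject
# ends the header block early.
broken = (
    "To: incoming+tag@me.example.com\n"
    "Subject: Here's an embedded newline\n"
    "\n"
    'Content-Type: text/html; charset="UTF-8"\n'
    "MIME-Version: 1.0\n"
    "\n"
    "<html><body><p>Hello</p></body></html>\n"
)

reparsed = message_from_string(broken, policy=default)
header_names = [name for name, _ in reparsed.items()]
content_type = reparsed.get_content_type()
body = reparsed.get_payload()

print(header_names)                        # only the headers before the blank line
print(content_type)                        # falls back to the default type
print(body.startswith("Content-Type:"))    # the MIME headers ended up in the body
```

Because the parser never sees Content-Type as a header, it falls back to the default content type, which matches the plain-text rendering described above.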

I'd expect an error on setting the value, an error on serializing the EmailMessage to a string, the subject to retain the original encoding, or the newline to be quoted in the serialized version.

Now that we know the behavior, we can process the headers ourselves (embed or strip trailing newlines). However, you may see this as a bug, a needed feature, or missing documentation.
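One possible shape for the "strip" workaround is sketched below. The helper name `strip_line_breaks` is hypothetical, and the sketch assumes that replacing line breaks with spaces is acceptable for every header being copied; it also assumes each header name appears only once, as in the example above.

```python
from email import message_from_string
from email.policy import default


def strip_line_breaks(value):
    """Hypothetical helper: join any line fragments with spaces and trim the ends."""
    return " ".join(str(value).splitlines()).strip()


raw = (
    "Subject: Here's an =?UTF-8?Q?embedded_newline=0A?=\n"
    "To: user@example.com\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw, policy=default)

# Same delete-and-reassign loop as in the report, but the decoded value
# is cleaned before it is stored again.
for name, value in msg.items():
    del msg[name]
    msg[name] = strip_line_breaks(value)

# Round-trip the message to confirm the headers after Subject survive.
reparsed = message_from_string(msg.as_string(), policy=default)
print(repr(str(msg["Subject"])))
print(reparsed["To"])
```

Passing a plain string instead of the parsed header object also sends the value through the policy's normal parsing path rather than the unchecked shortcut.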

More info:

The subject's type is email.headerregistry._UniqueUnstructuredHeader. It has a name attribute, so it is assigned without checking (see email.policy.EmailPolicy.header_store_parse()).

The _parse_tree, returned by email._header_value_parser.get_unstructured(), is:

UnstructuredTokenList([ValueTerminal("Here's"), WhiteSpaceTerminal(' '), ValueTerminal('an'), WhiteSpaceTerminal(' '), EncodedWord([ValueTerminal('embedded'), WhiteSpaceTerminal(' '), ValueTerminal('newline\n')])])

A user encountered this for our email relaying service https://relay.firefox.com (mozilla/fx-private-relay#4841). An incoming email to a service address is matched to a user. We re-write the email headers and forward the email to the user's "real" address.

A real email has this subject header:

Subject: The All Over Piercings Wishlist of =?UTF-8?Q?John=2E=0A?=

This is from a European website https://www.alloverpiercings.com. You can create a wishlist and send it to an email address. The subject appears correctly encoded to me, to allow for non-ASCII usernames, with the unfortunate embedded newline. When forwarding this email, using something similar to the code above (but with more header modifications and additions), the embedded newline is turned into a real newline. The rest of the email headers are treated as part of the body. Since the Content-Type and other MIME headers are not processed as headers, the email is treated as a plain text email.

CPython versions tested on:

3.11, 3.12

Operating systems tested on:

macOS

Linked PRs

jwhitlock added the type-bug label Jul 12, 2024
@jwhitlock
Author

On further investigation, a plain string with a trailing newline has this issue:

email["Subject"] = "string with newlines\n"

So the "re-use parsed header" is not part of the issue. The problem might be the newline detection in header_store_parse:

cpython/Lib/email/policy.py

Lines 131 to 148 in dc03ce7

def header_store_parse(self, name, value):
    """+
    The name is returned unchanged. If the input value has a 'name'
    attribute and it matches the name ignoring case, the value is returned
    unchanged. Otherwise the name and value are passed to header_factory
    method, and the resulting custom header object is returned as the
    value. In this case a ValueError is raised if the input value contains
    CR or LF characters.
    """
    if hasattr(value, 'name') and value.name.lower() == name.lower():
        return (name, value)
    if isinstance(value, str) and len(value.splitlines())>1:
        # XXX this error message isn't quite right when we use splitlines
        # (see issue 22233), but I'm not sure what should happen here.
        raise ValueError("Header values may not contain linefeed "
                         "or carriage return characters")
    return (name, self.header_factory(name, value))

A single element list is returned by "string with newlines\n".splitlines(), so it can't detect a trailing newline.
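The difference is easy to see in isolation. The sketch below compares the `splitlines()` length test with a direct membership test; `contains_line_break` is a hypothetical name used only for illustration, not something in the email package.

```python
def contains_line_break(value: str) -> bool:
    """Hypothetical stricter check: any CR or LF anywhere in the value."""
    return "\r" in value or "\n" in value


trailing = "string with newlines\n"
embedded = "string with\nnewlines"

trailing_parts = trailing.splitlines()
embedded_parts = embedded.splitlines()

print(trailing_parts)                  # ['string with newlines']
print(embedded_parts)                  # ['string with', 'newlines']
print(len(trailing_parts) > 1)         # False: the trailing newline is missed
print(contains_line_break(trailing))   # True
```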

@ZeroIntensity
Member

ZeroIntensity commented Jul 13, 2024

This is a bug (I was able to reproduce this on the CPython main branch), and looks like a minor security problem, considering this:

An email parser will interpret the newline as the start of the message.

For example, I could see someone developing an app that does something like this:

def email_notification(name: str):
    msg = EmailMessage()
    msg.set_content("This is an automatic notification blah blah blah...")
    msg["Subject"] = (
        f"{name} sent you a message!"
    )
    smtp_server.send_message(msg)

If a user set their name to something like "=?UTF-8?Q?=0A?==?UTF-8?Q?=0A?=This comes before the actual body!", then This comes before the actual body! would precede the rest of the message. (FWIW, I'm neither a security researcher nor a cybersecurity expert, so this is speculative.)

Furthermore, you could use this to inject extra message headers.

@basbloemsaat
Contributor

It seems to be a bug, or two even.

import email.message
from email.policy import default

msg = email.message.EmailMessage(policy=default)
msg['Subject'] = 'A 💩 subject\nBcc: injected@example.com'
print(str(msg))

The above throws a ValueError("Header values may not contain linefeed or carriage return characters"), as expected.

However the following does not, and inserts an extra newline, thus invalidating some headers:

msg = email.message.EmailMessage(policy=default)
msg['Subject'] = 'A 💩 subject\n'
msg.set_content('This is 💩 the body of the message.\n')
print(str(msg))

And with a newline hidden in a UTF-8 encoded-word, it even inserts an extra header:

msg = email.message.EmailMessage(policy=default)
msg['Subject'] = 'A 💩 subject=?UTF-8?Q?=0A?=Bcc: injected@example.com'
msg.set_content('This is 💩 the body of the message.\n')
print(str(msg))


So, I think two things have to be solved:

  1. Newlines at the end should either throw a ValueError, like newlines in the middle do, or be stripped, as they are not allowed by the RFC.
  2. Encoded newlines should also throw a ValueError.
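For application code that cannot wait for a library fix, the two points above could be approximated with a guard like the following. This is only a sketch: `reject_line_breaks` and `is_rejected` are hypothetical names, and the sketch assumes that decoding the value through the policy's `header_factory` is an acceptable way to detect encoded newlines.

```python
from email.policy import default


def reject_line_breaks(name, value):
    """Hypothetical guard: raise ValueError for raw or encoded CR/LF."""
    # Point 1: raw line breaks anywhere, including at the very end.
    if "\r" in value or "\n" in value:
        raise ValueError(f"{name}: raw CR/LF in header value")
    # Point 2: line breaks that only appear after encoded-words are decoded.
    decoded = str(default.header_factory(name, value))
    if "\r" in decoded or "\n" in decoded:
        raise ValueError(f"{name}: encoded CR/LF in header value")
    return value


def is_rejected(name, value):
    """Return True if the guard refuses the value."""
    try:
        reject_line_breaks(name, value)
    except ValueError:
        return True
    return False


results = {
    "plain": is_rejected("Subject", "A plain subject"),
    "trailing": is_rejected("Subject", "A subject\n"),
    "encoded": is_rejected("Subject", "Here's an =?UTF-8?Q?embedded_newline=0A?="),
}
print(results)
```

Whether to raise or to strip is a policy decision; raising keeps the guard from silently changing user-supplied text.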

@encukou : I'll try to fix both during (or after) the EuroPython sprint, ok?

@jwhitlock
Author

Thanks @basbloemsaat. Feel free to pick a better title for this issue (or suggest one if I need to change it), or re-file for the individual issues.

@ZeroIntensity
Member

I'm pretty sure this is a security problem, as you can inject extra headers. @Eclips4 what do you think, and could you add the security label?

@Eclips4
Member

Eclips4 commented Jul 15, 2024

I would like to hear @serhiy-storchaka's opinion on this.

encukou added the type-security label Jul 27, 2024
encukou added a commit to encukou/cpython that referenced this issue Jul 30, 2024
encukou added a commit that referenced this issue Jul 30, 2024
…H-122233)

## Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.


## Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.


Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 30, 2024
…ound (pythonGH-122233)

GH-GH- Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

GH-GH- Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
encukou added a commit to encukou/cpython that referenced this issue Aug 2, 2024
…ound (pythonGH-122233)

ambv pushed a commit to ambv/cpython that referenced this issue Aug 2, 2024
…s are sound (pythonGH-122233)

ambv pushed a commit to ambv/cpython that referenced this issue Aug 2, 2024
…s are sound (pythonGH-122233)

ambv pushed a commit to ambv/cpython that referenced this issue Aug 2, 2024
… are sound (pythonGH-122233)

ambv pushed a commit to ambv/cpython that referenced this issue Aug 2, 2024
… are sound (pythonGH-122233)

Yhg1s pushed a commit that referenced this issue Aug 6, 2024
…sound (GH-122233) (#122484)

gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)

Yhg1s pushed a commit that referenced this issue Aug 6, 2024
…sound (GH-122233) (#122599)

* gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)


* Document changes as made in 3.12.5
@medmunds

medmunds commented Aug 6, 2024

#121284 turns out to be a variation of this, where refolding a parsed RFC 2047 encoded-word can leak 'specials' characters into structured headers without proper quoting/encoding. The security issue is not quite as severe as letting newlines leak in, but unquoted specials can allow manipulation of the message sender and recipients.

hroncok pushed a commit to fedora-python/cpython that referenced this issue Aug 6, 2024
…s are sound

pythongh-121650: Encode newlines in headers, and verify headers are sound (pythonGH-122233)

hroncok pushed a commit to fedora-python/cpython that referenced this issue Aug 6, 2024
…s are sound

frenzymadness pushed a commit to frenzymadness/cpython that referenced this issue Aug 13, 2024
…s are sound (pythonGH-122233)

frenzymadness pushed a commit to fedora-python/cpython that referenced this issue Aug 15, 2024
…s are sound (pythonGH-122233)

stratakis pushed a commit to stratakis/cpython that referenced this issue Aug 15, 2024
…s are sound (pythonGH-122233)

hrnciar added a commit to hrnciar/cpython that referenced this issue Aug 16, 2024
 headers are sound (pythonGH-122233)


This patch also contains modified commit cherry picked from
c5bba85.

This commit was backported to simplify the backport of the other commit
fixing CVE. The only modification is a removal of one test case which
tests multiple changes in Python 3.7 and it wasn't working properly
with Python 3.6 where we backported only one change.

Co-authored-by: bsiem <52461103+bsiem@users.noreply.github.com>
hrnciar added a commit to fedora-python/cpython that referenced this issue Aug 16, 2024
 headers are sound (pythonGH-122233)

hrnciar added a commit to fedora-python/cpython that referenced this issue Aug 20, 2024
 headers are sound (pythonGH-122233)

blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024
…ound (pythonGH-122233)

mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue Aug 29, 2024
The :mod:`~email.generator` will now refuse to serialize (write) headers
that are improperly folded or delimited, such that they would be parsed as
multiple headers or joined with adjacent data.
If you need to turn this safety feature off,
set `~email.policy.Policy.verify_generated_headers`.

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.

They do need to be properly quoted when serialized to text, though.

Fixes: gh#python#121650
Fixes: bsc#1228780 (CVE-2024-6923)
From-PR: gh#python/cpython!122233
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Jakub Stasiak <jakub@stasiak.at>
Patch: CVE-2024-6923-email-hdr-inject.patch
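On interpreters that may or may not carry this patch, an application can run its own round-trip check before sending. The sketch below is an assumption-laden illustration, not the generator's actual verification: `headers_round_trip` is a hypothetical helper that serializes a message, parses it again, and compares header names. It catches `email.errors.MessageError` on the assumption that a patched generator reports a refused header through a subclass of that exception.

```python
from email import message_from_string
from email.errors import MessageError
from email.message import EmailMessage
from email.policy import default


def headers_round_trip(msg):
    """Hypothetical helper: True if every header name survives serialize + reparse."""
    try:
        serialized = msg.as_string()
    except MessageError:
        # Assumption: a patched generator refuses to write a malformed header.
        return False
    reparsed = message_from_string(serialized, policy=default)
    before = [name.lower() for name in msg.keys()]
    after = [name.lower() for name in reparsed.keys()]
    return before == after


# A well-formed message should always survive the round trip.
clean = EmailMessage(policy=default)
clean["Subject"] = "Weekly report"
clean["To"] = "user@example.com"
clean.set_content("Hello\n")
clean_ok = headers_round_trip(clean)

# Rebuild the scenario from the report: re-assign parsed header objects.
raw = (
    "Subject: Here's an =?UTF-8?Q?embedded_newline=0A?=\n"
    "To: user@example.com\n"
    "\n"
    "body\n"
)
suspect = message_from_string(raw, policy=default)
for name, value in suspect.items():
    del suspect[name]
    suspect[name] = value
suspect_ok = headers_round_trip(suspect)

print(clean_ok)
print(suspect_ok)  # depends on whether the interpreter includes the fix
```

The second result is intentionally left unstated: it differs between interpreters with and without the fix.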
ambv added a commit that referenced this issue Sep 4, 2024
…ound (GH-122233) (#122611)

ambv added a commit that referenced this issue Sep 4, 2024
…sound (GH-122233) (#122608)

ambv added a commit that referenced this issue Sep 4, 2024
…sound (GH-122233) (#122609)

ambv added a commit that referenced this issue Sep 4, 2024
…ound (GH-122233) (#122610)

@encukou
Member

encukou commented Sep 9, 2024

Thank you @jwhitlock for the report, and @basbloemsaat for the initial fix!

encukou closed this as completed Sep 9, 2024