email.parser.BytesParser.parse() cannot handle binary data that include \x0d \x0a correctly. #128949

Open
@tnakamot

Bug report

Bug description:

I would like to extract a binary file from a multipart MIME file using email.parser.BytesParser, but the byte sequence "0x0d 0x0a" (CR + LF) in the binary file is replaced by "0x0a" (LF). Below is a minimal reproducible example.

from email.parser import BytesParser
from email.policy import default
from io import BytesIO

mime_file_byte_array = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n'
    b'\r\n'
    b'--MIME_boundary-1\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n'
    b'\r\n'
    b'a\r\nb\r\n'
    b'--MIME_boundary-1--\r\n'
    b'\r\n'
)
fp = BytesIO(mime_file_byte_array)
parser = BytesParser(policy=default)
msg = parser.parse(fp)

parts = list(msg.walk())
binary_data = parts[1].get_payload(decode=True)

print('===== Beginning of Original MIME File =====')
print(mime_file_byte_array.decode())
print('===== End of Original MIME File =====')
print('')
print('===== test.bin after parse =====')
print(binary_data)
print('===== End of test.bin after parse =====')

As the headers of the enclosed part show (Content-Type: application/octet-stream, Content-Location: test.bin), the multipart MIME file includes a binary file "test.bin". The content of the binary file is b"a\r\nb".
Therefore, the variable binary_data is supposed to contain b"a\r\nb", but it actually contains b"a\nb".
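
For comparison, the same bytes can be passed to BytesParser.parsebytes() instead of parse(). The following is a minimal sketch and not part of the original test; the variable names are illustrative and it assumes the whole message fits in memory. parsebytes() takes the bytes directly rather than reading from a binary file object, so it may serve as a workaround when CR+LF has to be preserved.

```python
from email.parser import BytesParser
from email.policy import default

# Same message as the reproducer: one octet-stream part whose body is b"a\r\nb".
raw_message = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n'
    b'\r\n'
    b'--MIME_boundary-1\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n'
    b'\r\n'
    b'a\r\nb\r\n'
    b'--MIME_boundary-1--\r\n'
    b'\r\n'
)

# parsebytes() accepts the bytes object itself, so no file-like wrapper
# around a binary stream is involved in reading the input.
msg = BytesParser(policy=default).parsebytes(raw_message)

# Index 0 is the multipart container, index 1 is the enclosed binary part.
payload = list(msg.walk())[1].get_payload(decode=True)
print(payload)
```

If the printed payload still contains the CR byte, the loss in the reproducer happens while parse() reads from the file object, not in the MIME parsing itself.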

It is probably because the TextIOWrapper created in BytesParser.parse() uses the default newline handling (newline=None), which translates CR+LF to LF when reading.

fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape')

When I replaced the above line with the line below, the problem was fixed. However, this fix may have side effects that I cannot foresee.

fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape', newline='')
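
The difference between the two calls can be checked in isolation from the email package. This is a small sketch using only the io module, with illustrative variable names: with the default newline=None, TextIOWrapper enables universal newlines and translates CR+LF to LF when reading, while newline='' returns line endings untranslated.

```python
from io import BytesIO, TextIOWrapper

raw = b"a\r\nb"

# Default newline=None: universal newlines mode, "\r\n" is translated to "\n".
translated = TextIOWrapper(
    BytesIO(raw), encoding="ascii", errors="surrogateescape"
).read()

# newline="": line endings are passed through to the caller unchanged.
untranslated = TextIOWrapper(
    BytesIO(raw), encoding="ascii", errors="surrogateescape", newline=""
).read()

print(repr(translated))    # 'a\nb'
print(repr(untranslated))  # 'a\r\nb'
```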

CPython versions tested on:

3.10

Operating systems tested on:

Linux

    Labels

    stdlib, topic-email, type-bug