bpo-39040: Fix parsing of email mime headers with whitespace between encoded-words. #17620

Merged: 4 commits, May 29, 2020

@maxking maxking commented Dec 16, 2019

In certain malformed Content-Disposition headers, parameter values are quoted
and split into encoded words across two lines, separated by extra whitespace. This
fixes the issue by removing the extra whitespace between the two encoded words.

https://bugs.python.org/issue39040
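
For reference, the behavior described above can be reproduced with a short sketch. It assumes an interpreter that already includes this fix and uses the documented email.headerregistry.HeaderRegistry API. The header value and the parse_filename helper are made up for illustration and are not taken from the bug report.

```python
from email.headerregistry import HeaderRegistry


def parse_filename(value):
    """Return the decoded filename parameter of a Content-Disposition value."""
    # HeaderRegistry maps "Content-Disposition" to a parameterized header
    # class whose .params mapping holds the decoded parameter values.
    header = HeaderRegistry()("Content-Disposition", value)
    return header.params["filename"]


# Hypothetical malformed value: the filename is split into two RFC 2047
# encoded words inside a quoted string, with extra whitespace between them.
malformed = 'attachment; filename="=?utf-8?q?foo?=   =?utf-8?q?bar.pdf?="'

# With the fix applied, the whitespace between the two encoded words is
# dropped, so this should print 'foobar.pdf'.
print(repr(parse_filename(malformed)))
```

On an interpreter without the fix, the whitespace between the two encoded words would be expected to remain in the result.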

@maxking maxking requested a review from a team as a code owner December 16, 2019 02:51
@maxking maxking added the needs backport to 3.7 and type-bug labels Dec 16, 2019
@bitdancer bitdancer left a comment

Oops, I forgot to start a review :(

@bedevere-bot

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@maxking maxking changed the title from "bpo-39040: Fix parsing of email headers with whitespace between encoded-words." to "bpo-39040: Fix parsing of email mime headers with whitespace between encoded-words." Dec 17, 2019
@maxking maxking commented Dec 17, 2019

I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@bitdancer: please review the changes made to this pull request.

[],
'attachment; filename="File Name With Spaces.pdf"',
('Content-Disposition: attachment; '
'filename="File Name With Spaces.pdf"\n'),

Member

Not quite. We need a test with something like "File =?utf-8?q?Name?= With Spaces.pdf". That should have spaces around Name...we want to make sure we aren't removing spaces around encoded words that aren't next to other encoded words.
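
A minimal sketch of the kind of check requested here, assuming the public email.headerregistry API rather than the test tables used in the PR itself. The helper name filename_param is hypothetical; the header value is the one suggested in the comment.

```python
from email.headerregistry import HeaderRegistry


def filename_param(value):
    """Parse a Content-Disposition value and return its decoded filename."""
    return HeaderRegistry()("Content-Disposition", value).params["filename"]


# A single encoded word surrounded by ordinary text: the spaces on either
# side separate it from plain words, not from another encoded word, so they
# must survive decoding.
lone_encoded_word = 'attachment; filename="File =?utf-8?q?Name?= With Spaces.pdf"'

assert filename_param(lone_encoded_word) == "File Name With Spaces.pdf"
```

Only whitespace that sits directly between two encoded words should be dropped; everything else in the quoted string is left alone.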

Contributor Author

Done.

Member

I'm not seeing the changes?

Contributor Author

You might be looking at the outdated diff, from a previous commit.

There should be 4 commits in the PR, and your requested changes are in the last 2 commits.

@bedevere-bot

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@maxking maxking commented Dec 18, 2019

I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@bitdancer: please review the changes made to this pull request.

@csabella csabella removed the request for review from bitdancer May 28, 2020 23:24
@csabella csabella added the needs backport to 3.9 label May 28, 2020
@csabella csabella requested a review from bitdancer May 28, 2020 23:24
@bitdancer bitdancer merged commit 21017ed into python:master May 29, 2020
@miss-islington

Thanks @maxking for the PR, and @bitdancer for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8, 3.9.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 29, 2020
…encoded-words. (pythongh-17620)

* bpo-39040: Fix parsing of email headers with encoded-words inside a quoted string.

It is fairly common to find malformed mime headers (especially content-disposition
headers) where the parameter values, instead of being encoded to RFC
standards, are "encoded" by doing RFC 2047 "encoded word" encoding, and
then enclosing the whole thing in quotes.  The processing of these malformed
headers was incorrectly leaving the spaces between encoded words in the decoded
text (whitespace between adjacent encoded words is supposed to be stripped on
decoding).  This changeset fixes the encoded word processing inside quoted strings
(bare-quoted-string) to do correct RFC 2047 decoding by stripping that
whitespace.
(cherry picked from commit 21017ed)

Co-authored-by: Abhilash Raj <maxking@users.noreply.github.com>
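
The RFC 2047 rule the commit message refers to can be seen outside quoted strings with the older email.header API, which is not the code this change touches. A small sketch; the sample strings and the decode helper are illustrative assumptions.

```python
from email.header import decode_header, make_header


def decode(raw):
    """Decode RFC 2047 encoded words in a raw header value to a str."""
    return str(make_header(decode_header(raw)))


# Whitespace that separates two adjacent encoded words is dropped.
adjacent = "=?utf-8?q?Hello?= =?utf-8?q?World?="

# Whitespace between an encoded word and ordinary text is kept.
mixed = "File =?utf-8?q?Name?= With Spaces.pdf"

print(decode(adjacent))  # HelloWorld
print(decode(mixed))     # File Name With Spaces.pdf
```

The changeset applies the same stripping to encoded words found inside a bare-quoted-string.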
@miss-islington

Thanks @maxking for the PR, and @bitdancer for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

miss-islington added a commit that referenced this pull request May 29, 2020
…encoded-words. (gh-17620)

* bpo-39040: Fix parsing of email headers with encoded-words inside a quoted string.

It is fairly common to find malformed mime headers (especially content-disposition
headers) where the parameter values, instead of being encoded to RFC
standards, are "encoded" by doing RFC 2047 "encoded word" encoding, and
then enclosing the whole thing in quotes.  The processing of these malformed
headers was incorrectly leaving the spaces between encoded words in the decoded
text (whitespace between adjacent encoded words is supposed to be stripped on
decoding).  This changeset fixes the encoded word processing inside quoted strings
(bare-quoted-string) to do correct RFC 2047 decoding by stripping that
whitespace.
(cherry picked from commit 21017ed)

Co-authored-by: Abhilash Raj <maxking@users.noreply.github.com>
CuriousLearner added a commit to CuriousLearner/cpython that referenced this pull request May 30, 2020
* 'master' of github.com:python/cpython: (497 commits)
  bpo-40061: Fix a possible refleak in _asynciomodule.c (pythonGH-19748)
  bpo-40798: Generate a different message for already removed elements (pythonGH-20483)
  closes bpo-29017: Update the bindings for Qt information with PySide2 (pythonGH-20149)
  bpo-39885: Make IDLE context menu cut and copy work again (pythonGH-18951)
  bpo-29882: Add an efficient popcount method for integers (python#771)
  Further de-linting of zoneinfo module (python#20499)
  bpo-40780: Fix failure of _Py_dg_dtoa to remove trailing zeros (pythonGH-20435)
  Indicate that abs() method accept argument that implement __abs__(), just like call() method in the docs (pythonGH-20509)
  bpo-39040: Fix parsing of email mime headers with whitespace between encoded-words. (pythongh-17620)
  bpo-40784: Fix sqlite3 deterministic test (pythonGH-20448)
  bpo-30064: Properly skip unstable loop.sock_connect() racing test (pythonGH-20494)
  Note the output ordering of combinatoric functions (pythonGH-19732)
  bpo-40474: Updated coverage.yml to better report coverage stats (python#19851)
  bpo-40806: Clarify that itertools.product immediately consumes its inpt (pythonGH-20492)
  bpo-1294959: Try to clarify the meaning of platlibdir (pythonGH-20332)
  bpo-37878: PyThreadState_DeleteCurrent() was not removed (pythonGH-20489)
  bpo-40777: Initialize PyDateTime_IsoCalendarDateType.tp_base at run-time (pythonGH-20493)
  bpo-40755: Add missing multiset operations to Counter() (pythonGH-20339)
  bpo-25920: Remove socket.getaddrinfo() lock on macOS (pythonGH-20177)
  bpo-40275: Fix test.support.threading_helper (pythonGH-20488)
  ...
@terryjreedy terryjreedy removed the needs backport to 3.9 label Feb 17, 2025