bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+" #8741

Merged

@ZackerySpytz (Contributor) commented Aug 12, 2018

The UTF-7 decoder now raises UnicodeDecodeError for ill-formed
sequences starting with "+" (as specified in RFC 2152).

https://bugs.python.org/issue22602
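
For readers skimming the thread, the behaviour described above can be illustrated with a minimal sketch. It assumes a CPython version that includes this change (3.8 or later); the variable names are illustrative only and are not taken from the patch.

```python
# A well-formed shift sequence: "+IKw-" is the modified-base64 form of U+20AC.
well_formed = b"a+IKw-b".decode("utf-7")

# "+-" is the escape for a literal "+".
escaped_plus = b"a+-b".decode("utf-7")

# "+" followed by a byte outside the base64 alphabet (and not "-") is the
# ill-formed case this PR targets. With the default "strict" handler the
# decoder is expected to raise instead of silently accepting the input.
try:
    b"a+@b".decode("utf-7")
except UnicodeDecodeError as exc:
    ill_formed_error = exc
else:
    ill_formed_error = None

print(repr(well_formed))
print(repr(escaped_plus))
print(ill_formed_error)
```

On interpreters that predate this change, the last case may decode without an error, which is the behaviour the linked issue reports.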

@ZackerySpytz ZackerySpytz force-pushed the bpo-22602-utf-7-ill-formed branch from ad3d036 to 2d3153f Compare August 12, 2018 06:16
@@ -1630,6 +1630,10 @@ def test_codecs_utf7(self):
        for c in set_o:
            self.assertEqual(c.encode('ascii').decode('utf7'), c)

        with self.assertRaisesRegex(UnicodeDecodeError,
Member

Please also add a test for a non-strict error handler, e.g. "replace". UTF7Test.test_errors in test_codecs.py may be a more appropriate place for this test.

Contributor Author

Thanks for the comment. I've added the test.

Member

The test in test_unicode.py is no longer needed, is it?

Contributor Author

The test in test_codecs.py doesn't check the error message.

Member

Then everything is correct.
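
The distinction discussed in this thread (checking only the exception type versus also checking the message) can be sketched as follows. The class name and the pattern "ill-formed sequence" are assumptions made for illustration; the regular expression actually used in the PR is cut off in the hunk above, so this is not a copy of it. The sketch also assumes CPython 3.8 or later.

```python
import unittest


class UTF7IllFormedSketch(unittest.TestCase):  # hypothetical test class
    def test_type_only(self):
        # assertRaises passes for any UnicodeDecodeError, whatever its message.
        with self.assertRaises(UnicodeDecodeError):
            b"a+@b".decode("utf-7")

    def test_type_and_message(self):
        # assertRaisesRegex additionally searches str(exception) for the
        # pattern, so a change in the reported reason would be caught.
        # The pattern here is an assumption, not the PR's exact text.
        with self.assertRaisesRegex(UnicodeDecodeError, "ill-formed sequence"):
            b"a+@b".decode("utf-7")


def run_sketch():
    """Run the two tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UTF7IllFormedSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_sketch()` runs both tests; only the second one would fail if the decoder raised the right exception type with a different reason.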

@@ -1020,6 +1020,7 @@ def test_errors(self):
            (b'a+////,+IKw-b', 'a\uffff\ufffd\u20acb'),
            (b'a+IKw-b\xff', 'a\u20acb\ufffd'),
            (b'a+IKw\xffb', 'a\u20ac\ufffdb'),
            (b'a+@b', 'a\ufffdb'),
Contributor

This test will fail. Is that what you want?

Contributor Author

Yes. The self.assertRaises() on line 1027 checks that a UnicodeDecodeError is raised with the "strict" error handler. Or did you mean something else?

Contributor

Sorry, I put the comment on the wrong line. I think the test will fail on line 1029: self.assertEqual(raw.decode('utf-7', 'replace'), expected)

Contributor Author

Well, you're mistaken.

Member

Please also add tests for b'a+' and b'a+@'. The code may need to be changed to handle these cases properly.
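
The table-driven check that this thread refers to (an assertRaises() for the "strict" handler followed by an assertEqual() for "replace") can be sketched roughly as below. The class name, the `CASES` list and the `run_table_sketch` helper are hypothetical; the (input, expected) pairs are the ones visible in the hunk above, and the sketch assumes CPython 3.8 or later rather than reproducing the real test_codecs.py code.

```python
import unittest

# (raw bytes, expected text under the "replace" error handler),
# taken from the diff context shown in this thread.
CASES = [
    (b"a+IKw-b\xff", "a\u20acb\ufffd"),
    (b"a+IKw\xffb", "a\u20ac\ufffdb"),
    (b"a+@b", "a\ufffdb"),
]


class UTF7ErrorsSketch(unittest.TestCase):  # hypothetical test class
    def test_errors(self):
        for raw, expected in CASES:
            with self.subTest(raw=raw):
                # "strict" must reject every entry in the table ...
                self.assertRaises(UnicodeDecodeError,
                                  raw.decode, "utf-7", "strict")
                # ... while "replace" substitutes U+FFFD for the bad span.
                self.assertEqual(raw.decode("utf-7", "replace"), expected)


def run_table_sketch():
    """Run the table-driven test and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UTF7ErrorsSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this layout, a single new tuple such as the b'a+@b' entry exercises both error handlers, which is why the author's added line is covered by the strict check as well as the replace check.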

@@ -0,0 +1,3 @@
The UTF-7 decoder now raises :exc:`UnicodeDecodeError` for ill-formed
sequences starting with "+" (as specified in RFC 2152). Patch by Zackery
Member

You can use :rfc:`2152`.

@ZackerySpytz
Contributor Author

@serhiy-storchaka Serhiy, there is no ill-formed sequence in b'a+' (or b'+', for that matter) according to the RFC's definition.
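
The boundary cases raised in the review can be probed with a small sketch like the one below. The helper name `decode_or_error` is hypothetical, and the outcomes depend on the interpreter version; on the CPython releases this was reasoned against (3.8 and later), a trailing "+" is expected to decode without an error, while "+" followed by "@" is expected to be rejected.

```python
def decode_or_error(raw: bytes) -> str:
    """Decode raw as UTF-7 (strict) and describe the outcome as a string."""
    try:
        return repr(raw.decode("utf-7"))
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# A lone trailing "+" opens a shift sequence that carries no base64 data,
# whereas "+@" pairs "+" with a byte outside the base64 alphabet.
for raw in (b"a+", b"+", b"a+@", b"a+@b"):
    print(raw, "->", decode_or_error(raw))
```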

@serhiy-storchaka serhiy-storchaka merged commit e349bf2 into python:master Aug 19, 2018
carljm added a commit to carljm/cpython that referenced this pull request Aug 19, 2018
* master: (107 commits)
  bpo-22057: Clarify eval() documentation (pythonGH-8812)
  bpo-34318: Convert deprecation warnings to errors in assertRaises() etc. (pythonGH-8623)
  bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (pythonGH-8741)
  bpo-34415: Updated logging.Formatter docstring. (pythonGH-8811)
  bpo-34432: doc Mention complex and decimal.Decimal on str.format not about locales (pythonGH-8808)
  bpo-34381: refer to 'Running & Writing Tests' in README.rst (pythonGH-8797)
  Improve error message when mock.assert_has_calls fails (pythonGH-8205)
  Warn not to set SIGPIPE to SIG_DFL (python#6773)
  bpo-34419: selectmodule.c does not compile on HP-UX due to bpo-31938 (pythonGH-8796)
  bpo-34418: Fix HTTPErrorProcessor documentation (pythonGH-8793)
  bpo-34391: Fix ftplib test for TLS 1.3 (pythonGH-8787)
  bpo-34217: Use lowercase for windows headers (pythonGH-8472)
  bpo-34395: Fix memory leaks caused by incautious usage of PyMem_Resize(). (pythonGH-8756)
  bpo-34405: Updated to OpenSSL 1.1.0i for Windows builds. (pythonGH-8775)
  bpo-34384: Fix os.readlink() on Windows (pythonGH-8740)
  closes bpo-34400: Fix undefined behavior in parsetok(). (pythonGH-4439)
  bpo-34399: 2048 bits RSA keys and DH params (python#8762)
  Make regular expressions in test_tasks.py raw strings. (pythonGH-8759)
  smtplib documentation fixes (pythonGH-8708)
  Fix misindented yaml in logging how to example (pythonGH-8604)
  ...
@ZackerySpytz ZackerySpytz deleted the bpo-22602-utf-7-ill-formed branch August 20, 2018 12:20