Test suite interrupted by xml.etree.ElementTree.ParseError: not well-formed (invalid token) #121188
Extract of the XML:
Reproducer:

from xml.etree import ElementTree as ET
suite = ET.Element('testsuite')
failure = ET.SubElement(suite, 'failure')
failure.set('message', 'abc \x1b def')
xml_str = ET.tostring(suite).decode('ascii')
print("XML:", ascii(xml_str))
suite2 = ET.fromstring(xml_str)

Output:

XML: '<testsuite><failure message="abc \x1b def" /></testsuite>'
Traceback (most recent call last):
  File "/home/vstinner/python/main/x.py", line 10, in <module>
    suite2 = ET.fromstring(xml_str)
  File "/home/vstinner/python/main/Lib/xml/etree/ElementTree.py", line 1342, in XML
    parser.feed(text)
    ~~~~~~~~~~~^^^^^^
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 33
Unfortunately, XML cannot represent every Unicode character (not even every character in the ASCII range). The XML-generating code in the stdlib ignores this and produces invalid XML, but the expat parser complains. There are several open issues about this (like #51976). There is no workaround. In this particular case we can replace the unsupported characters before passing them to ElementTree.
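The replacement step suggested above could look roughly like the sketch below. It is only an illustration, not the code that was merged into regrtest: the helper name `sanitize_xml_text` and the backslash-style escape format are assumptions made for this example. The character class follows the `Char` production of XML 1.0, which allows tab, newline, carriage return, and the ranges U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF.

```python
import re
from xml.etree import ElementTree as ET

# Matches any character outside the XML 1.0 "Char" production.
_INVALID_XML_CHARS = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)


def sanitize_xml_text(text):
    """Replace characters XML 1.0 cannot represent with a visible escape.

    Hypothetical helper: the name and the escape format are choices made
    for this sketch only.
    """
    def _escape(match):
        code = ord(match.group())
        if code <= 0xFF:
            return f'\\x{code:02x}'
        return f'\\u{code:04x}'

    return _INVALID_XML_CHARS.sub(_escape, text)


# Same scenario as the reproducer, but the message is sanitized first.
suite = ET.Element('testsuite')
failure = ET.SubElement(suite, 'failure')
failure.set('message', sanitize_xml_text('abc \x1b def'))
xml_str = ET.tostring(suite).decode('ascii')

# The round trip now succeeds because chr(27) was replaced by the
# four printable characters backslash, "x", "1", "b".
suite2 = ET.fromstring(xml_str)
```

The escape is lossy in one sense: a reader of the XML cannot tell an escaped chr(27) from a message that already contained the literal text `\x1b`. For a test report that trade-off is probably acceptable, since the goal is only to keep the file parseable.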
When creating the JUnit XML file, regrtest now escapes characters which are invalid in XML, such as the chr(27) control character used in ANSI escape sequences.
Good idea. I wrote PR gh-121195 to do that :-)
Oh, I also wrote some code, but have not created a PR yet because of a power outage. Let's compare.
Fixed by af8c3d7.
) (#121205)
gh-121188: Sanitize invalid XML characters in regrtest (GH-121195)
When creating the JUnit XML file, regrtest now escapes characters which are invalid in XML, such as the chr(27) control character used in ANSI escape sequences.
(cherry picked from commit af8c3d7)
Co-authored-by: Victor Stinner <vstinner@python.org>

) (#121204)
gh-121188: Sanitize invalid XML characters in regrtest (GH-121195)
When creating the JUnit XML file, regrtest now escapes characters which are invalid in XML, such as the chr(27) control character used in ANSI escape sequences.
(cherry picked from commit af8c3d7)
Co-authored-by: Victor Stinner <vstinner@python.org>
It seems like the problem comes from 0x1b bytes (e.g. displayed as ^[ in the vim editor) from ANSI escape sequences, which are not properly escaped.
Example: https://buildbot.python.org/all/#/builders/332/builds/1457