ElementTree does not handle UTF-8 encoding #1127

ironpythonbot · 2014-12-09T18:17:03Z

>>> import tempfile
>>> from xml.etree.ElementTree import ElementTree
>>> xml = '<?xml version="1.0" encoding="UTF-8"?>\n<test name="' + unichr(169) + '"/>\n'
>>> with tempfile.TemporaryFile() as f:
...     f.write(bytes(xml, 'utf-8'))  # use xml.encode('utf-8') in CPython 2.7
...     f.flush()
...     f.seek(0)
...     tree = ElementTree(file=f)
...     name = next(tree.iter()).get('name')
...     print(repr(name))
...     assert name == unichr(169)
...
u'\xc2\xa9'
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
AssertionError

unichr(169) is the copyright sign "©", which UTF-8 encodes as the two bytes b'\xc2\xa9'. ElementTree does not decode the two-byte sequence; it treats each byte as a separate character, so the attribute value comes back as u'\xc2\xa9' instead of u'\xa9'.
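
To illustrate the difference between the expected and the observed value, here is a minimal sketch that does not use ElementTree. It decodes the same two bytes once as UTF-8 and once as Latin-1. Latin-1 is only an assumption, chosen because it maps each byte to its own code point and so reproduces the reported output; the issue does not say which decoding path IronPython actually takes. The variable names are illustrative.

```python
# UTF-8 encodes U+00A9 (copyright sign) as two bytes.
raw = u'\xa9'.encode('utf-8')

# Decoding as UTF-8 yields the single expected character.
correct = raw.decode('utf-8')

# Decoding byte-by-byte (Latin-1 here, as an assumption) yields two
# characters, the same value printed in the report above.
mangled = raw.decode('latin-1')

print(len(correct))
print(len(mangled))
```

The assertion in the report fails because the parsed attribute equals the second value rather than the first.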

Work Item Details

Original CodePlex Issue: Issue 35635
Status: Proposed
Reason Closed: Unassigned
Assigned to: Unassigned
Reported on: Oct 21 at 4:23 AM
Reported by: ysitu
Updated on: Nov 7 at 2:43 PM
Updated by: tcalmant

slozier · 2017-05-17T14:59:52Z

Repro code:

import tempfile
from xml.etree.ElementTree import ElementTree
xml = '<?xml version="1.0" encoding="UTF-8"?>\n<test name="' + unichr(169) + '"/>\n'
with tempfile.TemporaryFile() as f:
    f.write(xml.encode('utf-8'))
    f.flush()
    f.seek(0)
    tree = ElementTree(file=f)
    name = next(tree.iter()).get('name')
    print(repr(name))
    assert name == unichr(169)

ironpythonbot added the unassigned label Dec 9, 2014

jdhardy removed the unassigned label Dec 9, 2014

slide added the untriaged label Jul 14, 2016

slozier removed the untriaged label May 17, 2017
