This repository has been archived by the owner on Sep 14, 2018. It is now read-only.

ElementTree does not handle UTF-8 encoding #1127

Open
ironpythonbot opened this issue Dec 9, 2014 · 1 comment

Comments

@ironpythonbot

>>> import tempfile
>>> from xml.etree.ElementTree import ElementTree
>>> xml = '<?xml version="1.0" encoding="UTF-8"?>\n<test name="' + unichr(169) + '"/>\n'
>>> with tempfile.TemporaryFile() as f:
...     f.write(bytes(xml, 'utf-8'))  # use xml.encode('utf-8') in CPython 2.7
...     f.flush()
...     f.seek(0)
...     tree = ElementTree(file=f)
...     name = next(tree.iter()).get('name')
...     print(repr(name))
...     assert name == unichr(169)
...
u'\xc2\xa9'
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
AssertionError

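unichr(169) is the copyright sign "©", which is encoded in UTF-8 as the two bytes b'\xc2\xa9'. ElementTree does not decode this two-byte sequence
as a single character. Instead, each byte is interpreted as a separate character, so the attribute value comes back as u'\xc2\xa9' rather than u'\xa9'.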
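
The symptom can be reproduced without ElementTree. The sketch below is only an illustration: it assumes the bytes are being mapped one-to-one onto code points (as a Latin-1 decode would do), which matches the observed output but is not confirmed as the actual cause inside IronPython. The variable names are hypothetical.

```python
# Illustration only: shows how the two UTF-8 bytes for U+00A9 turn into
# two characters when each byte is treated as its own code point.
copyright_sign = u'\u00a9'                    # same character as unichr(169)
utf8_bytes = copyright_sign.encode('utf-8')   # the two bytes C2 A9

# Expected result: the two bytes decode to a single character.
decoded_ok = utf8_bytes.decode('utf-8')

# Assumed failure mode (not confirmed by the issue): byte-per-character
# decoding, equivalent to Latin-1, yields two characters.
decoded_wrong = utf8_bytes.decode('latin-1')

print(len(decoded_ok), len(decoded_wrong))
```

The second decode yields the same two code points (U+00C2, U+00A9) that appear in the reported output.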

Work Item Details

Original CodePlex Issue: Issue 35635
Status: Proposed
Reason Closed: Unassigned
Assigned to: Unassigned
Reported on: Oct 21 at 4:23 AM
Reported by: ysitu
Updated on: Nov 7 at 2:43 PM
Updated by: tcalmant

@slozier
Contributor

slozier commented May 17, 2017

Repro code:

import tempfile
from xml.etree.ElementTree import ElementTree
xml = '<?xml version="1.0" encoding="UTF-8"?>\n<test name="' + unichr(169) + '"/>\n'
with tempfile.TemporaryFile() as f:
    f.write(xml.encode('utf-8'))
    f.flush()
    f.seek(0)
    tree = ElementTree(file=f)
    name = next(tree.iter()).get('name')
    print(repr(name))
    assert name == unichr(169)
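
For comparison, here is a variant of the same check that should also run on CPython 3, where the assertion is expected to pass. It is a sketch rather than part of the original report: it swaps the temporary file for an in-memory io.BytesIO stream and uses a \u escape in place of unichr, and it has not been tried on IronPython.

```python
import io
from xml.etree.ElementTree import ElementTree

# Same document as the repro, with the copyright sign written as an escape.
xml = u'<?xml version="1.0" encoding="UTF-8"?>\n<test name="\u00a9"/>\n'

# In-memory byte stream standing in for the temporary file (an assumption
# made to keep the example self-contained).
stream = io.BytesIO(xml.encode('utf-8'))

tree = ElementTree(file=stream)
name = next(tree.iter()).get('name')

# A conforming parser honours the UTF-8 declaration and returns one character.
assert name == u'\u00a9'
print(repr(name))
```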
