Unicode error in pytest for Hindi, Spanish and Russian on Windows. #66

Closed
@AmPhIbIaN26

Description

To reproduce, run pytest on Windows. The traceback below is taken from test_language_hi.py.

tests\test_language_hi.py:42 (test_parse_number_till_hundred)
def test_parse_number_till_hundred():
>       _test_files(HUNDREDS_DIRECTORY, LANG)

test_language_hi.py:44: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
__init__.py:25: in _test_files
    for row in csv_reader:
~\Python\Python39\lib\csv.py:110: in __next__
    self.fieldnames
~\Python\Python39\lib\csv.py:97: in fieldnames
    self._fieldnames = next(self.reader)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <encodings.cp1252.IncrementalDecoder object at 0x000001743C3E54F0>
input = b'number,text\n0,\xe0\xa4\xb6\xe0\xa5\x82\xe0\xa4\xa8\xe0\xa5\x8d\xe0\xa4\xaf\n1,\xe0\xa4\x8f\xe0\xa4
\x95\n2,\xe0\xa4\...x8d\xe0\xa4\xaf\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xac\xe0\xa5\x87\n100,\xe0\xa4\x8f\xe0\xa4
\x95 \xe0\xa4\xb8\xe0\xa5\x8c'
final = False

    def decode(self, input, final=False):
>       return codecs.charmap_decode(input,self.errors,decoding_table)[0]
E       UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 25: character maps to <undefined>

~\Python\Python39\lib\encodings\cp1252.py:23: UnicodeDecodeError
FAILED               [100%]
tests\test_language_hi.py:46 (test_parse_number_permutations)
def test_parse_number_permutations():
>       _test_files(PERMUTATION_DIRECTORY, LANG)

test_language_hi.py:48: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
__init__.py:25: in _test_files
    for row in csv_reader:
~\Python\Python39\lib\csv.py:110: in __next__
    self.fieldnames
~\Python\Python39\lib\csv.py:97: in fieldnames
    self._fieldnames = next(self.reader)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <encodings.cp1252.IncrementalDecoder object at 0x000001743C477700>
input = b'number,text\r\n1234,\xe0\xa4\x8f\xe0\xa4\x95 \xe0\xa4\xb9\xe0\xa4\x9c\xe0\xa4\xbe\xe0\xa4
\xb0 \xe0\xa4\xa6\xe0\xa5\x...0\xa4\xb8\xe0\xa5\x8c \xe0\xa4\xaa\xe0\xa5\x88\xe0\xa4\x82\xe0\xa4\xa4
\xe0\xa4\xbe\xe0\xa4\xb2\xe0\xa5\x80\xe0\xa4\xb8'
final = False

    def decode(self, input, final=False):
>       return codecs.charmap_decode(input,self.errors,decoding_table)[0]
E       UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 20: character maps to <undefined>

~\Python\Python39\lib\encodings\cp1252.py:23: UnicodeDecodeError

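This happens only on Windows, not on Linux. The traceback shows the CSV being decoded with cp1252, the locale default on this Windows machine, although the test data is UTF-8. It can be fixed by passing encoding='utf8' to the open() call on line 23 of __init__.py.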
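A minimal sketch of that fix, for illustration only. The helper names `read_rows` and `demo` are hypothetical and not taken from the repository's `__init__.py`; `csv.DictReader` is assumed because the traceback goes through `fieldnames` in csv.py. The sample rows reuse the two Hindi words whose bytes appear in the traceback.

```python
import csv
import os
import tempfile


def read_rows(csv_path):
    """Read a UTF-8 CSV into a list of dicts, regardless of the OS locale."""
    # An explicit encoding overrides the locale default (cp1252 in the
    # traceback above). newline="" is what the csv docs recommend.
    with open(csv_path, encoding="utf-8", newline="") as csv_file:
        return list(csv.DictReader(csv_file))


def demo():
    """Write a small UTF-8 file shaped like the test data and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample_hi.csv")
        with open(path, "w", encoding="utf-8", newline="") as csv_file:
            csv_file.write("number,text\n0,शून्य\n1,एक\n")
        return read_rows(path)


if __name__ == "__main__":
    for row in demo():
        print(row["number"], row["text"])
```

With the encoding stated explicitly, the same code should behave identically on Windows and Linux, since the result no longer depends on the platform's default codec.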
