euc_kr char '0x3164' decode('ksx1001') cause UnicodeDecodeError  #101863

Closed
@TakWolf

Bug report

The character U+3164 (code point 0x3164) can be encoded with encode('ksx1001'), but the resulting bytes cannot be decoded with decode('ksx1001'):

def main():
    code_point = 0x3164
    c = chr(code_point)
    raw = c.encode('ksx1001')
    c2 = raw.decode('ksx1001')  # <--- this raises UnicodeDecodeError
    print(f'{c} {c2}')

if __name__ == '__main__':
    main()

Traceback (most recent call last):
  File "/Users/takwolf/Develop/FontDev/fusion-pixel-font/build.py", line 11, in <module>
    main()
  File "/Users/takwolf/Develop/FontDev/fusion-pixel-font/build.py", line 6, in main
    c2 = raw.decode('ksx1001')
         ^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'euc_kr' codec can't decode bytes in position 0-1: incomplete multibyte sequence
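
To check whether a given interpreter shows this behavior without aborting the script, the round trip can be wrapped in a small helper. This is only a sketch: `try_round_trip` is a hypothetical name, not part of any codec API, and the result for U+3164 may differ between Python builds.

```python
def try_round_trip(ch, encoding='ksx1001'):
    """Encode ch, then try to decode the bytes again.

    Returns (raw, decoded, error): decoded is None when decoding fails,
    error is None when it succeeds. Hypothetical helper for illustration.
    """
    raw = ch.encode(encoding)
    try:
        return raw, raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return raw, None, str(exc)


if __name__ == '__main__':
    # A Hangul syllable for comparison, then HANGUL FILLER.
    for code_point in (0xAC00, 0x3164):
        raw, decoded, error = try_round_trip(chr(code_point))
        status = 'ok' if error is None else f'failed: {error}'
        print(f'U+{code_point:04X} {raw.hex()} {status}')
```

Returning the error text instead of raising makes it easy to compare several characters in one run.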

The character is HANGUL FILLER, from the Hangul Compatibility Jamo block:

https://unicode-table.com/en/3164/

The following code computes its position in the KS X 1001 table:

def main():
    code_point = 0x3164
    c = chr(code_point)
    raw = c.encode('ksx1001')
    block_offset = 0xA0
    zone_1 = raw[0] - block_offset
    zone_2 = raw[1] - block_offset
    print(f'{zone_1} {zone_2}')


if __name__ == '__main__':
    main()

zone_1 = 4
zone_2 = 52
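
The arithmetic is a subtraction of 0xA0 from each encoded byte: 0xA4 - 0xA0 = 4 and 0xD4 - 0xA0 = 52. A minimal sketch that generalizes this and also maps a position back to bytes; `to_row_cell` and `from_row_cell` are hypothetical helper names, and the sketch assumes the character encodes to exactly two bytes.

```python
BLOCK_OFFSET = 0xA0  # same offset as block_offset in the snippet above


def to_row_cell(ch, encoding='ksx1001'):
    """Return the (row, cell) pair for a character that encodes to two bytes."""
    raw = ch.encode(encoding)
    if len(raw) != 2:
        raise ValueError(f'{ch!r} does not encode to two bytes: {raw!r}')
    return raw[0] - BLOCK_OFFSET, raw[1] - BLOCK_OFFSET


def from_row_cell(row, cell):
    """Rebuild the two encoded bytes from a (row, cell) pair."""
    return bytes([row + BLOCK_OFFSET, cell + BLOCK_OFFSET])


if __name__ == '__main__':
    row, cell = to_row_cell(chr(0x3164))
    print(row, cell, from_row_cell(row, cell).hex())
```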

https://en.wikipedia.org/wiki/KS_X_1001#Hangul_Filler

All other characters in KS X 1001 encode and decode correctly; only this one fails.
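
One way to verify that claim on a particular interpreter is to try every code point and collect those that encode but do not survive the round trip. The following is a sketch under stated assumptions: `find_round_trip_failures` is a hypothetical name, and the scan is limited to code points below 0x10000.

```python
def find_round_trip_failures(encoding='ksx1001', limit=0x10000):
    """Return code points below limit that encode but do not decode back
    to the same character."""
    failures = []
    for code_point in range(limit):
        ch = chr(code_point)
        try:
            raw = ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # not representable in this encoding, nothing to test
        try:
            decoded = raw.decode(encoding)
        except UnicodeDecodeError:
            failures.append(code_point)
            continue
        if decoded != ch:
            failures.append(code_point)
    return failures


if __name__ == '__main__':
    for code_point in find_round_trip_failures():
        print(f'U+{code_point:04X}')
```

The list it prints depends on the Python version, so no expected output is given here.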

Your environment

  • CPython versions tested on: Python 3.11.1
  • Operating system and architecture: macOS 13.0

Labels

  • interpreter-core (Objects, Python, Grammar, and Parser dirs)
  • topic-unicode
  • type-bug (An unexpected behavior, bug, or error)
