Closed
Description
Bug report
The character U+3164 (code point 0x3164) can be encoded with encode('ksx1001'), but the resulting bytes cannot be decoded with decode('ksx1001').
def main():
    code_point = 0x3164
    c = chr(code_point)
    raw = c.encode('ksx1001')
    c2 = raw.decode('ksx1001')  # <--- this raises UnicodeDecodeError
    print(f'{c} {c2}')


if __name__ == '__main__':
    main()
Traceback (most recent call last):
  File "/Users/takwolf/Develop/FontDev/fusion-pixel-font/build.py", line 11, in <module>
    main()
  File "/Users/takwolf/Develop/FontDev/fusion-pixel-font/build.py", line 6, in main
    c2 = raw.decode('ksx1001')
         ^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'euc_kr' codec can't decode bytes in position 0-1: incomplete multibyte sequence
The character is HANGUL FILLER (U+3164), in the Hangul Compatibility Jamo block:
https://unicode-table.com/en/3164/
The following code gets the character's two zone numbers in KS X 1001:
def main():
    code_point = 0x3164
    c = chr(code_point)
    raw = c.encode('ksx1001')
    block_offset = 0xA0
    zone_1 = raw[0] - block_offset
    zone_2 = raw[1] - block_offset
    print(f'{zone_1} {zone_2}')


if __name__ == '__main__':
    main()
zone_1 = 4
zone_2 = 52
https://en.wikipedia.org/wiki/KS_X_1001#Hangul_Filler
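The zone calculation above can be wrapped in a small reusable helper, together with its inverse. This is only a sketch: the names ksx1001_zone and zone_to_bytes are made up for illustration, and it assumes the character encodes to exactly two bytes, each offset by 0xA0 as in the snippet above.

```python
def ksx1001_zone(ch: str, block_offset: int = 0xA0) -> tuple[int, int]:
    """Return (zone_1, zone_2) for a character that encodes to two bytes in 'ksx1001'."""
    raw = ch.encode('ksx1001')
    if len(raw) != 2:
        # ASCII characters, for example, encode to a single byte and have no zone pair.
        raise ValueError(f'{ch!r} does not encode to two bytes: {raw!r}')
    return raw[0] - block_offset, raw[1] - block_offset


def zone_to_bytes(zone_1: int, zone_2: int, block_offset: int = 0xA0) -> bytes:
    """Inverse of ksx1001_zone: rebuild the two encoded bytes from the zone numbers."""
    return bytes([zone_1 + block_offset, zone_2 + block_offset])


if __name__ == '__main__':
    print(ksx1001_zone(chr(0x3164)))   # (4, 52), matching the values reported above
    print(zone_to_bytes(4, 52).hex())  # a4d4
```

Only the encoding direction is used here, so the helper does not hit the decoding error described in this report.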
All other characters in ksx1001 encode and decode correctly; only this one fails.
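One way to check that claim is to round-trip every code point in the Basic Multilingual Plane and collect the ones that encode but do not decode back to themselves. The sketch below is not part of the original report; roundtrip_failures is a hypothetical helper name, and the result depends on the Python version in use.

```python
def roundtrip_failures(encoding, code_points):
    """Return (code_point, encoded_bytes, reason) for characters that encode but fail to round-trip."""
    failures = []
    for cp in code_points:
        ch = chr(cp)
        try:
            raw = ch.encode(encoding)
        except UnicodeEncodeError:
            # Not representable in this encoding, so it is outside the scope of the check.
            continue
        try:
            decoded = raw.decode(encoding)
        except UnicodeDecodeError as exc:
            failures.append((cp, raw, exc.reason))
            continue
        if decoded != ch:
            failures.append((cp, raw, f'decoded to {decoded!r}'))
    return failures


if __name__ == '__main__':
    # Scan the BMP (U+0000 to U+FFFF) and list every failing character.
    for cp, raw, reason in roundtrip_failures('ksx1001', range(0x10000)):
        print(f'U+{cp:04X} {raw.hex()} {reason}')
```

On the Python 3.11.1 setup described in this report, the scan would be expected to list U+3164 with the same "incomplete multibyte sequence" reason shown in the traceback.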
Your environment
- CPython versions tested on: Python 3.11.1
- Operating system and architecture: macOS 13.0