Some emoji have incorrect width #57

darrenburns · 2022-01-14T09:37:40Z

Hi, thanks for your work on this project. It's been invaluable!

According to this document, emoji presentation sequences should be treated as "East Asian Wide":

[UTS51] emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

When wcwidth reads in the EastAsianWidth.txt file, it keeps only the entries with the W or F property and discards everything else. As a result, the base characters of emoji presentation sequences are dropped rather than being treated as wide.
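To make the filtering step concrete, here is a minimal sketch of a parser that keeps only W and F entries. It is not the code in bin/update-tables.py; `parse_wide_ranges` and `SAMPLE` are hypothetical names, and the sample lines are a small illustrative excerpt in the style of EastAsianWidth.txt rather than the full file.

```python
def parse_wide_ranges(lines):
    """Return (start, end) code point ranges whose width property is W or F."""
    ranges = []
    for line in lines:
        # Drop the trailing comment and surrounding whitespace.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        codepoints, prop = (field.strip() for field in data.split(";"))
        if prop not in ("W", "F"):
            # Neutral, narrow, ambiguous and halfwidth entries are discarded,
            # including text-presentation emoji such as U+2764.
            continue
        start, _, end = codepoints.partition("..")
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges


# Illustrative excerpt only (assumption: same field layout as the real file).
SAMPLE = [
    "# excerpt for illustration",
    "2764;N           # So         HEAVY BLACK HEART",
    "1F600..1F64F;W   # So    [80] GRINNING FACE..PERSON WITH FOLDED HANDS",
    "FF01..FF03;F     # Po     [3] FULLWIDTH EXCLAMATION MARK..FULLWIDTH NUMBER SIGN",
]

wide = parse_wide_ranges(SAMPLE)
heart_is_wide = any(start <= 0x2764 <= end for start, end in wide)
print(wide, heart_is_wide)
```

With this kind of filter, U+2764 never reaches the wide table, so a later lookup has no way to know that U+2764 followed by U+FE0F should occupy two cells.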

The full list of 353 emojis affected is available at:
https://unicode.org/emoji/charts/emoji-variants.html

wcwidth will report all of the emoji in the above list as having width 1 instead of width 2.
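A rough sketch of the behaviour UTS51 asks for, using only the standard library: a narrow base character followed by VARIATION SELECTOR-16 (U+FE0F) is counted as two cells. `sketch_width` is a hypothetical helper, not wcwidth's API. It simplifies by treating any narrow character followed by U+FE0F as wide; a real fix would presumably restrict this to the sequences listed in emoji-variation-sequences.txt, and it ignores combining marks entirely.

```python
import unicodedata

VS16 = "\uFE0F"  # VARIATION SELECTOR-16, requests emoji presentation


def sketch_width(text):
    """Approximate cell width of text, widening narrow bases followed by VS16."""
    total = 0
    last = 0  # width contributed by the previous base character
    for ch in text:
        if ch == VS16:
            if last == 1:
                total += 1  # promote the narrow base from 1 cell to 2
            last = 0
            continue
        last = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        total += last
    return total


print(sketch_width("\u2764"))        # text-presentation heart
print(sketch_width("\u2764\uFE0F"))  # heart as an emoji presentation sequence
```

The selector itself adds no width of its own, so a base that is already wide stays at two cells when U+FE0F follows it.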

I would be happy to PR this, but I'm not sure the master branch is clean - I noticed some walrus operators etc. despite my understanding being that this project is 2.7 compatible


GalaxySnail · 2022-03-06T13:09:53Z

I noticed some walrus operators etc. despite my understanding being that this project is 2.7 compatible

Walrus operators and f-strings mean it can't run under Python 2 (or even Python 3.7). I have cleaned up its Python 2 compatibility code in #58.

In my opinion, python 2 compat is useless for bin/update-tables.py, because we always need the latest python for unicodedata.name in the latest unicodedata module. On python 3.10, unicodedata.unidata_version is 13.0.0, and on python 3.11 it is 14.0.0.
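For anyone checking which Unicode data their interpreter carries, a small sketch follows. `unidata_version_tuple` and `name_or_placeholder` are hypothetical helper names; `unicodedata.unidata_version` and the default argument of `unicodedata.name` are standard library features.

```python
import sys
import unicodedata


def unidata_version_tuple():
    """Return the interpreter's Unicode database version as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def name_or_placeholder(ch):
    """Look up a character name, falling back when this Python's data lacks it."""
    placeholder = "<unnamed in Unicode {}>".format(unicodedata.unidata_version)
    return unicodedata.name(ch, placeholder)


print(sys.version_info[:2], unidata_version_tuple())
print(name_or_placeholder("\u2764"))
```

Characters added in a Unicode release newer than the interpreter's database hit the placeholder branch, which is why a table generator that relies on character names benefits from the newest Python available.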

jquast · 2022-03-08T18:19:42Z

That's correct: bin/update-tables.py is not meant to be Python 2 compatible, and it is not distributed as part of the package.

Closes #88 The implementation of wcswidth is taken from the corresponding Python library (https://github.com/jquast/wcwidth) which seems to have the most updated list of wide characters. However, note that there are still some emoji that aren't recognised correctly; see e.g. jquast/wcwidth#57. At some point in time, the wcswidth implementation should be refactored into its own library.

Major
-----

Bugfix zero-width characters, closes #57, #47, #45, #39, #26, #25, #24, #22, #8, wow!

This is mostly achieved by replacing `ZERO_WIDTH_CF` with dynamic parsing by Category codes in bin/update-tables.py and putting those in the zero-wide tables.

Tests
-----

- `verify-table-integrity.py` exercises a "bug" of duplicated tables that has no effect, because wcswidth() first checks for zero-width, and that is preferred in cases of conflict. This PR also resolves that error of duplication.
- new automatic tests for balinese, kr jamo, zero-width emoji, devanagari, tamil, kannada.
- added pytest-benchmark plugin, example use:

      # baseline
      tox -epy312 -- --verbose --benchmark-save=original
      # compare
      tox -epy312 -- --verbose --benchmark-compare=.benchmarks/Linux-CPython-3.12-64bit/0001_original.json
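The lookup order described in the PR text (zero-width is checked first and wins any conflict with the wide table) can be sketched with the standard library alone. This is not the generated-table code from #91: `sketch_wcwidth` is a hypothetical name, and the category set below (Mn, Me, Cf) is an assumption, since the PR text does not list which Category codes it parses.

```python
import unicodedata

# Assumption: nonspacing marks, enclosing marks and format characters
# are the categories treated as zero-width.
ZERO_WIDTH_CATEGORIES = ("Mn", "Me", "Cf")


def sketch_wcwidth(ch):
    """Width of a single character, checking zero-width before wide."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0  # takes priority even if the character also has a wide property
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


for ch in ("a", "\u4e2d", "\u0301", "\u200d", "\u302a"):
    print("U+{:04X}".format(ord(ch)), sketch_wcwidth(ch))
```

Because the category check comes first, a character that appeared in both tables would still resolve to 0, which matches the statement that the duplication had no visible effect.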

jquast added bug duplicate and removed duplicate labels Jan 15, 2023

jquast mentioned this issue Oct 19, 2023

Bugfixes for zero-width characters #91

Merged

jquast closed this as completed in #91 Oct 30, 2023
