Some emoji have incorrect width #57

Closed
darrenburns opened this issue Jan 14, 2022 · 2 comments · Fixed by #91

@darrenburns

Hi, thanks for your work on this project. It's been invaluable!

According to this document, emoji presentation sequences should be treated as "East Asian Wide":

> [UTS51] emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

When wcwidth reads in the EastAsianWidth.txt file, it discards all the emoji presentation sequences it finds rather than treating them as wide, since it discards everything without the W or F property.

The full list of 353 emojis affected is available at:
https://unicode.org/emoji/charts/emoji-variants.html

wcwidth will report all of the emoji in the above list as having width 1 instead of width 2.
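
To illustrate the rule quoted above, here is a minimal sketch using only the standard library. It is not wcwidth's code: `naive_width` and `sketch_width` are hypothetical helpers, and the sketch assumes that any character followed by U+FE0F (VARIATION SELECTOR-16) forms an emoji presentation sequence. A real implementation would likely check the base character against Unicode's emoji-variation-sequences.txt and would also handle combining characters.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16: requests emoji presentation


def naive_width(ch):
    """Width from East_Asian_Width alone: W and F take 2 cells, all else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def sketch_width(text):
    """Hypothetical helper: sum cell widths, counting base + VS16 as 2 cells.

    Assumption: every character followed by VS16 is treated as a valid
    emoji presentation sequence, which is broader than the real rule.
    """
    total = 0
    i = 0
    while i < len(text):
        if i + 1 < len(text) and text[i + 1] == VS16:
            total += 2  # the whole sequence behaves as East Asian Wide
            i += 2
            continue
        total += naive_width(text[i])
        i += 1
    return total


heart = "\u2764"  # HEAVY BLACK HEART, one of the listed emoji
print("property only:", naive_width(heart))
print("with VS16:    ", sketch_width(heart + VS16))
```

Comparing the two printed values shows the gap described in this issue: the property-only lookup ignores the variation selector, while the sequence-aware sum does not.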

I would be happy to PR this, but I'm not sure the master branch is clean - I noticed some walrus operators etc. despite my understanding being that this project is 2.7 compatible.

@GalaxySnail
Collaborator

> I noticed some walrus operators etc. despite my understanding being that this project is 2.7 compatible

Walrus operators and f-strings mean it can't run under Python 2 (or even Python 3.7, since the walrus operator requires 3.8). I have cleaned up its Python 2 compatibility code in #58.

In my opinion, Python 2 compatibility is useless for bin/update-tables.py, because we always need the latest Python so that unicodedata.name comes from the latest unicodedata module. On Python 3.10, unicodedata.unidata_version is 13.0.0, and on Python 3.11 it is 14.0.0.
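
A quick way to see which Unicode version a given interpreter bundles is sketched below. `describe_unicode_support` is a hypothetical helper name; `unicodedata.unidata_version` and `unicodedata.name` are standard library calls.

```python
import sys
import unicodedata


def describe_unicode_support():
    """Report the running Python version and its bundled Unicode version."""
    py = "{}.{}".format(*sys.version_info[:2])
    return "Python {} ships Unicode {}".format(py, unicodedata.unidata_version)


print(describe_unicode_support())

# U+1FAE0 MELTING FACE was added in Unicode 14.0, so an interpreter with
# older data has no name for it and the default value is returned instead.
print(unicodedata.name(chr(0x1FAE0), "<unnamed in this Unicode version>"))
```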

@jquast
Owner

jquast commented Mar 8, 2022

That's correct: bin/update-tables.py is not meant to be Python 2 compatible, and it is not distributed as part of the package.

yongrenjie added a commit to alan-turing-institute/whatwhat that referenced this issue Apr 13, 2023
Closes #88

The implementation of wcswidth is taken from the corresponding Python
library (https://github.com/jquast/wcwidth) which seems to have the most
updated list of wide characters. However, note that there are still some
emoji that aren't recognised correctly; see e.g. jquast/wcwidth#57.

At some point in time, the wcswidth implementation should be refactored
into its own library.
jquast added a commit that referenced this issue Oct 30, 2023
Major
-----

Bugfix zero-width characters, closes #57, #47, #45, #39, #26, #25, #24, #22, #8, wow !

This is mostly achieved by replacing `ZERO_WIDTH_CF` with dynamic parsing by Category codes in bin/update-tables.py and putting those in the zero-width tables.
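
As a rough illustration of deriving a zero-width table from General_Category codes, the sketch below scans code points with the standard `unicodedata` module. It is not the code in bin/update-tables.py: the category set `ASSUMED_ZERO_WIDTH_CATEGORIES` and the helper `zero_width_ranges` are assumptions made for this example, and a real table may need a different set plus per-character exceptions.

```python
import unicodedata

# Assumption for illustration only: treat these categories as zero-width.
ASSUMED_ZERO_WIDTH_CATEGORIES = frozenset({"Mn", "Me", "Cf"})


def zero_width_ranges(categories=ASSUMED_ZERO_WIDTH_CATEGORIES, limit=0x110000):
    """Return inclusive (start, end) ranges of code points in `categories`."""
    ranges = []
    start = None
    for cp in range(limit):
        is_zero = unicodedata.category(chr(cp)) in categories
        if is_zero and start is None:
            start = cp  # open a new run
        elif not is_zero and start is not None:
            ranges.append((start, cp - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges


# Scan only the first few blocks to keep the example fast.
for lo, hi in zero_width_ranges(limit=0x3000)[:5]:
    print("U+{:04X}..U+{:04X}".format(lo, hi))
```

Because the ranges come from whatever Unicode data the running interpreter bundles, the output changes with the Python version.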

Tests
-----

- `verify-table-integrity.py` exercises a "bug" of duplicated tables that has no effect, because wcswidth() first checks for zero-width, and that is preferred in cases of conflict. This PR also resolves that error of duplication.
- new automatic tests for Balinese, Korean jamo, zero-width emoji, Devanagari, Tamil, Kannada.
- added pytest-benchmark plugin, example use:

        # baseline
        tox -epy312 -- --verbose --benchmark-save=original
        # compare
        tox -epy312 -- --verbose --benchmark-compare=.benchmarks/Linux-CPython-3.12-64bit/0001_original.json
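
The note above about duplicated table entries having no effect can be sketched as follows. This is an assumption-laden illustration, not wcwidth's implementation: the tables `ZERO_WIDTH` and `WIDE` are tiny made-up samples with a deliberate overlap, and `in_table` and `width` are hypothetical helpers.

```python
from bisect import bisect_right

# Made-up sample tables of inclusive, sorted (start, end) ranges.
ZERO_WIDTH = [(0x0300, 0x036F), (0x200B, 0x200F)]
# The first range here deliberately duplicates part of ZERO_WIDTH.
WIDE = [(0x0300, 0x0301), (0x4E00, 0x9FFF)]


def in_table(cp, table):
    """Binary-search a sorted range table for code point `cp`."""
    idx = bisect_right(table, (cp, float("inf"))) - 1
    return idx >= 0 and table[idx][0] <= cp <= table[idx][1]


def width(cp):
    """Check zero-width first, so it wins when a code point is in both tables."""
    if in_table(cp, ZERO_WIDTH):
        return 0
    if in_table(cp, WIDE):
        return 2
    return 1


print(width(0x0300))  # listed in both tables; the zero-width check runs first
print(width(0x4E2D))  # only in the wide table
print(width(0x0041))  # in neither table
```

With this ordering the duplicate in the wide table is never reached, which is why such duplication is harmless even though removing it keeps the tables cleaner.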