Merged
8 changes: 6 additions & 2 deletions .github/workflows/ci.yml
@@ -108,14 +108,18 @@ jobs:

python -Im pip install tox

- name: Build wheel
- name: Prepare sdist and source-dir
shell: bash
run: |
python -Im pip install build
python -Im build --wheel
python -Im build

mkdir source-dir
tar -xzvf dist/wcwidth-*.tar.gz -C source-dir --strip-components=1

- name: Fetch test data files
if: matrix.python-version == '3.14' && matrix.os == 'ubuntu-latest'
continue-on-error: true
shell: bash
run: |
python -Im tox -e fetch
9 changes: 5 additions & 4 deletions docs/intro.rst
@@ -455,10 +455,10 @@ languages.
History
=======

0.5.2 *unreleased*
* **Bugfix** Specification and result of category ``Mc`` (`Spacing Combining Mark`_), approx. 443
codepoints, has a more nuanced specification_, and may be categorized as both zero or wide.
`PR #200`.
0.5.2 *2026-01-29*
* **Bugfix** Measurement of category ``Mc`` (`Spacing Combining Mark`_), approx. 443 codepoints, has a more
nuanced specification_, and may be categorized as either zero or wide. `PR #200`_.
* **Bugfix** Measurement of "standalone" modifiers and regional indicators, `PR #202`_.
* **Updated** Data files used in some automatic tests are no longer distributed. `PR #199`_.

0.5.1 *2026-01-27*
@@ -666,6 +666,7 @@ https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
.. _`PR #196`: https://github.com/jquast/wcwidth/pull/196
.. _`PR #199`: https://github.com/jquast/wcwidth/pull/199
.. _`PR #200`: https://github.com/jquast/wcwidth/pull/200
.. _`PR #202`: https://github.com/jquast/wcwidth/pull/202
.. _`Issue #101`: https://github.com/jquast/wcwidth/issues/101
.. _`Issue #190`: https://github.com/jquast/wcwidth/issues/190
.. _`jquast/blessed`: https://github.com/jquast/blessed
33 changes: 29 additions & 4 deletions docs/specs.rst
@@ -33,13 +33,22 @@ Any characters defined by `General Category`_ codes in `DerivedGeneralCategory.t
`Prepended_Concatenation_Mark`_ characters, approx. 147 characters.
- 'Zl': `U+2028`_ LINE SEPARATOR only
- 'Zp': `U+2029`_ PARAGRAPH SEPARATOR only
- 'Sk': `Modifier Symbol`_, aprox. 4 characters of only those where phrase
``'EMOJI MODIFIER'`` is present in comment of `UnicodeData.txt`_.
- 'Sk': `Modifier Symbol`_, approx. 1 character with ``'FULLWIDTH'`` in comment
of `UnicodeData.txt`_ (see `Width of 2`_). `Emoji Modifier`_ Fitzpatrick
symbols (`U+1F3FB`_ through `U+1F3FF`_) are zero-width only when following
an emoji base character in sequence; see `Width of 2`_ for standalone modifiers.

The NULL character (`U+0000`_).

Any character following ZWJ (`U+200D`_) when in sequence by
function :func:`wcwidth.wcswidth`.
Any character that follows ZWJ (`U+200D`_), when the ZWJ is itself preceded by an
emoji (`Extended_Pictographic`_ property) or `Regional Indicator`_ and the text is
measured in sequence by function :func:`wcwidth.wcswidth`. When ZWJ follows a
non-emoji character (including CJK), only the ZWJ itself is zero-width; the
following character is measured normally.

The second `Regional Indicator`_ symbol (`U+1F1E6`_ through `U+1F1FF`_) of a
consecutive pair, when measured in sequence by :func:`wcwidth.wcswidth` or
:func:`wcwidth.width`. The first indicator of the pair has a width of 2 (see
`Width of 2`_).
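The pairing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the helper names `is_regional_indicator` and `regional_indicator_widths` are hypothetical, and the sketch measures only Regional Indicators, treating any other character as the end of a run without measuring it.

```python
RI_FIRST = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RI_LAST = 0x1F1FF   # REGIONAL INDICATOR SYMBOL LETTER Z


def is_regional_indicator(ch: str) -> bool:
    """Return True if ch is one of the 26 Regional Indicator symbols."""
    return RI_FIRST <= ord(ch) <= RI_LAST


def regional_indicator_widths(text: str) -> list[int]:
    """Hypothetical helper: width of each Regional Indicator in text.

    Consecutive indicators pair up from the start of a run: the first of
    each pair (or a lone trailing indicator) is 2, the second is 0.
    Non-RI characters are not measured; they only end the current run.
    """
    widths = []
    expecting_second = False  # True right after the first of a pair
    for ch in text:
        if not is_regional_indicator(ch):
            expecting_second = False
            continue
        if expecting_second:
            widths.append(0)  # completes a flag with the previous indicator
            expecting_second = False
        else:
            widths.append(2)  # starts a new pair, or stands alone
            expecting_second = True
    return widths


# One flag (U + S), then a lone indicator (A).
print(regional_indicator_widths('\U0001F1FA\U0001F1F8\U0001F1E6'))  # [2, 0, 2]
```

Summing the list gives 2 for a single indicator, 2 for one pair, 4 for three indicators and 4 for four, matching the Regional Indicator test expectations added in this PR.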

`Hangul Jamo`_ Jungseong and "Extended-B" code blocks, `U+1160`_ through
`U+11FF`_ and `U+D7B0`_ through `U+D7FF`_.
@@ -62,6 +71,15 @@ Any character defined by `East Asian`_ Fullwidth (``F``) or Wide (``W``)
properties in `EastAsianWidth.txt`_ files, except those that are defined by the
Category code of `Nonspacing Mark`_ (``Mn``).

`Regional Indicator`_ symbols (`U+1F1E6`_ through `U+1F1FF`_). Though
classified as Neutral in `EastAsianWidth.txt`_, terminals universally render
these as double-width. A consecutive pair of Regional Indicators forms a flag
emoji with a total width of 2 (the first indicator is 2, the second is 0).

`Emoji Modifier`_ Fitzpatrick symbols (`U+1F3FB`_ through `U+1F3FF`_) when
measured standalone (not following an emoji base character). When following
an emoji base, they combine with the base and add 0 to total width.
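A minimal sketch of this context-dependent rule follows. It assumes a hypothetical `EMOJI_MODIFIER_BASES` set as a deliberately tiny stand-in for the Unicode `Emoji_Modifier_Base` property (a complete version would load every base from the Unicode emoji data files), and `modifier_width` is likewise a hypothetical name, not part of the wcwidth API.

```python
from typing import Optional

FITZPATRICK_FIRST = 0x1F3FB  # EMOJI MODIFIER FITZPATRICK TYPE-1-2
FITZPATRICK_LAST = 0x1F3FF   # EMOJI MODIFIER FITZPATRICK TYPE-6

# Hypothetical, deliberately tiny stand-in for Emoji_Modifier_Base.
EMOJI_MODIFIER_BASES = {
    0x1F44D,  # THUMBS UP SIGN
    0x1F466,  # BOY
    0x1F467,  # GIRL
    0x1F468,  # MAN
    0x1F469,  # WOMAN
}


def is_fitzpatrick(ch: str) -> bool:
    """Return True if ch is one of the five Fitzpatrick skin-tone modifiers."""
    return FITZPATRICK_FIRST <= ord(ch) <= FITZPATRICK_LAST


def modifier_width(prev: Optional[str], ch: str) -> int:
    """Width added by Fitzpatrick modifier ch, given the preceding char.

    0 when it follows an emoji base (it merges into the base glyph),
    2 when standalone (prev is None) or after any other character.
    """
    if not is_fitzpatrick(ch):
        raise ValueError(f"not a Fitzpatrick modifier: U+{ord(ch):04X}")
    if prev is not None and ord(prev) in EMOJI_MODIFIER_BASES:
        return 0
    return 2


print(modifier_width('\U0001F469', '\U0001F3FB'))  # WOMAN + TYPE-1-2 -> 0
print(modifier_width(None, '\U0001F3FB'))          # standalone -> 2
```

With WOMAN (width 2) as the base, the pair totals 2 + 0 = 2, which is what the new tests in ``tests/test_emojis.py`` and ``tests/test_width.py`` expect; on its own, the modifier is 2.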

Any characters of `Modifier Symbol`_ category, ``'Sk'``, where ``'FULLWIDTH'`` is
present in comment of `UnicodeData.txt`_, approx. 3 characters.

@@ -105,4 +123,11 @@ by a Nukta (``Mn``) and then a vowel sign (``Mc``) is measured as base + 1.
.. _`U+D7FF`: https://codepoints.net/U+D7FF
.. _`UnicodeData.txt`: https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
.. _`East Asian`: https://www.unicode.org/reports/tr11/
.. _`U+1F1E6`: https://codepoints.net/U+1F1E6
.. _`U+1F1FF`: https://codepoints.net/U+1F1FF
.. _`U+1F3FB`: https://codepoints.net/U+1F3FB
.. _`U+1F3FF`: https://codepoints.net/U+1F3FF
.. _`Regional Indicator`: https://www.unicode.org/charts/PDF/U1F100.pdf
.. _`Emoji Modifier`: https://unicode.org/reports/tr51/#Emoji_Modifiers
.. _`Extended_Pictographic`: https://www.unicode.org/reports/tr51/#def_extended_pictographic
.. _`Nonspacing Mark`: https://www.unicode.org/versions/latest/core-spec/chapter-4/#G134153
25 changes: 25 additions & 0 deletions tests/test_benchmarks.py
@@ -61,6 +61,31 @@ def test_wcswidth_emoji_sequence(benchmark):
benchmark(wcwidth.wcswidth, text)


# Regional Indicator benchmarks - paired flags and unpaired RI
RI_FLAGS_PAIRED = '🇺🇸🇬🇧🇫🇷🇩🇪🇯🇵' * 100
RI_FLAGS_UNPAIRED = '🇺🇸🇬🇧🇫 ' * 100  # space stops each lone indicator pairing with the next repeat


def test_wcswidth_ri_flags_paired(benchmark):
"""Benchmark wcswidth() with paired regional indicator flags."""
benchmark(wcwidth.wcswidth, RI_FLAGS_PAIRED)


def test_wcswidth_ri_flags_unpaired(benchmark):
"""Benchmark wcswidth() with mixed paired and unpaired regional indicators."""
benchmark(wcwidth.wcswidth, RI_FLAGS_UNPAIRED)


def test_width_ri_flags_paired(benchmark):
"""Benchmark width() with paired regional indicator flags."""
benchmark(wcwidth.width, RI_FLAGS_PAIRED)


def test_width_ri_flags_unpaired(benchmark):
"""Benchmark width() with mixed paired and unpaired regional indicators."""
benchmark(wcwidth.width, RI_FLAGS_UNPAIRED)


# NFC vs NFD comparison - text with combining marks
DIACRITICS_COMPOSED = 'café résumé naïve ' * 100
DIACRITICS_DECOMPOSED = unicodedata.normalize('NFD', DIACRITICS_COMPOSED)
60 changes: 55 additions & 5 deletions tests/test_emojis.py
@@ -31,7 +31,7 @@ def emoji_zwj_sequence():
"\u200d" # Joiner, Category Cf, East Asian Width property 'N' -- ZERO WIDTH JOINER
"\U0001f4bb") # Fused, Category So, East Asian Width property 'W' -- PERSONAL COMPUTER
# This test adapted from https://www.unicode.org/L2/L2023/23107-terminal-suppt.pdf
expect_length_each = (2, 0, 0, 2)
expect_length_each = (2, 2, 0, 2)
expect_length_phrase = 2

# exercise,
@@ -49,7 +49,7 @@ def test_unfinished_zwj_sequence():
phrase = ("\U0001f469" # Base, Category So, East Asian Width property 'W' -- WOMAN
"\U0001f3fb" # Modifier, Category Sk, East Asian Width property 'W' -- EMOJI MODIFIER FITZPATRICK TYPE-1-2
"\u200d") # Joiner, Category Cf, East Asian Width property 'N' -- ZERO WIDTH JOINER
expect_length_each = (2, 0, 0)
expect_length_each = (2, 2, 0)
expect_length_phrase = 2

# exercise,
@@ -67,7 +67,7 @@ def test_non_recommended_zwj_sequence():
phrase = ("\U0001f469" # Base, Category So, East Asian Width property 'W' -- WOMAN
"\U0001f3fb" # Modifier, Category Sk, East Asian Width property 'W' -- EMOJI MODIFIER FITZPATRICK TYPE-1-2
"\u200d") # Joiner, Category Cf, East Asian Width property 'N' -- ZERO WIDTH JOINER
expect_length_each = (2, 0, 0)
expect_length_each = (2, 2, 0)
expect_length_phrase = 2

# exercise,
@@ -87,7 +87,7 @@ def test_another_emoji_zwj_sequence():
"\u200D" # ZERO WIDTH JOINER
"\u2640" # FEMALE SIGN
"\uFE0F") # VARIATION SELECTOR-16
expect_length_each = (1, 0, 0, 1, 0)
expect_length_each = (1, 2, 0, 1, 0)
expect_length_phrase = 2

# exercise,
@@ -120,7 +120,7 @@ def test_longer_emoji_zwj_sequence():
"\U0001F3FD" # 'Sk', 'W' -- EMOJI MODIFIER FITZPATRICK TYPE-4
) * 2
# This test adapted from https://www.unicode.org/L2/L2023/23107-terminal-suppt.pdf
expect_length_each = (2, 0, 0, 1, 0, 0, 2, 0, 2, 0) * 2
expect_length_each = (2, 2, 0, 1, 0, 0, 2, 0, 2, 2) * 2
expect_length_phrase = 4

# exercise,
@@ -191,6 +191,56 @@ def measure_all():
assert len(sequences) >= 742


@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
def test_regional_indicator_single():
"""Single Regional Indicator symbol is width 2."""
assert wcwidth.wcwidth('\U0001F1FA') == 2
assert wcwidth.wcswidth('\U0001F1FA') == 2


@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
def test_regional_indicator_pair():
"""Flag pair (two Regional Indicators) is width 2, not 4."""
assert wcwidth.wcswidth('\U0001F1FA\U0001F1F8') == 2


@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
def test_regional_indicator_three():
"""Three Regional Indicators: one pair (2) + one single (2) = 4."""
assert wcwidth.wcswidth('\U0001F1FA\U0001F1F8\U0001F1E6') == 4


@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
def test_regional_indicator_four():
"""Four Regional Indicators: two pairs = 2 + 2 = 4."""
assert wcwidth.wcswidth(
'\U0001F1FA\U0001F1F8\U0001F1E6\U0001F1FA') == 4


@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
def test_zwj_after_non_emoji():
"""ZWJ after non-emoji unconditionally consumes next character."""
# This does *not* match most terminal behavior. It is a negative test that
# documents a known limitation: as an optimization, the library does not
# handle such 'glitch' emoji. Non-emoji + ZWJ is not covered by UAX #29 rule GB11.
assert wcwidth.wcswidth('xx\u200d\U0001F384') == 2
assert wcwidth.wcswidth('a\u200d\U0001F600') == 1
assert wcwidth.wcswidth('\u4e16\u200d\U0001F600') == 2


@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
def test_fitzpatrick_standalone():
"""Standalone Fitzpatrick modifier is width 2."""
assert wcwidth.wcwidth('\U0001F3FB') == 2
assert wcwidth.wcswidth('\U0001F3FB') == 2


@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
def test_fitzpatrick_after_emoji():
"""Fitzpatrick modifier after emoji base combines, total width 2."""
assert wcwidth.wcswidth('\U0001F469\U0001F3FB') == 2


def test_vs16_effect():
"""Verify effect of VS-16 (always active with latest Unicode version)."""
phrase = ("\u2640" # FEMALE SIGN
12 changes: 12 additions & 0 deletions tests/test_width.py
@@ -437,3 +437,15 @@ def test_soft_hyphen_exception():
"""U+00AD SOFT HYPHEN remains width 1 for ISO-8859-1 compatibility."""
result = wcwidth.wcwidth('\u00AD')
assert result == 1


def test_fitzpatrick_modifier_after_emoji():
"""Fitzpatrick modifier following an emoji base adds zero width in width()."""
result = wcwidth.width('\U0001F469\U0001F3FB')
assert result == 2


def test_fitzpatrick_modifier_standalone_width():
"""A standalone Fitzpatrick modifier, however, is a wide character in width()."""
result = wcwidth.width('\U0001F3FB')
assert result == 2
2 changes: 1 addition & 1 deletion tox.ini
@@ -107,7 +107,7 @@ commands = python {toxinidir}/bin/update-tables.py {posargs:--fetch-all-versions
basepython = python3.14
usedevelop = true
deps = -r requirements-update.txt
commands = python {toxinidir}/bin/update-tables.py {posargs:--only-fetch}
commands = - python {toxinidir}/bin/update-tables.py {posargs:--only-fetch}

[testenv:autopep8]
basepython = python3.14
2 changes: 2 additions & 0 deletions wcwidth/table_wide.py
@@ -90,6 +90,7 @@
(0x1f0cf, 0x1f0cf,), # Playing Card Black Joker
(0x1f18e, 0x1f18e,), # Negative Squared Ab
(0x1f191, 0x1f19a,), # Squared Cl ..Squared Vs
(0x1f1e6, 0x1f1ff,), # Regional Indicator Symbo..Regional Indicator Symbo
(0x1f200, 0x1f202,), # Square Hiragana Hoka ..Squared Katakana Sa
(0x1f210, 0x1f23b,), # Squared Cjk Unified Ideo..Squared Cjk Unified Ideo
(0x1f240, 0x1f248,), # Tortoise Shell Bracketed..Tortoise Shell Bracketed
@@ -104,6 +105,7 @@
(0x1f3e0, 0x1f3f0,), # House Building ..European Castle
(0x1f3f4, 0x1f3f4,), # Waving Black Flag
(0x1f3f8, 0x1f3fa,), # Badminton Racquet And Sh..Amphora
(0x1f3fb, 0x1f3ff,), # Emoji Modifier Fitzpatri..Emoji Modifier Fitzpatri
(0x1f400, 0x1f43e,), # Rat ..Paw Prints
(0x1f440, 0x1f440,), # Eyes
(0x1f442, 0x1f4fc,), # Ear ..Videocassette
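Each table entry is an inclusive ``(start, end)`` codepoint range, and the ranges are sorted, so membership can be tested with a binary search. The sketch below is an illustrative stand-in built on Python's standard `bisect` module, not the library's own search routine; `WIDE_EXCERPT` and `in_ranges` are hypothetical names, and the excerpt holds only the wide-table rows visible in this diff, including the two ranges it adds.

```python
from bisect import bisect_right

# Hypothetical excerpt: only the wide-table rows visible in this diff.
WIDE_EXCERPT = (
    (0x1F0CF, 0x1F0CF),  # Playing Card Black Joker
    (0x1F18E, 0x1F18E),  # Negative Squared Ab
    (0x1F191, 0x1F19A),  # Squared Cl..Squared Vs
    (0x1F1E6, 0x1F1FF),  # Regional Indicator Symbols (added by this PR)
    (0x1F200, 0x1F202),  # Square Hiragana Hoka..Squared Katakana Sa
    (0x1F210, 0x1F23B),
    (0x1F240, 0x1F248),
    (0x1F3E0, 0x1F3F0),  # House Building..European Castle
    (0x1F3F4, 0x1F3F4),  # Waving Black Flag
    (0x1F3F8, 0x1F3FA),  # Badminton Racquet And Shuttlecock..Amphora
    (0x1F3FB, 0x1F3FF),  # Emoji Modifier Fitzpatrick (added by this PR)
    (0x1F400, 0x1F43E),  # Rat..Paw Prints
    (0x1F440, 0x1F440),  # Eyes
    (0x1F442, 0x1F4FC),  # Ear..Videocassette
)
# Range starts, precomputed once so each lookup is O(log n).
_STARTS = [start for start, _end in WIDE_EXCERPT]


def in_ranges(codepoint: int) -> bool:
    """True if codepoint falls inside one of the inclusive ranges."""
    # Index of the last range whose start is <= codepoint (-1 if none).
    i = bisect_right(_STARTS, codepoint) - 1
    return i >= 0 and codepoint <= WIDE_EXCERPT[i][1]


print(in_ranges(0x1F1FA))  # REGIONAL INDICATOR SYMBOL LETTER U -> True
print(in_ranges(0x1F3FB))  # EMOJI MODIFIER FITZPATRICK TYPE-1-2 -> True
print(in_ranges(0x1F1E5))  # just below the Regional Indicator range -> False
```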
4 changes: 3 additions & 1 deletion wcwidth/table_zero.py
@@ -345,7 +345,9 @@
(0x1e6f5, 0x1e6f5,), # Tai Yo Sign Om
(0x1e8d0, 0x1e8d6,), # Mende Kikakui Combining ..Mende Kikakui Combining
(0x1e944, 0x1e94a,), # Adlam Alif Lengthener ..Adlam Nukta
(0x1f3fb, 0x1f3ff,), # Emoji Modifier Fitzpatri..Emoji Modifier Fitzpatri
# Emoji Modifier Fitzpatrick types (U+1F3FB..U+1F3FF) excluded:
# standalone they display as wide (2 cells), only zero-width
# when following an emoji base character in sequence.
(0xe0000, 0xe0fff,), # (nil)
),
}