Skip to content

support variation selectors, ZWJ sequences and surrogate pairs in length calculation#2082

Merged
patriksvensson merged 3 commits intospectreconsole:mainfrom
fabsenet:1847-unicode-widechar-table
Apr 16, 2026
Merged

support variation selectors, ZWJ sequences and surrogate pairs in length calculation#2082
patriksvensson merged 3 commits intospectreconsole:mainfrom
fabsenet:1847-unicode-widechar-table

Conversation

@fabsenet
Copy link
Copy Markdown
Contributor

@fabsenet fabsenet commented Apr 13, 2026

Fixes #1847

  • I have read the Contribution Guidelines
  • I have checked that there isn't already another pull request that solves the above issue
  • All newly added code is adequately covered by tests
  • All existing tests are still running without errors

AI

I am working with unicode, windows-1252 and codepages as part of my day job for quite some time. I used Claude Code here as a tool to write faster. I checked every little change myself.

Changes

src/Spectre.Console/Internal/Cell.cs

  • GetCellLength(string): NET6+ → delegates to UnicodeCalculator.GetWidth(string) which already handles grapheme clusters correctly via EnumerateRunes(); netstandard2.0 → iterates with StringInfo.GetTextElementEnumerator
  • GetCellLength(ReadOnlySpan<char>): now delegates to the string path instead of iterating char-by-char
  • New private GetClusterCellLength(string): FE0F → 2, ZWJ → 2, surrogate pair → code point width lookup

src/Spectre.Console/Rendering/Segment.cs

  • Split(): char-foreach replaced with StringInfo enumerator — prevents splits in the middle of a grapheme cluster
  • Truncate(): same change
  • SplitSegment(): same change

src/Spectre.Console.Tests/Unit/CellTests.cs (new)

  • 8 parameterized tests: ASCII, CJK, ♥, ❤️, ZWJ family sequence, ❤️‍🔥, flag 🇩🇪, empty string

src/Spectre.Console.Tests/Unit/Widgets/Table/TableTests.cs

  • New snapshot test Should_Render_Table_With_Wide_Emoji_Correctly

Please upvote 👍 this pull request if you are interested in it.

@fabsenet
Copy link
Copy Markdown
Contributor Author

@microsoft-github-policy-service agree

@fabsenet
Copy link
Copy Markdown
Contributor Author

you have to look at the expected output in the terminal as also vs code or github tend to render it incorrectly!

grafik

@patriksvensson
Copy link
Copy Markdown
Contributor

Sorry, but I have a bit difficulty understanding these changes. I appreciate the PR but all logic for Unicode calculation should go into the wcwidth library.

Why use UnicodeCalculator on net6 and above and a complete different approach otherwise?

@fabsenet
Copy link
Copy Markdown
Contributor Author

the problem is the length counting in char by char.
the wide heart basically has two unicode chars "heart + wide" but the original logic was counting them one by one and "wide" on its own has a width of zero. So the original logic has 1+0=1. but the actual printed char takes the space of 2 normal characters and thats why the table is drawn wrong.

the easy fix would have been to fix the one-by-one counting to detect the "wide" and change the result but it would have been a single fix only. this approach now fixes more cases.

@patriksvensson
Copy link
Copy Markdown
Contributor

The string overload in UnicodeCalculator does this already afaik.

@patriksvensson
Copy link
Copy Markdown
Contributor

I appreciate it, but I would rather get the calculation correct in wcwidth to be honest since it's used by other projects as well (the reason for it existing).

I'll take a look at this PR as soon as I can.

@fabsenet fabsenet force-pushed the 1847-unicode-widechar-table branch from 8ae634d to fbeae03 Compare April 14, 2026 00:55
public static int GetCellLength(ReadOnlySpan<char> text)
public static int GetCellLength(string text)
{
#if !NETSTANDARD2_0
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this obviosly is now not fixed for .net standard 2.0. UnicodeCalculator.GetWidth(string) is not available for .net standard?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely correct. I look forward to drop netstandard2.0 at some point (around the 2.0 release of Spectre.Console). For now, I'm OK with having two different implementations (and potentially erroneous rendering on .NET Framework). What do you think?

@fabsenet
Copy link
Copy Markdown
Contributor Author

i reworked it to use the UnicodeCalculator.getstring more, this makes it a bit easier now!

@patriksvensson
Copy link
Copy Markdown
Contributor

Ok, I've looked through it, and I think I understand the changes now. Will take it for a spin after I've merged another emoji-related bug tonight.

Copy link
Copy Markdown
Contributor

@patriksvensson patriksvensson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. No tests are failing and it solves the issue. Some minor nit-pick thing that needs to be fixed though.

Comment thread src/Spectre.Console/Rendering/Segment.cs Outdated
@fabsenet
Copy link
Copy Markdown
Contributor Author

Hi i am not sure what to do right now. i am fine with the change of extra brackets but i do not see an accept button anywhere? do i need to put this change in my branch? i see no simple accept button anywhere?

@patriksvensson
Copy link
Copy Markdown
Contributor

I already took care of it 🙂

@patriksvensson patriksvensson merged commit 0525e4c into spectreconsole:main Apr 16, 2026
3 checks passed
@patriksvensson
Copy link
Copy Markdown
Contributor

Merged! Thank you for your contribution. Much appreciated! 👍

This was referenced Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Some characters break table output

2 participants