-
Notifications
You must be signed in to change notification settings - Fork 349
fix(core): use byte offset for line_starts in text wrap #610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
line_starts was returning column offset instead of byte offset, causing CJK/emoji to split mid-character during text wrap. Related to anomalyco#255
10ee149 to
841364c
Compare
@opentui/core
@opentui/react
@opentui/solid
@opentui/core-darwin-arm64
@opentui/core-darwin-x64
@opentui/core-linux-arm64
@opentui/core-linux-x64
@opentui/core-win32-arm64
@opentui/core-win32-x64
commit: |
|
Hi, it looks like the failing test Expected: < 50ms I ran this test locally multiple times and it passed consistently. The changes in this PR only affect Would it be possible to re-run the CI when you have a moment? Please let me know if there's anything else I should address. Thank you! |
Verify line_starts returns byte offsets in no-wrap mode. Reproduces issue where Korean text "흐름\n도움" returned column offset instead of byte offset (7 bytes after first line). Related to anomalyco#255
No-wrap path was using char_offset for cached_line_starts, causing CJK text cursor positioning errors. Track global_byte_offset through segment iteration and compute per-line byte_offset. Fixes anomalyco#255
Eliminates redundant findWrapPosByWidth calls in hot paths by tracking byte positions directly in VirtualChunk. Split and reflow operations now use stored offsets instead of recalculating. Related to anomalyco#255
|
Update from Discord discussion: Addressed the "Known Limitation" mentioned in the PR body—no-wrap mode now uses Also optimized hot paths based on feedback about avoiding repeated measure calls:
The bigger refactor (merging Future optimization: unified grapheme/wrap-break loopCurrently
Both use similar SIMD-accelerated loops with ASCII fast paths. Merging them would:
This could make Estimated ~250 LOC across 4 files. |
Add tests verifying that byte ranges stay aligned for multi-byte CJK characters in word wrap line starts and truncation scenarios.
|
I think you still need to recompute the byte window to match the new column slice. Otherwise when the prefix ends mid‑chunk, byte_len still equals the full chunk length. |
Quick fix to recomputes the byte window to match the new column slice: - For prefix: sets partial.byte_len to the byte count for space_left columns. - For suffix: advances partial.byte_start by the byte count for the skipped columns, and shrinks partial.byte_len.
Summary
lineInfo.lineStartsto return byte offsets instead of column offsetsFixes #609
Changes
byte_offsettracking toVirtualLineand wrap context intext-buffer-view.zigcached_line_startsto use byte position instead of char/column positionTest
Design Considerations
Alternatives Considered
lineInfoaccess — defeats caching purposecached_line_startswithoutVirtualLinefieldcommitVirtualLine, harder to maintainChosen approach: Track
byte_offsetduring wrap (O(1) amortized). Memory overhead is ~4 bytes per wrapped line (~80KB for 10K lines), which is acceptable.Relation to Prior Work
VirtualLinestructure withchar_offset. This PR builds on that by addingbyte_offsetalongside it.col_offsetinWrapBreak. Different problem — this PR fixeslineStartsreturn values, not boundary calculation.Both prior PRs left
cached_line_startsusing character/column offsets, which corrupts multi-byte characters when used for slicing.Known Limitation
The no-wrap code path still uses
char_offsetforlineStarts. This is out of scope for this PR, as wrapping is the primary use case forlineStarts.cc: @DonKongPaPa @kskang @soomtong @stefanahman — you showed interest in #255; this PR addresses a remaining edge case with line boundary handling. Feedback welcome!