fix(core): use byte offset for line_starts in text wrap #610

zenyr · 2026-02-01T06:27:06Z

Summary

Fix lineInfo.lineStarts to return byte offsets instead of column offsets
Prevents character corruption when slicing wrapped multi-byte text (CJK, emoji)

Fixes #609

Changes

Add byte_offset tracking to VirtualLine and wrap context in text-buffer-view.zig
Update cached_line_starts to use byte position instead of char/column position
Add regression tests for CJK text wrapping

Test

buffer.setStyledText(stringToStyledText("흐름도"))
view.setWrapWidth(4)
view.lineInfo.lineStarts // [0, 6] ✅ (was [0, 4] ❌)

Design Considerations

Alternatives Considered

Approach	Trade-off
Compute byte offset on-demand at read time	O(n) per `lineInfo` access — defeats caching purpose
Store only in `cached_line_starts` without `VirtualLine` field	Would require extra params in `commitVirtualLine`, harder to maintain
Derive from chunk byte lengths	Still needs global traversal; chunks are buffer-relative, not document-relative

Chosen approach: Track byte_offset during wrap (O(1) amortized). Memory overhead is ~4 bytes per wrapped line (~80KB for 10K lines), which is acceptable.

Relation to Prior Work

Initial text truncation + CJK++ word wrap fix #529 (merged): Introduced VirtualLine structure with char_offset. This PR builds on that by adding byte_offset alongside it.
fix(utf8): use col_offset for CJK word wrap boundary detection #512 (closed): Addressed wrap boundary detection with col_offset in WrapBreak. Different problem — this PR fixes lineStarts return values, not boundary calculation.

Both prior PRs left cached_line_starts using character/column offsets, which corrupts multi-byte characters when used for slicing.

Known Limitation

The no-wrap code path still uses char_offset for lineStarts. This is out of scope for this PR, as wrapping is the primary use case for lineStarts.

cc: @DonKongPaPa @kskang @soomtong @stefanahman — you showed interest in #255; this PR addresses a remaining edge case with line boundary handling. Feedback welcome!

line_starts was returning column offset instead of byte offset, causing CJK/emoji to split mid-character during text wrap. Related to anomalyco#255

pkg-pr-new · 2026-02-01T06:35:59Z

@opentui/core

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core@610

@opentui/react

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/react@610

@opentui/solid

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/solid@610

@opentui/core-darwin-arm64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-darwin-arm64@610

@opentui/core-darwin-x64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-darwin-x64@610

@opentui/core-linux-arm64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-linux-arm64@610

@opentui/core-linux-x64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-linux-x64@610

@opentui/core-win32-arm64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-win32-arm64@610

@opentui/core-win32-x64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-win32-x64@610

commit: 2823590

zenyr · 2026-02-01T07:06:06Z

Hi, it looks like the failing test idle() resolves immediately when called on paused renderer might be a flaky timing-based test.

Expected: < 50ms
Actual: 56.31ms

I ran this test locally multiple times and it passed consistently. The changes in this PR only affect text-buffer-view.zig (byte offset tracking), which should be unrelated to the renderer idle logic.

Would it be possible to re-run the CI when you have a moment? Please let me know if there's anything else I should address. Thank you!

Verify line_starts returns byte offsets in no-wrap mode. Reproduces issue where Korean text "흐름\n도움" returned column offset instead of byte offset (7 bytes after first line). Related to anomalyco#255

No-wrap path was using char_offset for cached_line_starts, causing CJK text cursor positioning errors. Track global_byte_offset through segment iteration and compute per-line byte_offset. Fixes anomalyco#255

Eliminates redundant findWrapPosByWidth calls in hot paths by tracking byte positions directly in VirtualChunk. Split and reflow operations now use stored offsets instead of recalculating. Related to anomalyco#255

zenyr · 2026-02-01T08:37:29Z

Update from Discord discussion:

Addressed the "Known Limitation" mentioned in the PR body—no-wrap mode now uses byte_offset too.

Also optimized hot paths based on feedback about avoiding repeated measure calls:

Commit	Change
`00336eb`	Fix no-wrap to use byte_offset (was still using char_offset)
`089f334`	Refactor to proactive tracking—no more retrospective `getBytes()` loop
`e344f44`	Add `byte_start`/`byte_len` to `VirtualChunk`—eliminates `findWrapPosByWidth` recalculation during word-wrap reflow

The bigger refactor (merging findGraphemeInfo/findWrapBreaks into a single loop) is worth a separate issue.

Future optimization: unified grapheme/wrap-break loop

Currently findGraphemeInfo and findWrapBreaks in utf8.zig iterate over text separately:

findGraphemeInfo → collects byte_offset, byte_len, width, col_offset
findWrapBreaks → collects byte_offset, char_offset (no width info)

Both use similar SIMD-accelerated loops with ASCII fast paths. Merging them would:

Single pass instead of two
WrapBreak gets byte_len/width → word-wrap can use cached breaks directly
Eliminates grapheme lookups in hot path entirely

This could make grapheme_start in VirtualChunk redundant (replaced by byte_start), and remove the need for charOffsetToColumn conversions during wrap.

Estimated ~250 LOC across 4 files.

Add tests verifying that byte ranges stay aligned for multi-byte CJK characters in word wrap line starts and truncation scenarios.

simonklee · 2026-02-01T09:28:36Z

I think you still need to recompute the byte window to match the new column slice. Otherwise when the prefix ends mid‑chunk, byte_len still equals the full chunk length.

Quick fix to recomputes the byte window to match the new column slice: - For prefix: sets partial.byte_len to the byte count for space_left columns. - For suffix: advances partial.byte_start by the byte count for the skipped columns, and shrinks partial.byte_len.

zenyr requested review from Hona, kommander, msmps and simonklee as code owners February 1, 2026 06:27

fix(core): use byte offset for line_starts in text wrap

841364c

line_starts was returning column offset instead of byte offset, causing CJK/emoji to split mid-character during text wrap. Related to anomalyco#255

zenyr force-pushed the fix/255-line-starts-byte-offset branch from 10ee149 to 841364c Compare February 1, 2026 06:32

zenyr added 4 commits February 1, 2026 16:39

test(core): add no-wrap byte offset test for CJK text

4f2e8d1

Verify line_starts returns byte offsets in no-wrap mode. Reproduces issue where Korean text "흐름\n도움" returned column offset instead of byte offset (7 bytes after first line). Related to anomalyco#255

fix(core): use byte offset for line_starts in no-wrap mode

00336eb

No-wrap path was using char_offset for cached_line_starts, causing CJK text cursor positioning errors. Track global_byte_offset through segment iteration and compute per-line byte_offset. Fixes anomalyco#255

refactor(core): use proactive byte offset tracking in no-wrap mode

089f334

perf(core): store byte_start/byte_len in VirtualChunk

e344f44

Eliminates redundant findWrapPosByWidth calls in hot paths by tracking byte positions directly in VirtualChunk. Split and reflow operations now use stored offsets instead of recalculating. Related to anomalyco#255

test(text-buffer-view): add tests for CJK byte offset handling

76b11a4

Add tests verifying that byte ranges stay aligned for multi-byte CJK characters in word wrap line starts and truncation scenarios.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): use byte offset for line_starts in text wrap #610

fix(core): use byte offset for line_starts in text wrap #610

Uh oh!

zenyr commented Feb 1, 2026 •

edited

Loading

Uh oh!

pkg-pr-new bot commented Feb 1, 2026 •

edited

Loading

Uh oh!

zenyr commented Feb 1, 2026

Uh oh!

zenyr commented Feb 1, 2026 •

edited

Loading

Uh oh!

simonklee commented Feb 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix(core): use byte offset for line_starts in text wrap #610

Are you sure you want to change the base?

fix(core): use byte offset for line_starts in text wrap #610

Uh oh!

Conversation

zenyr commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test

Design Considerations

Alternatives Considered

Relation to Prior Work

Known Limitation

Uh oh!

pkg-pr-new bot commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zenyr commented Feb 1, 2026

Uh oh!

zenyr commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonklee commented Feb 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zenyr commented Feb 1, 2026 •

edited

Loading

pkg-pr-new bot commented Feb 1, 2026 •

edited

Loading

zenyr commented Feb 1, 2026 •

edited

Loading