Skip to content

Conversation

@zenyr
Copy link

@zenyr zenyr commented Feb 1, 2026

Summary

  • Fix lineInfo.lineStarts to return byte offsets instead of column offsets
  • Prevents character corruption when slicing wrapped multi-byte text (CJK, emoji)

Fixes #609

Changes

  • Add byte_offset tracking to VirtualLine and wrap context in text-buffer-view.zig
  • Update cached_line_starts to use byte position instead of char/column position
  • Add regression tests for CJK text wrapping

Test

buffer.setStyledText(stringToStyledText("흐름도"))
view.setWrapWidth(4)
view.lineInfo.lineStarts // [0, 6] ✅ (was [0, 4] ❌)

Design Considerations

Alternatives Considered

Approach Trade-off
Compute byte offset on-demand at read time O(n) per lineInfo access — defeats caching purpose
Store only in cached_line_starts without VirtualLine field Would require extra params in commitVirtualLine, harder to maintain
Derive from chunk byte lengths Still needs global traversal; chunks are buffer-relative, not document-relative

Chosen approach: Track byte_offset during wrap (O(1) amortized). Memory overhead is ~4 bytes per wrapped line (~80KB for 10K lines), which is acceptable.

Relation to Prior Work

Both prior PRs left cached_line_starts using character/column offsets, which corrupts multi-byte characters when used for slicing.

Known Limitation

The no-wrap code path still uses char_offset for lineStarts. This is out of scope for this PR, as wrapping is the primary use case for lineStarts.


cc: @DonKongPaPa @kskang @soomtong @stefanahman — you showed interest in #255; this PR addresses a remaining edge case with line boundary handling. Feedback welcome!

line_starts was returning column offset instead of byte offset,
causing CJK/emoji to split mid-character during text wrap.

Related to anomalyco#255
@zenyr zenyr force-pushed the fix/255-line-starts-byte-offset branch from 10ee149 to 841364c Compare February 1, 2026 06:32
@pkg-pr-new
Copy link

pkg-pr-new bot commented Feb 1, 2026

@opentui/core

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core@610

@opentui/react

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/react@610

@opentui/solid

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/solid@610

@opentui/core-darwin-arm64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-darwin-arm64@610

@opentui/core-darwin-x64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-darwin-x64@610

@opentui/core-linux-arm64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-linux-arm64@610

@opentui/core-linux-x64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-linux-x64@610

@opentui/core-win32-arm64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-win32-arm64@610

@opentui/core-win32-x64

npm i https://pkg.pr.new/anomalyco/opentui/@opentui/core-win32-x64@610

commit: 2823590

@zenyr
Copy link
Author

zenyr commented Feb 1, 2026

Hi, it looks like the failing test idle() resolves immediately when called on paused renderer might be a flaky timing-based test.

Expected: < 50ms
Actual: 56.31ms

I ran this test locally multiple times and it passed consistently. The changes in this PR only affect text-buffer-view.zig (byte offset tracking), which should be unrelated to the renderer idle logic.

Would it be possible to re-run the CI when you have a moment? Please let me know if there's anything else I should address. Thank you!

zenyr added 4 commits February 1, 2026 16:39
Verify line_starts returns byte offsets in no-wrap mode.
Reproduces issue where Korean text "흐름\n도움" returned column
offset instead of byte offset (7 bytes after first line).

Related to anomalyco#255
No-wrap path was using char_offset for cached_line_starts,
causing CJK text cursor positioning errors. Track global_byte_offset
through segment iteration and compute per-line byte_offset.

Fixes anomalyco#255
Eliminates redundant findWrapPosByWidth calls in hot paths by tracking
byte positions directly in VirtualChunk. Split and reflow operations
now use stored offsets instead of recalculating.

Related to anomalyco#255
@zenyr
Copy link
Author

zenyr commented Feb 1, 2026

Update from Discord discussion:

Addressed the "Known Limitation" mentioned in the PR body—no-wrap mode now uses byte_offset too.

Also optimized hot paths based on feedback about avoiding repeated measure calls:

Commit Change
00336eb Fix no-wrap to use byte_offset (was still using char_offset)
089f334 Refactor to proactive tracking—no more retrospective getBytes() loop
e344f44 Add byte_start/byte_len to VirtualChunk—eliminates findWrapPosByWidth recalculation during word-wrap reflow

The bigger refactor (merging findGraphemeInfo/findWrapBreaks into a single loop) is worth a separate issue.

Future optimization: unified grapheme/wrap-break loop

Currently findGraphemeInfo and findWrapBreaks in utf8.zig iterate over text separately:

  • findGraphemeInfo → collects byte_offset, byte_len, width, col_offset
  • findWrapBreaks → collects byte_offset, char_offset (no width info)

Both use similar SIMD-accelerated loops with ASCII fast paths. Merging them would:

  1. Single pass instead of two
  2. WrapBreak gets byte_len/width → word-wrap can use cached breaks directly
  3. Eliminates grapheme lookups in hot path entirely

This could make grapheme_start in VirtualChunk redundant (replaced by byte_start), and remove the need for charOffsetToColumn conversions during wrap.

Estimated ~250 LOC across 4 files.

Add tests verifying that byte ranges stay aligned for multi-byte CJK
characters in word wrap line starts and truncation scenarios.
@simonklee
Copy link
Collaborator

I think you still need to recompute the byte window to match the new column slice. Otherwise when the prefix ends mid‑chunk, byte_len still equals the full chunk length.

Quick fix to recomputes the byte window to match the new column slice:

  - For prefix: sets partial.byte_len to the byte count for space_left columns.
  - For suffix: advances partial.byte_start by the byte count for the skipped columns, and shrinks partial.byte_len.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

lineStarts returns column offset instead of byte offset when wrapping multi-byte text

2 participants