Skip to content

Comments

[Repo Assist] Fix uint32 underflow in semantic token length for multiline ranges#1449

Merged
Krzysztof-Cieslak merged 3 commits intomainfrom
repo-assist/fix-issue-1407-semantic-token-multiline-underflow-b34bf9074b83c791
Feb 23, 2026
Merged

[Repo Assist] Fix uint32 underflow in semantic token length for multiline ranges#1449
Krzysztof-Cieslak merged 3 commits intomainfrom
repo-assist/fix-issue-1407-semantic-token-multiline-underflow-b34bf9074b83c791

Conversation

@github-actions
Copy link
Contributor

🤖 This PR was created by Repo Assist, an automated AI assistant.

Closes #1407

Root Cause

When FCS returns a SemanticClassificationItem whose range spans multiple lines (e.g. a multiline string literal or verbatim string), the token length computation in encodeSemanticHighlightRanges (LspHelpers.fs) was:

let tokenLen = uint32 (range.End.Character - range.Start.Character)

For a multiline range, End.Character (position on the last line) may be less than Start.Character (position on the first line). Since both are uint32, the subtraction underflows — e.g. 4 - 10 = 4294967290 when interpreted as uint32 (which is -6 in int32). Some LSP clients (notably Neovim) treat this as a token of ~4 GB length and freeze or crash.

The issue was first identified in #1407, with a good root-cause analysis in the comments by @onion108 and @TheAngryByrd.

Fix

Two-layer defence:

  1. scrubRanges (Commands.fs): Filter out any SemanticClassificationItem whose FCS range spans more than one line, since the LSP semantic token protocol requires each entry to be contained within a single line. This is the primary fix.

  2. encodeSemanticHighlightRanges (LspHelpers.fs): Add a guard that clamps tokenLen to 0u for any remaining multiline or inverted ranges that somehow escape scrubRanges. This is belt-and-suspenders.

Trade-offs

  • Multiline tokens (e.g. multiline strings, multiline comments) will no longer receive semantic highlighting. This is visible as a minor cosmetic regression — they'd fall back to syntax-based highlighting — but it's far better than the current behaviour of freezing the editor.
  • A follow-up improvement would be to split multiline tokens into per-line entries (as the LSP spec allows), but that requires access to the source text to compute per-line lengths and is a larger change.

Test Status

Build passed: dotnet build FsAutoComplete.sln -c Release — 0 warnings, 0 errors.

The existing test suite doesn't appear to have a test specifically for multiline semantic token ranges. A dedicated unit test for encodeSemanticHighlightRanges with a multiline range would be a good follow-up.

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@0d6e8cf9db90470cd5477c6a40b350fd9f9e1422. View source at https://github.com/githubnext/agentics/tree/0d6e8cf9db90470cd5477c6a40b350fd9f9e1422/workflows/repo-assist.md.

When FCS returns a SemanticClassificationItem whose range spans multiple
lines, the token length computation in encodeSemanticHighlightRanges was:

  uint32 (range.End.Character - range.Start.Character)

For a multiline range End.Character may be less than Start.Character
(e.g. a multiline string that starts at col 10 but ends at col 4),
giving an arithmetic underflow: the result wraps to a large uint32
(e.g. 4294967290 = -6 as uint32). Some LSP clients (e.g. Neovim)
interpret this as a very large token and freeze or allocate gigabytes.

Fixes #1407.

Two-layer defence:
1. scrubRanges now filters out any SemanticClassificationItem whose
   range spans more than one line, as the LSP semantic token protocol
   requires each entry to be contained within a single line.
2. encodeSemanticHighlightRanges gains a guard that clamps length to 0
   for any remaining multiline or inverted ranges that somehow escape
   scrubRanges.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Pull request created: #1449

@Krzysztof-Cieslak Krzysztof-Cieslak marked this pull request as ready for review February 22, 2026 22:15
@Krzysztof-Cieslak
Copy link
Member

/pr-fix

…ding

The multi-line if-else expression for tokenLen was missing a blank line
before the next let binding (tokenTy), inconsistent with the pattern
used for lineDelta and charDelta in the same function. Fantomas requires
blank lines between multi-line let bindings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Commit pushed: 6cf2544

@github-actions
Copy link
Contributor Author

Fix: Fantomas Formatting

The CI was failing on the "Check format" step (dotnet fantomas --check build.fsx src) across all build matrix entries.

Root Cause

In src/FsAutoComplete/LspHelpers.fs, the new multi-line tokenLen if-else binding was missing a blank line before the next let tokenTy = ... binding. This is inconsistent with the existing pattern in the same function where lineDelta and charDelta (also multi-line if-else bindings) are both followed by blank lines before the next let binding. Fantomas enforces blank lines between multi-line let bindings.

Fix Applied

Added the missing blank line after the tokenLen if-else expression in LspHelpers.fs:

    let tokenLen =
      if range.End.Line > range.Start.Line || range.End.Character < range.Start.Character then
        0u
      else
        uint32 (range.End.Character - range.Start.Character)
                                                            ← blank line added here
    let tokenTy = uint32 ty

The Commands.fs changes (comment between pipeline operators and the filter step) did not require formatting changes.

Generated by PR Fix for issue #1449

To install this workflow, run gh aw add githubnext/agentics/workflows/pr-fix.md@ee50a3b7d1d3eb4a8c409ac9409fd61c9a66b0f5. View source at https://github.com/githubnext/agentics/tree/ee50a3b7d1d3eb4a8c409ac9409fd61c9a66b0f5/workflows/pr-fix.md.

Warning

⚠️ Firewall blocked 2 domains

The following domains were blocked by the firewall during workflow execution:

  • api.nuget.org
  • dc.services.visualstudio.com

@Krzysztof-Cieslak
Copy link
Member

/repo-assist add test cases for the PR

- Add LspHelpersTests.fs with 6 unit tests for encodeSemanticHighlightRanges:
  - empty input returns None
  - single-line range encodes length correctly
  - multiline range (End.Line > Start.Line) clamps tokenLen to 0 (regression test for #1407 underflow)
  - inverted single-line range (End.Char < Start.Char) clamps tokenLen to 0
  - multiline range does not corrupt delta encoding of subsequent tokens
  - sequence of single-line tokens uses correct relative deltas
- Extend HighlightingTest/Script.fsx with a multiline string literal to exercise the fix end-to-end
- Add integration test to HighlightingTests.fs verifying no decoded token has an unreasonably large length (would indicate uint32 underflow)
- Wire up LspHelpersTests.allTests and UtilsTests.allTests in Program.fs generalTests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Commit pushed: 60ae8bb

@github-actions
Copy link
Contributor Author

🤖 This is an automated response from Repo Assist.

I've added test cases to the PR branch. Here's what was added:

New unit tests — LspHelpersTests.fs

Six focused unit tests for encodeSemanticHighlightRanges that directly target the uint32 underflow regression:

Test What it verifies
empty input returns None baseline behaviour
single normal single-line range correct 5-element encoding (lineDelta, charDelta, tokenLen, …)
multiline range (End.Line > Start.Line) tokenLen is clamped to 0u, not the old ~4294967290u underflow value
inverted single-line range (End.Char < Start.Char) same clamping guard
multiline range + valid range subsequent token delta encoding is not corrupted
sequence of single-line tokens correct relative deltas throughout

All 6 unit tests pass locally.

Integration test — HighlightingTests.fs

Added a regression test that requests semantic tokens for Script.fsx (which now includes a multiline triple-quoted string) and asserts that every decoded token length is below a sanity threshold (1000u). If the fix were reverted, the multiline string token would produce a length of ~4294967290u, which would trip this assertion.

Supporting change — Program.fs

Wired up LspHelpersTests.allTests and UtilsTests.allTests (the latter was compiled but not previously included in any test run) into generalTests.

Generated by Repo Assist for issue #1449

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

@Krzysztof-Cieslak Krzysztof-Cieslak merged commit 86b57c3 into main Feb 23, 2026
26 checks passed
@Krzysztof-Cieslak Krzysztof-Cieslak deleted the repo-assist/fix-issue-1407-semantic-token-multiline-underflow-b34bf9074b83c791 branch February 23, 2026 16:40
github-actions bot added a commit that referenced this pull request Feb 24, 2026
Include all PRs merged since v0.83.0:
- #1441: Fix SourceLink go-to-def failure on .NET 10 on Linux
- #1452: Add FSharp.notifications.backgroundServiceProgress config option
- #1449: Fix semantic token multiline range uint32 underflow
- #1453: Fix spurious get/set rename in TextDocumentRename
- #1454: Fix missing { interpolated string completion trigger
- #1455: Fix non-ASCII path encoding in file URIs
- #1456: Disable inline values by default to restore pipeline hints
- #1457: Fix missing parens in function-type segments in AddExplicitTypeAnnotation
- #1458: Fix signature help parameter types showing fully-qualified names
- #1463: Fix seealso href/langword XML doc rendering

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Length data underflow for multiline token in response for textDocument/semanticTokens/full

1 participant