[Repo Assist] Fix uint32 underflow in semantic token length for multiline ranges#1449
Conversation
When FCS returns a SemanticClassificationItem whose range spans multiple lines, the token length computation in encodeSemanticHighlightRanges was: uint32 (range.End.Character - range.Start.Character) For a multiline range End.Character may be less than Start.Character (e.g. a multiline string that starts at col 10 but ends at col 4), giving an arithmetic underflow: the result wraps to a large uint32 (e.g. 4294967290 = -6 as uint32). Some LSP clients (e.g. Neovim) interpret this as a very large token and freeze or allocate gigabytes. Fixes #1407. Two-layer defence: 1. scrubRanges now filters out any SemanticClassificationItem whose range spans more than one line, as the LSP semantic token protocol requires each entry to be contained within a single line. 2. encodeSemanticHighlightRanges gains a guard that clamps length to 0 for any remaining multiline or inverted ranges that somehow escape scrubRanges. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
✅ Pull request created: #1449 |
|
/pr-fix |
…ding The multi-line if-else expression for tokenLen was missing a blank line before the next let binding (tokenTy), inconsistent with the pattern used for lineDelta and charDelta in the same function. Fantomas requires blank lines between multi-line let bindings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
✅ Commit pushed: |
Fix: Fantomas FormattingThe CI was failing on the "Check format" step ( Root CauseIn Fix AppliedAdded the missing blank line after the let tokenLen =
if range.End.Line > range.Start.Line || range.End.Character < range.Start.Character then
0u
else
uint32 (range.End.Character - range.Start.Character)
← blank line added here
let tokenTy = uint32 tyThe
Warning
|
|
/repo-assist add test cases for the PR |
- Add LspHelpersTests.fs with 6 unit tests for encodeSemanticHighlightRanges: - empty input returns None - single-line range encodes length correctly - multiline range (End.Line > Start.Line) clamps tokenLen to 0 (regression test for #1407 underflow) - inverted single-line range (End.Char < Start.Char) clamps tokenLen to 0 - multiline range does not corrupt delta encoding of subsequent tokens - sequence of single-line tokens uses correct relative deltas - Extend HighlightingTest/Script.fsx with a multiline string literal to exercise the fix end-to-end - Add integration test to HighlightingTests.fs verifying no decoded token has an unreasonably large length (would indicate uint32 underflow) - Wire up LspHelpersTests.allTests and UtilsTests.allTests in Program.fs generalTests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
✅ Commit pushed: |
|
🤖 This is an automated response from Repo Assist. I've added test cases to the PR branch. Here's what was added: New unit tests —
|
| Test | What it verifies |
|---|---|
| empty input returns None | baseline behaviour |
| single normal single-line range | correct 5-element encoding (lineDelta, charDelta, tokenLen, …) |
| multiline range (End.Line > Start.Line) | tokenLen is clamped to 0u, not the old ~4294967290u underflow value |
| inverted single-line range (End.Char < Start.Char) | same clamping guard |
| multiline range + valid range | subsequent token delta encoding is not corrupted |
| sequence of single-line tokens | correct relative deltas throughout |
All 6 unit tests pass locally.
Integration test — HighlightingTests.fs
Added a regression test that requests semantic tokens for Script.fsx (which now includes a multiline triple-quoted string) and asserts that every decoded token length is below a sanity threshold (1000u). If the fix were reverted, the multiline string token would produce a length of ~4294967290u, which would trip this assertion.
Supporting change — Program.fs
Wired up LspHelpersTests.allTests and UtilsTests.allTests (the latter was compiled but not previously included in any test run) into generalTests.
Generated by Repo Assist for issue #1449
To install this workflow, run
gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.
Include all PRs merged since v0.83.0: - #1441: Fix SourceLink go-to-def failure on .NET 10 on Linux - #1452: Add FSharp.notifications.backgroundServiceProgress config option - #1449: Fix semantic token multiline range uint32 underflow - #1453: Fix spurious get/set rename in TextDocumentRename - #1454: Fix missing { interpolated string completion trigger - #1455: Fix non-ASCII path encoding in file URIs - #1456: Disable inline values by default to restore pipeline hints - #1457: Fix missing parens in function-type segments in AddExplicitTypeAnnotation - #1458: Fix signature help parameter types showing fully-qualified names - #1463: Fix seealso href/langword XML doc rendering Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 This PR was created by Repo Assist, an automated AI assistant.
Closes #1407
Root Cause
When FCS returns a
SemanticClassificationItemwhose range spans multiple lines (e.g. a multiline string literal or verbatim string), the token length computation inencodeSemanticHighlightRanges(LspHelpers.fs) was:For a multiline range,
End.Character(position on the last line) may be less thanStart.Character(position on the first line). Since both areuint32, the subtraction underflows — e.g.4 - 10 = 4294967290when interpreted asuint32(which is-6inint32). Some LSP clients (notably Neovim) treat this as a token of ~4 GB length and freeze or crash.The issue was first identified in #1407, with a good root-cause analysis in the comments by @onion108 and
@TheAngryByrd.Fix
Two-layer defence:
scrubRanges(Commands.fs): Filter out anySemanticClassificationItemwhose FCS range spans more than one line, since the LSP semantic token protocol requires each entry to be contained within a single line. This is the primary fix.encodeSemanticHighlightRanges(LspHelpers.fs): Add a guard that clampstokenLento0ufor any remaining multiline or inverted ranges that somehow escapescrubRanges. This is belt-and-suspenders.Trade-offs
Test Status
✅ Build passed:
dotnet build FsAutoComplete.sln -c Release— 0 warnings, 0 errors.The existing test suite doesn't appear to have a test specifically for multiline semantic token ranges. A dedicated unit test for
encodeSemanticHighlightRangeswith a multiline range would be a good follow-up.