Collaborator

@juliasilge juliasilge commented Nov 21, 2025

Addresses #420

What is semantic highlighting, you ask? Why, it is special, extra highlighting that some LSPs provide (as opposed to grammar-provided syntax highlighting):
https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide

After working on this, I understand why we didn't add it a while ago; there is a LOT of bookkeeping here! 😅

The best way to see how this is behaving is to test this in VS Code (not Positron) using Pylance. We do intend to make changes in Positron so it works similarly, but we are a bit in flux right now with our Python LSP.

If you have some Python code that gets semantic tokens highlighted in a regular .py file:

Screenshot 2025-11-21 at 5 30 46 PM

We should mostly get the same semantic tokens highlighted in a .qmd file:

Screenshot 2025-11-21 at 5 32 20 PM

Only some themes support semantic token highlighting, so be sure to use one that does (such as the main built-in themes).

@juliasilge juliasilge changed the title from "WIP: Add semantic token LSP support for Quarto files" to "Add semantic token LSP support for Quarto files" Nov 22, 2025
@juliasilge juliasilge marked this pull request as ready for review November 22, 2025 00:34
@juliasilge juliasilge requested a review from Copilot November 22, 2025 00:35
Copilot AI left a comment

Pull request overview

This PR adds semantic token support from LSP servers to Quarto documents. Semantic tokens provide enhanced syntax highlighting based on language server analysis rather than grammar-based highlighting. The implementation includes middleware to intercept semantic token requests, convert them to virtual document coordinates, remap token indices between different legend formats, and adjust positions back to real document coordinates.

Key changes:

  • Added semantic token provider middleware that creates virtual documents and delegates to embedded language servers
  • Implemented token encoding/decoding and legend remapping utilities to handle differences between language server token legends
  • Added comprehensive test coverage for token manipulation functions
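
The legend remapping mentioned above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Legend` interface, the function names, and both example legends are hypothetical, and dropping modifiers that the target legend lacks is an assumption. It relies only on the LSP convention that a token type is an index into the legend's `tokenTypes` and that the modifiers form a bitset over `tokenModifiers`.

```typescript
// Hypothetical shape mirroring the LSP SemanticTokensLegend.
interface Legend {
  tokenTypes: string[];
  tokenModifiers: string[];
}

// Map a token type index from the source legend to the target legend.
// Returns undefined when the target legend has no type with that name.
function remapTokenType(index: number, source: Legend, target: Legend): number | undefined {
  const name = source.tokenTypes[index];
  if (name === undefined) return undefined;
  const mapped = target.tokenTypes.indexOf(name);
  return mapped === -1 ? undefined : mapped;
}

// Map a modifier bitset: bit i set means source.tokenModifiers[i] applies.
// Modifiers unknown to the target legend are dropped (an assumption).
// Assumes each legend has fewer than 31 modifiers so the bit math stays positive.
function remapModifiers(bits: number, source: Legend, target: Legend): number {
  let result = 0;
  for (let i = 0; i < source.tokenModifiers.length; i++) {
    if ((bits & (1 << i)) === 0) continue;
    const name = source.tokenModifiers[i];
    if (name === undefined) continue;
    const mapped = target.tokenModifiers.indexOf(name);
    if (mapped !== -1) result |= 1 << mapped;
  }
  return result;
}

// Made-up legends for illustration only.
const serverLegend: Legend = {
  tokenTypes: ["class", "function", "variable"],
  tokenModifiers: ["declaration", "readonly", "async"],
};
const quartoLegend: Legend = {
  tokenTypes: ["variable", "function"],
  tokenModifiers: ["readonly", "declaration"],
};

console.log(remapTokenType(2, serverLegend, quartoLegend)); // 0 ("variable")
console.log(remapTokenType(0, serverLegend, quartoLegend)); // undefined ("class" is missing)
console.log(remapModifiers(0b101, serverLegend, quartoLegend)); // 2 ("declaration" kept, "async" dropped)
```

A caller would presumably skip any token whose type maps to `undefined`, since the client has no way to interpret an index outside the legend it was given.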

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| apps/vscode/src/vdoc/vdoc.ts | Added semantic token coordinate adjustment function and registered "semanticTokens" as a virtual document action |
| apps/vscode/src/test/semanticTokens.test.ts | Added comprehensive test suite for semantic token encoding, decoding, and legend remapping |
| apps/vscode/src/providers/semantic-tokens.ts | Implemented semantic token provider with legend remapping and coordinate adjustment |
| apps/vscode/src/lsp/client.ts | Registered semantic token middleware provider in LSP client |
| apps/quarto-utils/src/semantic-tokens-legend.ts | Defined standard semantic token legend for Quarto documents |
| apps/quarto-utils/src/index.ts | Exported semantic token legend for use across packages |
| apps/lsp/src/middleware.ts | Added semantic token capability and handler to LSP middleware |

@juliasilge juliasilge requested a review from vezwork November 22, 2025 00:37
Collaborator

@vezwork vezwork left a comment

Without reading too deeply into semantic highlighting, this makes sense and the code seems straightforward. Kind of wild that we've got to do some bit twiddling lol.

Is it possible to add a test for embeddedSemanticTokensProvider? I'm not sure if we have access to MarkdownEngine in tests though. Perhaps we could make mock versions of token: CancellationToken, next: DocumentSemanticsTokensSignature to pass in?

Comment on lines +59 to +61
connection.languages.semanticTokens.on(async () => {
  return { data: [] };
});
Collaborator

@vezwork vezwork Nov 25, 2025

Is this supposed to return with an empty array? If so, why?

Collaborator Author

@juliasilge juliasilge Nov 26, 2025

I believe so, based on how semantic tokens work. Returning { data: [] } says, "I'm handling this request successfully, but have no tokens to provide", while null would mean "capability not available" or "error". It's different from the other handlers where null is the standard way to say "no result".

This is the first time I've worked with semantic tokens, but I did find the spec helpful: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens

Comment on lines +40 to +42
* Decode semantic tokens from delta-encoded format to absolute positions
*
* Semantic tokens are encoded as [deltaLine, deltaStartChar, length, tokenType, tokenModifiers, ...]
Collaborator

Does "delta-encoded" mean the following? Imagine we extracted the line and deltaLine values into their own arrays: would deltaLine[i] === line[i] - line[i-1], and deltaLine[0] === line[0]?

Collaborator Author

Yep, from what I understand, "delta-encoded" means the token positions are stored as relative offsets from the previous token, rather than absolute positions, so deltaLine is the number of lines relative to the previous token's line.
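
To make the encoding concrete, here is a minimal sketch of decoding and re-encoding, following the five-integers-per-token layout in the LSP 3.17 spec. The `AbsoluteToken` interface and the function names are hypothetical and are not taken from the PR. One detail worth noting: per the spec, `deltaStartChar` is relative to the previous token's start only when both tokens are on the same line; otherwise it is the absolute start character.

```typescript
// Hypothetical absolute-position representation of one semantic token.
interface AbsoluteToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: number;
  tokenModifiers: number;
}

// Decode [deltaLine, deltaStartChar, length, tokenType, tokenModifiers, ...]
// into absolute positions. Trailing values that do not fill a group of five are ignored.
function decodeTokens(data: number[]): AbsoluteToken[] {
  const tokens: AbsoluteToken[] = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i + 4 < data.length; i += 5) {
    const deltaLine = data[i];
    const deltaStart = data[i + 1];
    line += deltaLine;
    // Same line: offset from the previous token's start. New line: absolute column.
    startChar = deltaLine === 0 ? startChar + deltaStart : deltaStart;
    tokens.push({
      line,
      startChar,
      length: data[i + 2],
      tokenType: data[i + 3],
      tokenModifiers: data[i + 4],
    });
  }
  return tokens;
}

// Encode absolute tokens back to the delta format.
// Assumes the tokens are already sorted by line, then by start character.
function encodeTokens(tokens: AbsoluteToken[]): number[] {
  const data: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const t of tokens) {
    const deltaLine = t.line - prevLine;
    const deltaStart = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
    data.push(deltaLine, deltaStart, t.length, t.tokenType, t.tokenModifiers);
    prevLine = t.line;
    prevChar = t.startChar;
  }
  return data;
}

// The worked example from the LSP spec: tokens at (2,5), (2,10), and (5,2).
const encoded = [2, 5, 3, 0, 3, 0, 5, 4, 1, 0, 3, 2, 7, 2, 0];
const decoded = decodeTokens(encoded);
console.log(decoded.map((t) => [t.line, t.startChar])); // [[2, 5], [2, 10], [5, 2]]
```

So yes for lines: `deltaLine[0]` equals the first token's line, and each later `deltaLine[i]` is `line[i] - line[i-1]`. Working on the decoded form is what makes it practical to shift tokens between virtual-document and real-document coordinates before encoding them again.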

@juliasilge
Collaborator Author

I did look into testing embeddedSemanticTokensProvider, but given that we don't install other extensions into the tests, we'd have to mock:

  • window.activeTextEditor
  • engine.parse()
  • commands.executeCommand() (twice)
  • The virtual doc system

I don't think we'd get a lot of value, and we'd end up just testing our mocks, really.

@juliasilge juliasilge merged commit 591b352 into main Nov 26, 2025
2 checks passed
@juliasilge juliasilge deleted the add-semantic-token-middleware branch November 26, 2025 01:58
Collaborator

vezwork commented Nov 26, 2025

> I did look into testing embeddedSemanticTokensProvider, but given that we don't install other extensions into the tests, we'd have to mock:
>
>   • window.activeTextEditor
>   • engine.parse()
>   • commands.executeCommand() (twice)
>   • The virtual doc system
>
> I don't think we'd get a lot of value, and we'd end up just testing our mocks, really.

Thanks, that's helpful to understand.
