Add semantic token LSP support for Quarto files #868
Pull request overview
This PR adds semantic token support from LSP servers to Quarto documents. Semantic tokens provide enhanced syntax highlighting based on language server analysis rather than grammar-based highlighting. The implementation includes middleware to intercept semantic token requests, convert them to virtual document coordinates, remap token indices between different legend formats, and adjust positions back to real document coordinates.
Key changes:
- Added semantic token provider middleware that creates virtual documents and delegates to embedded language servers
- Implemented token encoding/decoding and legend remapping utilities to handle differences between language server token legends
- Added comprehensive test coverage for token manipulation functions
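The legend remapping mentioned above can be sketched roughly like this (the helper names here are illustrative assumptions, not the PR's actual code), assuming the LSP 3.17 encoding where each token occupies five integers and the fourth is an index into the legend's token types:

```typescript
// Hypothetical sketch: remap token-type indices from an embedded language
// server's legend into the host (Quarto) legend.
function buildTypeMap(embeddedTypes: string[], hostTypes: string[]): number[] {
  // For each embedded index, find the matching index in the host legend;
  // -1 means the host legend has no equivalent type.
  return embeddedTypes.map((t) => hostTypes.indexOf(t));
}

function remapTokenTypes(data: number[], typeMap: number[]): number[] {
  const out = data.slice();
  // Each token is a group of five: [deltaLine, deltaStartChar, length,
  // tokenType, tokenModifiers]; only tokenType (offset 3) is remapped.
  for (let i = 0; i < out.length; i += 5) {
    const mapped = typeMap[out[i + 3]];
    out[i + 3] = mapped >= 0 ? mapped : 0; // fall back to the first host type
  }
  return out;
}
```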
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| apps/vscode/src/vdoc/vdoc.ts | Added semantic token coordinate adjustment function and registered "semanticTokens" as a virtual document action |
| apps/vscode/src/test/semanticTokens.test.ts | Added comprehensive test suite for semantic token encoding, decoding, and legend remapping |
| apps/vscode/src/providers/semantic-tokens.ts | Implemented semantic token provider with legend remapping and coordinate adjustment |
| apps/vscode/src/lsp/client.ts | Registered semantic token middleware provider in LSP client |
| apps/quarto-utils/src/semantic-tokens-legend.ts | Defined standard semantic token legend for Quarto documents |
| apps/quarto-utils/src/index.ts | Exported semantic token legend for use across packages |
| apps/lsp/src/middleware.ts | Added semantic token capability and handler to LSP middleware |
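The coordinate adjustment in vdoc.ts could look something like the sketch below (names and the exact adjustment strategy are assumptions for illustration). Because tokens are delta-encoded, only the first token's line delta is absolute, so shifting it by the embedded block's start line moves every token back into real-document coordinates:

```typescript
// Hypothetical sketch: shift decoded semantic token data from
// virtual-document coordinates back to the real .qmd document, assuming
// the embedded code block begins at `blockStartLine` in the real file.
function adjustTokenData(data: number[], blockStartLine: number): number[] {
  const out = data.slice();
  if (out.length >= 5) {
    // The first token's deltaLine is relative to line 0 of the virtual
    // document, i.e. absolute; all later tokens are relative to it.
    out[0] += blockStartLine;
  }
  return out;
}
```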
vezwork left a comment:
Without reading too deeply into semantic highlighting, this makes sense and the code seems straightforward. Kind of wild that we've got to do some bit twiddling lol.
Is it possible to add a test for embeddedSemanticTokensProvider? I'm not sure whether we have access to MarkdownEngine in tests, though. Perhaps we could pass in mock versions of token: CancellationToken and next: DocumentSemanticsTokensSignature?
```typescript
connection.languages.semanticTokens.on(async () => {
  return { data: [] };
});
```
Is this supposed to return with an empty array? If so, why?
I believe so, based on how semantic tokens work. Returning { data: [] } says, "I'm handling this request successfully, but have no tokens to provide", while null would mean "capability not available" or "error". It's different from the other handlers where null is the standard way to say "no result".
This is the first time I've worked with semantic tokens, but I did find the spec helpful: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens
```typescript
/**
 * Decode semantic tokens from delta-encoded format to absolute positions
 *
 * Semantic tokens are encoded as [deltaLine, deltaStartChar, length, tokenType, tokenModifiers, ...]
 */
```
Does "delta-encoded" mean that i.e. imagine we extracted line and delta line data to their own arrays, then deltaLine[i] === line[i] - line[i-1]? And deltaLine[0] === line[0]?
Yep, from what I understand, "delta-encoded" means the token positions are stored as relative offsets from the previous token, rather than absolute positions, so deltaLine is the number of lines relative to the previous token's line.
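The decoding described above can be sketched as follows (a minimal illustration of the LSP 3.17 encoding; the function and type names are hypothetical, not the PR's actual implementation). Note one wrinkle: deltaStartChar is relative to the previous token's start only when both tokens are on the same line.

```typescript
// Hypothetical decoder for delta-encoded semantic tokens.
interface AbsoluteToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: number;
  tokenModifiers: number;
}

function decodeTokens(data: number[]): AbsoluteToken[] {
  const tokens: AbsoluteToken[] = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i < data.length; i += 5) {
    const [deltaLine, deltaStartChar, length, tokenType, tokenModifiers] =
      data.slice(i, i + 5);
    line += deltaLine;
    // Character deltas only accumulate within the same line; a new line
    // resets startChar to an absolute column.
    startChar = deltaLine === 0 ? startChar + deltaStartChar : deltaStartChar;
    tokens.push({ line, startChar, length, tokenType, tokenModifiers });
  }
  return tokens;
}
```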
I did look into testing; I don't think we get a lot of value, and we would end up just testing our mocks, really.

Thanks, that's helpful to understand.
Addresses #420
What is semantic highlighting, you ask? Why, it is special, extra highlighting that some LSPs provide (as opposed to grammar-provided syntax highlighting):
https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide
After working on this, I understand why we didn't add it a while ago; there is a LOT of bookkeeping here! 😅
The best way to see how this is behaving is to test this in VS Code (not Positron) using Pylance. We do intend to make changes in Positron so it works similarly, but we are a bit in flux right now with our Python LSP.
If you have some Python code that gets semantic tokens highlighted in a regular .py file, we should mostly get the same semantic tokens highlighted in a .qmd file.
Only some themes support semantic token highlighting, so be sure to use one of those (like the main built-in themes).