-
Notifications
You must be signed in to change notification settings - Fork 307
Store syntax trees in semantic token in syntax tree/semantic token managers instead of in the Document
#857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ment` The reference just isn’t needed and this makes everything clearer.
@swift-ci Please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So many managers 😅
Storing the syntax tree and semantic tokens in Document was an anti-pattern because those are Swift-related information, but Document should be language agnostic.
Is Document itself not really just a Swift thing anyway? I assume open/edit/close are all just forwarded directly to Clang, so is it even used there? Or should Document/DocumentManager/etc just be Swift-only and then we handle all the things in there rather than the separate syntax/semantic managers?
/// Keeps track of the semantic tokens that sourcekitd has sent us for given | ||
/// document snapshots. | ||
actor SemanticTokensManager { | ||
private var semanticTokens: [DocumentSnapshot.ID: [SyntaxHighlightingToken]] = [:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we'll not have all that many documents, but given we only ever keep the latest this could just be eg.
struct VersionedTokens {
let version: Int
let tokens: [SyntaxHighlightingToken]
}
...
private var semanticTokens: [DocumentURI: VersionedTokens]
Then the discard is just semanticTokens.removeValue(uri)
and setSemanticTokens
would just check if the version is less than the one stored (if any), then set.
Assuming that is that we do only want the latest, right now we could technically store version 5 then 4 and we'd have both (but 4 then 5 would only have 5).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to keep it as-is. If we ever decide that we need to store semantic tokens for older version as well, it’s easier to update and conceptually we are storing semantic tokens a document snapshot, the cache eviction logic is separate from that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conceptually we are storing semantic tokens a document snapshot
Heh, I see this as the other way. We're storing tokens for the latest version of a document. The request for tokens is really "give me the latest tokens of the document which should match version ". If it doesn't, then we're out of date and the response isn't likely to matter anyway.
Or to put another away:
If we ever decide that we need to store semantic tokens for older version as well
Doesn't seem like a use case we care about.
We still need to keep track of the document contents for clang so that we can re-open the documents in clangd if clangd crashes (for which we need to know the current document contents).
🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, I see this as the other way. We're storing tokens for the latest version of a document. The request for tokens is really "give me the latest tokens of the document which should match version ". If it doesn't, then we're out of date and the response isn't likely to matter anyway.
I’ll add it to my To-Do list and let’s discuss this next week because conceptually we seem to be on the same page on this PR and this is just an implementation detail and I would like to continue making progress merging my commit backlog.
/// Keeps track of SwiftSyntax trees for document snapshots and computes the | ||
/// SwiftSyntax trees on demand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the thought that the highlight ranges in the semantic case aren't as expensive as the entire syntax tree? ie. why does one have a cache limit but the other doesn't? They're also both quite similar, ie. take an edit and cache the result (which here is the syntax tree and in the semantic is the highlight ranges).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is that we can re-compute the syntax tree on demand but we can’t (in the current design) re-compute the semantic tokens. That’s why we can be more aggressive about discarding syntax trees than semantic tokens. If we wanted to change that, getting semantic tokens would need to become a sourcekitd request as well, instead of being sent over implicitly on every update (which we should probably do).
@@ -1510,6 +1504,9 @@ final class LocalSwiftTests: XCTestCase { | |||
log("Received diagnostics for open - semantic") | |||
}) | |||
|
|||
// Send a request that triggers a syntax tree to be built. | |||
_ = try sk.sendSync(FoldingRangeRequest(textDocument: .init(uri))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this needed for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we were always eagerly computing the syntax tree whenever we opened a file. Now, we only compute it once the first request requires it (and then implicitly keep it up to date on every edit, the thought being that you’ll probably need it again and doing incremental parses is faster than re-parsing from scratch when it’s needed next).
To test that we are parsing incrementally on the next edit, we need to trigger that initial build of a syntax tree, which is what I’m doing here.
// FIXME: (async) We might not have computed the syntax tree yet. Wait until we have a syntax tree. | ||
// Really, getting the syntax tree should be an async operation. | ||
while snapshot.tokens.syntaxTree == nil { | ||
try? await Task.sleep(nanoseconds: 1_000_000) | ||
if let newSnapshot = documentManager.latestSnapshot(uri) { | ||
snapshot = newSnapshot | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
if let skTokens: SKDResponseArray = response[keys.annotations] { | ||
let tokenParser = SyntaxHighlightingTokenParser(sourcekitd: sourcekitd) | ||
var tokens: [SyntaxHighlightingToken] = [] | ||
tokenParser.parseTokens(skTokens, in: snapshot, into: &tokens) | ||
|
||
docTokens.semantic = tokens | ||
return tokens | ||
} | ||
|
||
return docTokens | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be turned into a guard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
/// A task that parses a SwiftSyntax tree from a source file, producing both | ||
/// the syntax tree and the lookahead ranges that are needed for a subsequent | ||
/// incremental parse. | ||
private typealias SyntaxTreeComputation = Task<(tree: SourceFileSyntax, lookaheadRanges: LookaheadRanges), Never> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really a comment on this PR, but is there a reason Parser.parseIncrementally
returns a tuple instead of a dedicated struct? Seems like a single type would be easier to manage, and would help ensure you don't pass a mismatching syntax tree + lookahead ranges to IncrementalParseTransition
(and would make it trivial to add any additional state later down the road if needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought the same when using the API. We should probably change that
|
||
/// The semantic tokens for the given snapshot or `nil` if no semantic tokens | ||
/// have been computed yet. | ||
func semanticTokens(for snapshotID: DocumentSnapshot.ID) -> [SyntaxHighlightingToken]? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would prefer to wrap [SyntaxHighlightingToken]
in its own struct, e.g then you could put members like lspEncoded
and mergingTokens
on it, and maybe move the edit processing logic onto it too. I don't feel that strongly about it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. It’s a slightly bigger change and I’ll do it in a follow-up PR.
…es for documents This allows us to use Swift concurrency to await the computation of the SwiftSyntax tree for a given document instead of having to poll for its creation.
No code changes, just moving code around because `mergedAndSortedTokens` no longer belonged in DocumentTokens.swift
Storing the semantic tokens inside `Document` was an anti-pattern because the semantic tokens only applied to Swift and were also being updated while the document contents themselves stayed constant. Instead, we should store the semantic tokens in a separate `SemanticTokensManager` that only exists in the `SwiftLanguageServer` and has the sole responsibility of tracking semantic tokens.
d164203
to
2d44263
Compare
We still need to keep track of the document contents for clang so that we can re-open the documents in clangd if clangd crashes (for which we need to know the current document contents). |
@swift-ci Please test |
Storing the syntax tree and semantic tokens in
Document
was an anti-pattern because those are Swift-related information, butDocument
should be language agnostic.It also fixes the issue that we were polling for the syntax tree to be created. Instead, we can now await the construction of the syntax tree.