Skip to content

Store syntax trees in semantic token in syntax tree/semantic token managers instead of in the Document #857

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 7, 2023

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Oct 6, 2023

Storing the syntax tree and semantic tokens in Document was an anti-pattern because those are Swift-related information, but Document should be language agnostic.

It also fixes the issue that we were polling for the syntax tree to be created. Instead, we can now await the construction of the syntax tree.

@ahoppen
Copy link
Member Author

ahoppen commented Oct 6, 2023

@swift-ci Please test

Copy link
Contributor

@bnbarham bnbarham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So many managers 😅

Storing the syntax tree and semantic tokens in Document was an anti-pattern because those are Swift-related information, but Document should be language agnostic.

Is Document itself not really just a Swift thing anyway? I assume open/edit/close are all just forwarded directly to Clang, so is it even used there? Or should Document/DocumentManager/etc just be Swift-only and then we handle all the things in there rather than the separate syntax/semantic managers?

/// Keeps track of the semantic tokens that sourcekitd has sent us for given
/// document snapshots.
actor SemanticTokensManager {
private var semanticTokens: [DocumentSnapshot.ID: [SyntaxHighlightingToken]] = [:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we'll not have all that many documents, but given we only ever keep the latest this could just be eg.

struct VersionedTokens {
  let version: Int
  let tokens: [SyntaxHighlightingToken]
}
...
private var semanticTokens: [DocumentURI: VersionedTokens]

Then the discard is just semanticTokens.removeValue(uri) and setSemanticTokens would just check if the version is less than the one stored (if any), then set.

Assuming that is that we do only want the latest, right now we could technically store version 5 then 4 and we'd have both (but 4 then 5 would only have 5).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep it as-is. If we ever decide that we need to store semantic tokens for older version as well, it’s easier to update and conceptually we are storing semantic tokens a document snapshot, the cache eviction logic is separate from that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conceptually we are storing semantic tokens a document snapshot

Heh, I see this as the other way. We're storing tokens for the latest version of a document. The request for tokens is really "give me the latest tokens of the document which should match version ". If it doesn't, then we're out of date and the response isn't likely to matter anyway.

Or to put another away:

If we ever decide that we need to store semantic tokens for older version as well

Doesn't seem like a use case we care about.

We still need to keep track of the document contents for clang so that we can re-open the documents in clangd if clangd crashes (for which we need to know the current document contents).

🙇

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I see this as the other way. We're storing tokens for the latest version of a document. The request for tokens is really "give me the latest tokens of the document which should match version ". If it doesn't, then we're out of date and the response isn't likely to matter anyway.

I’ll add it to my To-Do list and let’s discuss this next week because conceptually we seem to be on the same page on this PR and this is just an implementation detail and I would like to continue making progress merging my commit backlog.

Comment on lines +16 to +17
/// Keeps track of SwiftSyntax trees for document snapshots and computes the
/// SwiftSyntax trees on demand.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the thought that the highlight ranges in the semantic case aren't as expensive as the entire syntax tree? ie. why does one have a cache limit but the other doesn't? They're also both quite similar, ie. take an edit and cache the result (which here is the syntax tree and in the semantic is the highlight ranges).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that we can re-compute the syntax tree on demand but we can’t (in the current design) re-compute the semantic tokens. That’s why we can be more aggressive about discarding syntax trees than semantic tokens. If we wanted to change that, getting semantic tokens would need to become a sourcekitd request as well, instead of being sent over implicitly on every update (which we should probably do).

@@ -1510,6 +1504,9 @@ final class LocalSwiftTests: XCTestCase {
log("Received diagnostics for open - semantic")
})

// Send a request that triggers a syntax tree to be built.
_ = try sk.sendSync(FoldingRangeRequest(textDocument: .init(uri)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this needed for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we were always eagerly computing the syntax tree whenever we opened a file. Now, we only compute it once the first request requires it (and then implicitly keep it up to date on every edit, the thought being that you’ll probably need it again and doing incremental parses is faster than re-parsing from scratch when it’s needed next).

To test that we are parsing incrementally on the next edit, we need to trigger that initial build of a syntax tree, which is what I’m doing here.

Comment on lines -936 to -942
// FIXME: (async) We might not have computed the syntax tree yet. Wait until we have a syntax tree.
// Really, getting the syntax tree should be an async operation.
while snapshot.tokens.syntaxTree == nil {
try? await Task.sleep(nanoseconds: 1_000_000)
if let newSnapshot = documentManager.latestSnapshot(uri) {
snapshot = newSnapshot
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Comment on lines 208 to 216
if let skTokens: SKDResponseArray = response[keys.annotations] {
let tokenParser = SyntaxHighlightingTokenParser(sourcekitd: sourcekitd)
var tokens: [SyntaxHighlightingToken] = []
tokenParser.parseTokens(skTokens, in: snapshot, into: &tokens)

docTokens.semantic = tokens
return tokens
}

return docTokens
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be turned into a guard?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

/// A task that parses a SwiftSyntax tree from a source file, producing both
/// the syntax tree and the lookahead ranges that are needed for a subsequent
/// incremental parse.
private typealias SyntaxTreeComputation = Task<(tree: SourceFileSyntax, lookaheadRanges: LookaheadRanges), Never>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a comment on this PR, but is there a reason Parser.parseIncrementally returns a tuple instead of a dedicated struct? Seems like a single type would be easier to manage, and would help ensure you don't pass a mismatching syntax tree + lookahead ranges to IncrementalParseTransition (and would make it trivial to add any additional state later down the road if needed).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought the same when using the API. We should probably change that

Filed swiftlang/swift-syntax#2267


/// The semantic tokens for the given snapshot or `nil` if no semantic tokens
/// have been computed yet.
func semanticTokens(for snapshotID: DocumentSnapshot.ID) -> [SyntaxHighlightingToken]? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer to wrap [SyntaxHighlightingToken] in its own struct, e.g then you could put members like lspEncoded and mergingTokens on it, and maybe move the edit processing logic onto it too. I don't feel that strongly about it though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. It’s a slightly bigger change and I’ll do it in a follow-up PR.

…es for documents

This allows us to use Swift concurrency to await the computation of the SwiftSyntax tree for a given document instead of having to poll for its creation.
No code changes, just moving code around because `mergedAndSortedTokens` no longer belonged in DocumentTokens.swift
Storing the semantic tokens inside `Document` was an anti-pattern because the semantic tokens only applied to Swift and were also being updated while the document contents themselves stayed constant.

Instead, we should store the semantic tokens in a separate `SemanticTokensManager` that only exists in the `SwiftLanguageServer` and has the sole responsibility of tracking semantic tokens.
@ahoppen ahoppen force-pushed the ahoppen/syntax-tree-manager branch from d164203 to 2d44263 Compare October 6, 2023 21:25
@ahoppen
Copy link
Member Author

ahoppen commented Oct 6, 2023

Is Document itself not really just a Swift thing anyway? I assume open/edit/close are all just forwarded directly to Clang, so is it even used there? Or should Document/DocumentManager/etc just be Swift-only and then we handle all the things in there rather than the separate syntax/semantic managers?

We still need to keep track of the document contents for clang so that we can re-open the documents in clangd if clangd crashes (for which we need to know the current document contents).

@ahoppen
Copy link
Member Author

ahoppen commented Oct 6, 2023

@swift-ci Please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants