Store syntax trees and semantic tokens in syntax tree/semantic token managers instead of in the `Document` #857

ahoppen · 2023-10-06T15:33:36Z

Storing the syntax tree and semantic tokens in Document was an anti-pattern because those are Swift-related information, but Document should be language agnostic.

It also fixes the issue that we were polling for the syntax tree to be created. Instead, we can now await the construction of the syntax tree.
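To illustrate the difference between polling and awaiting, here is a minimal sketch, not the code from this PR. `TreeStore`, `SnapshotID`, `ParsedTree`, and the line-counting "parse" are hypothetical stand-ins. The only idea taken from the PR is that a `Task` is stored per snapshot so callers can await its value.

```swift
// Hypothetical stand-ins; these are not SourceKit-LSP types.
struct SnapshotID: Hashable {
  let uri: String
  let version: Int
}

struct ParsedTree {
  let lineCount: Int
}

actor TreeStore {
  /// One task per snapshot, so concurrent callers share a single computation.
  private var computations: [SnapshotID: Task<ParsedTree, Never>] = [:]
  /// Number of times a parse was started; only here to make the sharing observable.
  private(set) var parseCount = 0

  func tree(for id: SnapshotID, text: String) async -> ParsedTree {
    if let existing = computations[id] {
      // Suspend until the already-running computation finishes; no polling.
      return await existing.value
    }
    parseCount += 1
    let task = Task<ParsedTree, Never> {
      // Stand-in for real parsing: count the lines of the text.
      let lines = text.split(separator: "\n", omittingEmptySubsequences: false)
      return ParsedTree(lineCount: lines.count)
    }
    // No suspension point between the lookup and this store, so the actor
    // never starts two computations for the same snapshot.
    computations[id] = task
    return await task.value
  }
}
```

A second call with the same `SnapshotID` awaits the stored task instead of starting a new parse.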

ahoppen · 2023-10-06T15:33:42Z

@swift-ci Please test

bnbarham

So many managers 😅

Storing the syntax tree and semantic tokens in Document was an anti-pattern because those are Swift-related information, but Document should be language agnostic.

Is Document itself not really just a Swift thing anyway? I assume open/edit/close are all just forwarded directly to Clang, so is it even used there? Or should Document/DocumentManager/etc just be Swift-only and then we handle all the things in there rather than the separate syntax/semantic managers?

Sources/SourceKitLSP/DocumentManager.swift

Sources/SourceKitLSP/Swift/SemanticTokens.swift

bnbarham · 2023-10-06T17:46:03Z

Sources/SourceKitLSP/Swift/SemanticTokensManager.swift

+/// Keeps track of the semantic tokens that sourcekitd has sent us for given
+/// document snapshots.
+actor SemanticTokensManager {
+  private var semanticTokens: [DocumentSnapshot.ID: [SyntaxHighlightingToken]] = [:]


I suspect we won't have all that many documents, but given that we only ever keep the latest, this could just be, e.g.:

    struct VersionedTokens {
      let version: Int
      let tokens: [SyntaxHighlightingToken]
    }
    ...
    private var semanticTokens: [DocumentURI: VersionedTokens]

Then the discard is just semanticTokens.removeValue(forKey: uri), and setSemanticTokens would just check that the version is not less than the one stored (if any) before setting.

That is assuming we only want the latest. Right now we could technically store version 5 and then 4, and we'd have both (but 4 then 5 would leave only 5).
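A minimal sketch of that suggestion, under stated assumptions: `DocumentURI` and `SyntaxHighlightingToken` are replaced by simple stand-ins, and `LatestTokensStore` and `tokens(for:version:)` are hypothetical names. The PR itself uses an actor keyed by `DocumentSnapshot.ID`; a plain struct is used here only to keep the example synchronous.

```swift
// Hypothetical stand-ins for the real SourceKit-LSP types.
typealias DocumentURI = String

struct SyntaxHighlightingToken: Equatable {
  let name: String
}

struct VersionedTokens {
  let version: Int
  let tokens: [SyntaxHighlightingToken]
}

struct LatestTokensStore {
  private var semanticTokens: [DocumentURI: VersionedTokens] = [:]

  /// Stores `tokens` unless tokens for a newer version are already stored.
  mutating func setSemanticTokens(
    _ tokens: [SyntaxHighlightingToken],
    version: Int,
    for uri: DocumentURI
  ) {
    if let existing = semanticTokens[uri], existing.version > version {
      return  // Out-of-date update, e.g. version 4 arriving after version 5.
    }
    semanticTokens[uri] = VersionedTokens(version: version, tokens: tokens)
  }

  /// Returns the stored tokens only if they belong to the requested version.
  func tokens(for uri: DocumentURI, version: Int) -> [SyntaxHighlightingToken]? {
    guard let stored = semanticTokens[uri], stored.version == version else {
      return nil
    }
    return stored.tokens
  }

  /// Discarding is a single dictionary removal.
  mutating func discardSemanticTokens(for uri: DocumentURI) {
    semanticTokens.removeValue(forKey: uri)
  }
}
```

With this shape, storing version 5 and then version 4 leaves only version 5, regardless of arrival order.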

I would prefer to keep it as-is. If we ever decide that we need to store semantic tokens for older versions as well, it’s easier to update. Conceptually, we are storing semantic tokens for a document snapshot; the cache eviction logic is separate from that.

conceptually we are storing semantic tokens for a document snapshot

Heh, I see this as the other way. We're storing tokens for the latest version of a document. The request for tokens is really "give me the latest tokens of the document which should match version ". If it doesn't, then we're out of date and the response isn't likely to matter anyway.

Or to put it another way:

If we ever decide that we need to store semantic tokens for older versions as well

Doesn't seem like a use case we care about.

We still need to keep track of the document contents for clang so that we can re-open the documents in clangd if clangd crashes (for which we need to know the current document contents).

🙇

Heh, I see this as the other way. We're storing tokens for the latest version of a document. The request for tokens is really "give me the latest tokens of the document which should match version ". If it doesn't, then we're out of date and the response isn't likely to matter anyway.

I’ll add it to my to-do list; let’s discuss this next week. Conceptually we seem to be on the same page on this PR, this is just an implementation detail, and I would like to keep making progress on merging my commit backlog.

Sources/SourceKitLSP/Swift/SemanticTokensManager.swift

Sources/SourceKitLSP/Swift/SwiftLanguageServer.swift

bnbarham · 2023-10-06T18:29:55Z

Sources/SourceKitLSP/Swift/SyntaxTreeManager.swift

+/// Keeps track of SwiftSyntax trees for document snapshots and computes the
+/// SwiftSyntax trees on demand.


Is the thought that the highlight ranges in the semantic case aren't as expensive as the entire syntax tree? I.e., why does one have a cache limit but the other doesn't? They're also both quite similar, i.e. they take an edit and cache the result (which here is the syntax tree and in the semantic case is the highlight ranges).

The difference is that we can re-compute the syntax tree on demand but we can’t (in the current design) re-compute the semantic tokens. That’s why we can be more aggressive about discarding syntax trees than semantic tokens. If we wanted to change that, getting semantic tokens would need to become a sourcekitd request as well, instead of being sent over implicitly on every update (which we should probably do).
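The reasoning above can be sketched as a bounded cache whose entries are safe to evict because they can be recomputed on the next request. This is an illustration only: `RecomputableCache`, its first-in-first-out eviction, and the `compute` closure are assumptions, not the policy or API that `SyntaxTreeManager` actually uses.

```swift
/// Hypothetical bounded cache. Evicting is safe because a missing value is
/// simply recomputed; values that cannot be recomputed (like the semantic
/// tokens in the current design) could not be dropped this freely.
struct RecomputableCache<Key: Hashable, Value> {
  private var values: [Key: Value] = [:]
  private var insertionOrder: [Key] = []
  private(set) var computeCount = 0
  let limit: Int
  let compute: (Key) -> Value

  init(limit: Int, compute: @escaping (Key) -> Value) {
    precondition(limit > 0, "the cache must hold at least one value")
    self.limit = limit
    self.compute = compute
  }

  var count: Int { values.count }

  mutating func value(for key: Key) -> Value {
    if let cached = values[key] {
      return cached
    }
    // Cache miss: recompute on demand.
    computeCount += 1
    let value = compute(key)
    values[key] = value
    insertionOrder.append(key)
    if insertionOrder.count > limit {
      // Assumed policy: drop the oldest entry first.
      let evicted = insertionOrder.removeFirst()
      values[evicted] = nil
    }
    return value
  }
}
```

An evicted key costs one extra computation the next time it is requested, which is the trade-off described for syntax trees.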

bnbarham · 2023-10-06T18:36:56Z

Tests/SourceKitLSPTests/LocalSwiftTests.swift

@@ -1510,6 +1504,9 @@ final class LocalSwiftTests: XCTestCase {
      log("Received diagnostics for open - semantic")
    })

+    // Send a request that triggers a syntax tree to be built.
+    _ = try sk.sendSync(FoldingRangeRequest(textDocument: .init(uri)))


What's this needed for?

Previously, we were always eagerly computing the syntax tree whenever we opened a file. Now, we only compute it once the first request requires it (and then implicitly keep it up to date on every edit, the thought being that you’ll probably need it again and doing incremental parses is faster than re-parsing from scratch when it’s needed next).

To test that we are parsing incrementally on the next edit, we need to trigger that initial build of a syntax tree, which is what I’m doing here.
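A toy model of that behaviour, to show why the test needs a request before the edit. `LazySyntaxTreeState` and its counters are hypothetical and only track which kind of parse would happen; no parsing is performed.

```swift
/// Hypothetical model of the lazy behaviour described above; not SourceKit-LSP code.
struct LazySyntaxTreeState {
  private(set) var hasTree = false
  private(set) var fullParses = 0
  private(set) var incrementalParses = 0

  /// Called by any request that needs the syntax tree, e.g. a folding range request.
  mutating func treeRequested() {
    guard !hasTree else { return }
    fullParses += 1
    hasTree = true
  }

  /// Called on every document edit.
  mutating func documentEdited() {
    // Without an existing tree there is nothing to update incrementally.
    guard hasTree else { return }
    incrementalParses += 1
  }
}
```

In this model, an edit that arrives before any request triggers no incremental parse, so a test for incremental parsing has to request the tree first.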

hamishknight · 2023-10-06T18:40:58Z

Sources/SourceKitLSP/Swift/SwiftLanguageServer.swift

-    // FIXME: (async) We might not have computed the syntax tree yet. Wait until we have a syntax tree.
-    // Really, getting the syntax tree should be an async operation.
-    while snapshot.tokens.syntaxTree == nil {
-      try? await Task.sleep(nanoseconds: 1_000_000)
-      if let newSnapshot = documentManager.latestSnapshot(uri) {
-        snapshot = newSnapshot
-      } else {


hamishknight · 2023-10-06T19:50:07Z

Sources/SourceKitLSP/Swift/SwiftLanguageServer.swift

      if let skTokens: SKDResponseArray = response[keys.annotations] {
        let tokenParser = SyntaxHighlightingTokenParser(sourcekitd: sourcekitd)
        var tokens: [SyntaxHighlightingToken] = []
        tokenParser.parseTokens(skTokens, in: snapshot, into: &tokens)

-        docTokens.semantic = tokens
+        return tokens
      }

-      return docTokens
+      return nil


Should this be turned into a guard?
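For illustration, the `guard` shape looks roughly like this. The dictionary-based `response` and the `parseTokens(from:)` function are hypothetical stand-ins for the sourcekitd response and token parser in the diff above.

```swift
/// Hypothetical stand-in: the response is modelled as a plain dictionary and
/// "parsing" a token just uppercases it.
func parseTokens(from response: [String: [String]]) -> [String]? {
  // Exit early when there are no annotations, so the main path is not nested.
  guard let rawTokens = response["annotations"] else {
    return nil
  }
  var tokens: [String] = []
  for raw in rawTokens {
    tokens.append(raw.uppercased())
  }
  return tokens
}
```

The early exit keeps the `nil` case next to the condition that causes it.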

hamishknight · 2023-10-06T20:40:06Z

Sources/SourceKitLSP/Swift/SyntaxTreeManager.swift

+  /// A task that parses a SwiftSyntax tree from a source file, producing both
+  /// the syntax tree and the lookahead ranges that are needed for a subsequent
+  /// incremental parse.
+  private typealias SyntaxTreeComputation = Task<(tree: SourceFileSyntax, lookaheadRanges: LookaheadRanges), Never>


Not really a comment on this PR, but is there a reason Parser.parseIncrementally returns a tuple instead of a dedicated struct? Seems like a single type would be easier to manage, and would help ensure you don't pass a mismatching syntax tree + lookahead ranges to IncrementalParseTransition (and would make it trivial to add any additional state later down the road if needed).

Yeah, I thought the same when using the API. We should probably change that

Filed swiftlang/swift-syntax#2267
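A sketch of the idea with hypothetical types: `ParseResult`, `Tree`, `Lookahead`, `parse`, and `reparse` are stand-ins and not the swift-syntax API. The point is that the tree and its lookahead ranges travel together in one value.

```swift
// Hypothetical stand-ins for SourceFileSyntax and LookaheadRanges.
struct Tree {
  let source: String
}

struct Lookahead {
  let ranges: [Range<Int>]
}

/// Bundles the tree with the lookahead ranges that belong to it, so the two
/// cannot be mixed up with values from a different parse.
struct ParseResult {
  let tree: Tree
  let lookaheadRanges: Lookahead
}

func parse(_ source: String) -> ParseResult {
  ParseResult(
    tree: Tree(source: source),
    lookaheadRanges: Lookahead(ranges: [0..<source.utf8.count])
  )
}

func reparse(edited source: String, previous: ParseResult) -> ParseResult {
  // A real incremental parser would reuse `previous.tree`, guided by
  // `previous.lookaheadRanges`. This sketch ignores both and parses again.
  _ = previous
  return parse(source)
}
```

Adding further state later would only mean adding a field to `ParseResult`, without changing any call site that passes the value along.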

hamishknight · 2023-10-06T20:54:05Z

Sources/SourceKitLSP/Swift/SemanticTokensManager.swift

+
+  /// The semantic tokens for the given snapshot or `nil` if no semantic tokens
+  /// have been computed yet.
+  func semanticTokens(for snapshotID: DocumentSnapshot.ID) -> [SyntaxHighlightingToken]? {


Personally, I would prefer to wrap [SyntaxHighlightingToken] in its own struct; e.g., then you could put members like lspEncoded and mergingTokens on it, and maybe move the edit processing logic onto it too. I don't feel that strongly about it though.

Good idea. It’s a slightly bigger change and I’ll do it in a follow-up PR.
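A rough sketch of what such a wrapper could look like. `Token`, `HighlightingTokens`, and the semantics of `merging` (concatenate, then sort by position) are assumptions for illustration. The `lspEncoded` member follows the relative encoding of the LSP semantic tokens format, which uses five integers per token: delta line, delta start, length, token type, and token modifiers.

```swift
// Hypothetical token type; the real SyntaxHighlightingToken differs.
struct Token: Equatable {
  let line: Int
  let column: Int
  let length: Int
  let kind: UInt32
  let modifiers: UInt32
}

struct HighlightingTokens {
  var tokens: [Token]

  /// Tokens ordered by position in the document.
  var sorted: [Token] {
    tokens.sorted { ($0.line, $0.column) < ($1.line, $1.column) }
  }

  /// Assumed semantics: combine both token lists and order them by position.
  func merging(_ other: HighlightingTokens) -> HighlightingTokens {
    HighlightingTokens(tokens: HighlightingTokens(tokens: tokens + other.tokens).sorted)
  }

  /// Relative LSP encoding: each token is described relative to the previous one.
  var lspEncoded: [UInt32] {
    var result: [UInt32] = []
    var previousLine = 0
    var previousColumn = 0
    for token in sorted {
      let deltaLine = token.line - previousLine
      // The start is relative to the previous token only on the same line.
      let deltaStart = deltaLine == 0 ? token.column - previousColumn : token.column
      result += [
        UInt32(deltaLine), UInt32(deltaStart), UInt32(token.length),
        token.kind, token.modifiers,
      ]
      previousLine = token.line
      previousColumn = token.column
    }
    return result
  }
}
```

Call sites would then pass one value around instead of a bare array plus free functions.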

Storing the semantic tokens inside `Document` was an anti-pattern because the semantic tokens only applied to Swift and were also being updated while the document contents themselves stayed constant. Instead, we should store the semantic tokens in a separate `SemanticTokensManager` that only exists in the `SwiftLanguageServer` and has the sole responsibility of tracking semantic tokens.

ahoppen · 2023-10-06T21:27:57Z

Is Document itself not really just a Swift thing anyway? I assume open/edit/close are all just forwarded directly to Clang, so is it even used there? Or should Document/DocumentManager/etc just be Swift-only and then we handle all the things in there rather than the separate syntax/semantic managers?

We still need to keep track of the document contents for clang so that we can re-open the documents in clangd if clangd crashes (for which we need to know the current document contents).

ahoppen · 2023-10-06T21:28:28Z

@swift-ci Please test

ahoppen added 3 commits October 6, 2023 07:17

Make members of DocumentSnapshot immutable

818be5e

Make DocumentSnapshot not have a reference to its originating `Docu…

556a58a

…ment` The reference just isn’t needed and this makes everything clearer.

Introduce an ID for DocumentSnapshot

7ab3b04

ahoppen requested review from bnbarham and hamishknight October 6, 2023 15:33

ahoppen requested a review from benlangmuir as a code owner October 6, 2023 15:33

bnbarham reviewed Oct 6, 2023

hamishknight approved these changes Oct 6, 2023

ahoppen mentioned this pull request Oct 6, 2023

Change parseIncrementally to return a struct instead of a tuple swiftlang/swift-syntax#2267

Closed

ahoppen added 3 commits October 6, 2023 14:25

Introduce SyntaxTreeManager that keeps track of the SwiftSyntax tre…

88cd743

…es for documents This allows us to use Swift concurrency to await the computation of the SwiftSyntax tree for a given document instead of having to poll for its creation.

Move semantic highlighting into its own file

8837399

No code changes, just moving code around because `mergedAndSortedTokens` no longer belonged in DocumentTokens.swift

ahoppen force-pushed the ahoppen/syntax-tree-manager branch from d164203 to 2d44263 Compare October 6, 2023 21:25

Translate an if into a guard

c1771de

ahoppen merged commit c98c16e into swiftlang:main Oct 7, 2023

ahoppen deleted the ahoppen/syntax-tree-manager branch October 7, 2023 01:06

This was referenced Oct 9, 2023

Introduce SyntaxHighlightingTokens instead of [SyntaxHighlightingToken] #874

Closed

SemanticTokensManager should not have DocumentSnapshot.ID as key #887

Closed

gremlinflat mentioned this pull request Mar 25, 2024

Introduce SyntaxHighlightingTokens instead of [SyntaxHighlightingToken] #1146

Merged
