Description
As the VS Code team is considering integrating tree-sitter in the UI thread (#161479 and more recently #207416) to potentially improve tokenization (#77140) and syntax highlighting (#50140), I'd like to suggest that indentation is another area that could be drastically improved by tree-sitter.
Currently, language extensions have the following options for defining indentation, each with its drawbacks:
- Define `indentationRules` and `onEnterRules` regexps. Because regexps are limited in how well they can match programming language constructs, this produces surprising indentation in many edge cases that often cannot be fixed in full generality. The Python extension, for instance, has struggled with these quite a bit: Improve auto-indentation behaviour vscode-python#481
- Override the Enter key to bypass the regexp-based indentation, e.g.: https://marketplace.visualstudio.com/items?itemName=KevinRose.vsc-python-indent

  The extension is then able to maintain knowledge of the code structure and implement an ad hoc engine.

  This works, but only when indenting after Enter is pressed; the indentation command is not called when moving lines up and down or pasting with autoindent. Furthermore, if the extension host is swamped, there might be a noticeable delay between pressing Enter and the newline and indent being inserted in the editor.
- Correct the indentation with a format-on-type provider for `\n`. This is the approach taken for Python, with Pylance now providing indentation corrections via format-on-type: Implement format on type for proper indentation pylance-release#1613.

  I call these "corrections" because the indentation rules of the renderer thread are applied first and are immediately visible. The format-on-type edits are applied second and are noticeable to the user if the extension host is slow to respond. That's not too bad, but it's also possible for the format request to be cancelled or ignored if the user types too fast and invalidates the document version sent to the provider.
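To make the limitation of the regexp-based option concrete, here is a minimal sketch. The pattern is a hypothetical one written in the spirit of an `increaseIndentPattern`; it is not the pattern shipped by the Python extension or any other extension, and `shouldIndentAfter` is a helper invented for this illustration.

```typescript
// Hypothetical pattern in the spirit of an `increaseIndentPattern`.
// It is NOT the pattern used by any real extension.
const increaseIndentPattern =
  /^\s*(?:def|class|if|elif|else|for|while|try|except|finally|with)\b.*:\s*$/;

// Decide, from the text of a single line alone, whether the next line
// should be indented one level deeper.
function shouldIndentAfter(line: string): boolean {
  return increaseIndentPattern.test(line);
}

const samples: string[] = [
  "def f():",                            // block opener
  "if x:  # trailing comment",           // block opener followed by a comment
  "if this text is inside a docstring:", // prose inside a multi-line string
];

for (const line of samples) {
  console.log(JSON.stringify(line), "->", shouldIndentAfter(line));
}
```

With this particular pattern, the second sample is missed (the comment hides the colon) and the third is a false positive (the line is text inside a string, not code). The pattern can be extended to tolerate trailing comments, but a line-by-line regexp has no way to know that a line sits inside a multi-line string.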
Indenting in both the renderer process and in edit corrections via format-on-type is a good compromise overall but it requires non-trivial analysis and behaviour implemented in extension code. Allowing extensions to provide smarter indentation rules aware of the tree structure of code (and with provisions for vertical alignment, see #66235) would make the initial indentation in the UI thread much more accurate in general.
That's where tree-sitter comes in. Both Emacs and nvim provide an indentation engine specified by declarative tree-sitter queries:
- Emacs tutorial: https://www.masteringemacs.org/article/lets-write-a-treesitter-major-mode
  Documentation: https://www.gnu.org/software/emacs/manual/html_node/elisp/Parser_002dbased-Indentation.html
- nvim documentation: https://neovim.io/doc/user/treesitter.html#treesitter-query
  https://github.com/nvim-treesitter/nvim-treesitter/blob/aa8d8bc600e00f84d11b9d40c6900d72d0f68fa3/doc/nvim-treesitter.txt#L210
  C example: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/c/indents.scm
Similarly, a declarative tree-sitter language for indentation in VS Code could solve the issue of complex tree-based indentation by running it entirely in the main process, without running extension code.
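As a rough illustration of what tree-aware indentation could look like, the sketch below computes an indent level per row from a hand-built syntax tree and a small declarative rule table. Everything here is an assumption made for the example: the `SyntaxNode` shape, the `rules` table, the `dedentLastRow` flag and the node type names are hypothetical, and the algorithm is a simplified model rather than the one used by Emacs, nvim-treesitter, or any VS Code API.

```typescript
// Minimal stand-in for a syntax node; a real tree-sitter node exposes much more.
interface SyntaxNode {
  type: string;
  startRow: number; // 0-based row of the node's first token
  endRow: number;   // 0-based row of the node's last token
  children: SyntaxNode[];
}

// Hypothetical declarative rules: node types whose contents are indented.
// `dedentLastRow` marks nodes whose last row holds a closing delimiter.
interface IndentRule {
  dedentLastRow: boolean;
}

const rules = new Map<string, IndentRule>([
  ["block", { dedentLastRow: true }],
  ["arguments", { dedentLastRow: true }],
]);

// Indent level for `row`: the number of indenting nodes that contain the row
// and start on an earlier row, not counting a node's own closing row.
function indentLevel(node: SyntaxNode, row: number): number {
  if (row < node.startRow || row > node.endRow) return 0;
  const rule = rules.get(node.type);
  let level = 0;
  if (rule !== undefined && node.startRow < row) {
    const onClosingRow = rule.dedentLastRow && row === node.endRow;
    if (!onClosingRow) level = 1;
  }
  for (const child of node.children) level += indentLevel(child, row);
  return level;
}

// Hand-built tree for this five-row source:
//   row 0: void f() {
//   row 1:   g(
//   row 2:     1
//   row 3:   );
//   row 4: }
const args: SyntaxNode = { type: "arguments", startRow: 1, endRow: 3, children: [] };
const call: SyntaxNode = { type: "call", startRow: 1, endRow: 3, children: [args] };
const block: SyntaxNode = { type: "block", startRow: 0, endRow: 4, children: [call] };
const root: SyntaxNode = { type: "function_definition", startRow: 0, endRow: 4, children: [block] };

const levels = [0, 1, 2, 3, 4].map((row) => indentLevel(root, row));
console.log(levels);
```

Because the level is derived from the tree rather than from the text of the previous line, the same function gives a consistent answer whether a line is reached by pressing Enter, moved up or down, or pasted. Vertical alignment (#66235) is not modeled in this sketch.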