Contributing

Thanks for your interest in contributing to this project! Below you'll find a general overview of the project and instructions for running it locally.

Tree-sitter

To get more familiar with tree-sitter itself and writing tree-sitter grammars, you may want to read https://tree-sitter.github.io/tree-sitter/creating-parsers.

The grammar

Most tree-sitter grammars are written using a single grammar.js file with a declarative-like syntax.

But reStructuredText isn't a programming language with a well-defined specification. It has a lot of edge cases, and the same text can have a different meaning depending on the context in which it appears or its indentation level.

Tree-sitter is flexible enough to let us write some rules in C (an external scanner), and for the reasons above our grammar makes heavy use of this feature.

External scanner

Tree-sitter is an LR-based (GLR) parser, so we can't backtrack. As a result, our external scanner must share some logic across the nodes it recognizes. For example, if we find a * character, we first check whether it's a list element, then an emphasis node, then a strong node, etc.

Most of the time, when something isn't a recognizable node, it is interpreted as plain text.
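
The ordered checks described above can be sketched as follows. This is a simplified illustration, not the project's actual scanner code: the names NodeKind and classify_star are hypothetical, the function reads from a plain C string instead of the tree-sitter lexer, and it only looks at how a node starts (it doesn't verify that the emphasis or strong node is ever closed).

```c
/* Hypothetical node kinds, for illustration only. */
typedef enum { NODE_TEXT, NODE_BULLET, NODE_EMPHASIS, NODE_STRONG } NodeKind;

int is_space_or_end(char c) {
  return c == '\0' || c == ' ' || c == '\t' || c == '\n';
}

/* Classify input that starts with '*' using lookahead only, trying each
 * candidate in a fixed order and falling back to plain text. */
NodeKind classify_star(const char *s) {
  if (s[0] != '*') return NODE_TEXT;

  /* 1. List element: "*" followed by whitespace or end of input. */
  if (is_space_or_end(s[1])) return NODE_BULLET;

  /* 2. Emphasis start: "*" followed by something other than "*". */
  if (s[1] != '*') return NODE_EMPHASIS;

  /* 3. Strong start: "**" followed by non-whitespace. */
  if (!is_space_or_end(s[2])) return NODE_STRONG;

  /* Nothing matched: treat it as plain text. */
  return NODE_TEXT;
}
```

Because every candidate is decided from the same lookahead, no check needs to undo the work of a previous one.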

The external scanner also allows us to keep some state between scans; this is currently used to keep track of the indentation levels.
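
As a rough idea of what keeping indentation state can look like, here is a minimal sketch of an indentation stack that can be copied into and out of a byte buffer, which is the general shape tree-sitter expects from an external scanner's serialize and deserialize functions. All names here (IndentStack, indent_push, MAX_INDENT_DEPTH, and so on) are assumptions made for illustration; the real scanner may store its state differently.

```c
#include <string.h>

#define MAX_INDENT_DEPTH 32 /* hypothetical limit, chosen for the sketch */

typedef struct {
  int levels[MAX_INDENT_DEPTH]; /* column of each open indentation level */
  unsigned length;              /* number of levels currently open */
} IndentStack;

void indent_push(IndentStack *s, int column) {
  if (s->length < MAX_INDENT_DEPTH) s->levels[s->length++] = column;
}

void indent_pop(IndentStack *s) {
  if (s->length > 0) s->length--;
}

/* Column of the innermost level, or 0 when the stack is empty. */
int indent_current(const IndentStack *s) {
  return s->length > 0 ? s->levels[s->length - 1] : 0;
}

/* Copy the stack into buffer; returns the number of bytes written,
 * or 0 if the buffer is too small. */
unsigned indent_serialize(const IndentStack *s, char *buffer, unsigned capacity) {
  unsigned size = s->length * (unsigned)sizeof(int);
  if (size > capacity) return 0;
  memcpy(buffer, s->levels, size);
  return size;
}

/* Restore the stack from a buffer previously filled by indent_serialize. */
void indent_deserialize(IndentStack *s, const char *buffer, unsigned length) {
  unsigned count = length / (unsigned)sizeof(int);
  if (count > MAX_INDENT_DEPTH) count = MAX_INDENT_DEPTH;
  s->length = count;
  if (count > 0) memcpy(s->levels, buffer, count * sizeof(int));
}
```

Serializing the state this way lets the parser save and restore the scanner at any point, for example when re-parsing after an edit.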

Project structure

Most of the files in the repository are auto-generated by tree-sitter. They are needed so the grammar can be compiled easily on the user's computer, which is why they are committed to the repository.

Some of the files that aren't auto-generated are:

  • grammar.js: defines all the nodes in our grammar and their structure.
  • src/scanner.c: the entry point to our custom scanner. To make it easier to maintain, the code that isn't auto-generated lives inside the src/tree_sitter_rst/ directory.
  • src/tree_sitter_rst/scanner.c: contains the functions used to create, serialize, and deserialize our custom scanner, as well as its main entry point: rst_scanner_scan (AKA the big collection of ifs).
  • src/tree_sitter_rst/tokens.h: defines all the tokens that our external scanner recognizes; they are the same ones declared in the externals attribute of our grammar.js file.
  • src/tree_sitter_rst/chars.c: utility functions to recognize characters, such as numbers, bullets, letters, etc.
  • src/tree_sitter_rst/parser.c: contains all the functions that match the text currently being parsed to a valid token.
  • test/corpus/: tests for our grammar, so we can be sure nothing breaks when changing things. You can read about the syntax at https://tree-sitter.github.io/tree-sitter/creating-parsers#command-test.
  • test/examples/: these are the files that docutils uses to run its tests. We parse them without checking the resulting CST; we only care whether our parser errors in the process.
  • docs/: this directory is deployed to GitHub Pages at https://stsewd.dev/tree-sitter-rst/.
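
To give a feel for the kind of helper that lives in chars.c, here is a small sketch of two character predicates. The function names are hypothetical and may not match the ones in the repository; the set of bullet characters is the one listed in the reStructuredText specification, and the characters are taken as code points, the way tree-sitter exposes the lookahead character.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the bullet characters reStructuredText accepts:
 * "*", "+", "-", U+2022 (bullet), U+2023 (triangular bullet)
 * and U+2043 (hyphen bullet). */
bool is_char_bullet(int32_t c) {
  return c == '*' || c == '+' || c == '-' ||
         c == 0x2022 || c == 0x2023 || c == 0x2043;
}

/* True for ASCII digits, as used by enumerated lists like "1." */
bool is_char_digit(int32_t c) {
  return c >= '0' && c <= '9';
}
```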

Developing

Requirements:

  • Node
  • A C compiler (clang is preferred)
  • Docker (only if you want to see your changes in the browser)

Install the dependencies with:

npm install

To build the grammar:

npm run build

To run the tests:

npm run test

Note: if you changed the grammar, you need to rebuild it before the tests will use the new grammar.

Test the grammar by parsing a file:

npm run parse -- test.rst

Test the grammar in your browser:

npm run web

Note: if you changed the grammar, you need to rebuild it and run npm run wasm (requires Docker).

Sometimes you may find it useful to compare against the output of docutils for a given RST document, since the reStructuredText specification doesn't explain all the edge cases.

pip install docutils
rst2html5.py test.rst out.html
xdg-open out.html