
ts_utils (Experimental)

ts_utils provides a lightweight set of utilities that make working with tree_sitter more Pythonic. It is aimed at users who want to perform read-only analysis of programs with tree-sitter.

Warning: ts_utils is currently unstable and under active development, with parts of the API likely to change.

Getting Started

Installation

ts_utils can be installed directly from GitHub via:

pip install git+https://github.com/devjeetr/ts_utils

Parsing source into a tree

ts_utils.parsing provides utilities that automate management of language libraries to ease parsing of source code.

from ts_utils import parse, sexp

source = """
def main():
    print("Hello, World!")
"""
# Automatically downloads, builds and caches
# the language library for 'python'.
tree = parse(source, "python")
# You can also provide your own language library
# (language_library stands for one you have built yourself).
tree = parse(source, language_library)

Investigating `node_types` of a language grammar

You can investigate node_types of a language as follows:

from ts_utils import get_node_types, get_supernode_mappings

node_types = get_node_types('python') # loads 'node-types.json' if
                                      # provided by the language grammar.
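To make the shape of this data concrete, the sketch below assumes the returned structure follows tree-sitter's node-types.json format: a list of entries with "type" and "named", where supertype entries also carry a "subtypes" list. The JSON excerpt is hand-written for illustration rather than copied from the Python grammar, and supertype_map is a hypothetical helper, not part of ts_utils.

```python
import json

# Hand-written excerpt in the node-types.json format (illustrative only).
NODE_TYPES_JSON = """
[
  {"type": "_simple_statement", "named": true,
   "subtypes": [{"type": "return_statement", "named": true},
                {"type": "pass_statement", "named": true}]},
  {"type": "return_statement", "named": true, "fields": {}},
  {"type": "pass_statement", "named": true, "fields": {}},
  {"type": "(", "named": false}
]
"""


def supertype_map(node_types):
    """Map each supertype to the names of its subtypes (hypothetical helper)."""
    return {
        entry["type"]: [sub["type"] for sub in entry["subtypes"]]
        for entry in node_types
        if "subtypes" in entry
    }


node_types = json.loads(NODE_TYPES_JSON)
# Named entries correspond to named rules in the grammar; "(" is anonymous.
named_types = [entry["type"] for entry in node_types if entry["named"]]
supertypes = supertype_map(node_types)
```

Whether get_supernode_mappings returns a mapping of this shape is not stated here, so treat the dictionary layout as an assumption.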

Working with `tree_sitter` trees

ts_utils.iter provides itertools-style utilities for iterating over the nodes of a tree. Behind the scenes, ts_utils.iter uses efficient TreeCursor operations, which keeps performance as close as possible to walking the cursor by hand.

from ts_utils.iter import iternodes, iternodes_with_parent

tree = parse(...)

for node in iternodes(tree.walk()):
    ...

All functions in ts_utils.iter take an optional argument traversal_filter, which allows you to filter out nodes from the traversal. If traversal_filter(node) returns False, the entire subtree rooted at node is skipped.

only_named_nodes = lambda node: node.is_named
tree = parse(...)

for node in iternodes(tree.walk(), only_named_nodes):
    # skips every unnamed node, along with its subtree
    ...
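The pruning rule can be sketched on a toy tree. This is not how ts_utils.iter is implemented (it uses TreeCursor operations rather than recursion); ToyNode and iter_preorder are hypothetical stand-ins, and pre-order is assumed purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree_sitter node."""
    type: str
    is_named: bool = True
    children: List["ToyNode"] = field(default_factory=list)


def iter_preorder(
    node: ToyNode,
    traversal_filter: Optional[Callable[[ToyNode], bool]] = None,
) -> Iterator[ToyNode]:
    """Yield nodes in pre-order, pruning any subtree whose root fails the filter."""
    if traversal_filter is not None and not traversal_filter(node):
        return  # skip this node and everything beneath it
    yield node
    for child in node.children:
        yield from iter_preorder(child, traversal_filter)


tree = ToyNode("module", children=[
    ToyNode("expression_statement", children=[
        # An unnamed node with a named child: both are skipped by the filter.
        ToyNode("(", is_named=False, children=[ToyNode("identifier")]),
        ToyNode("string"),
    ]),
])

visited = [n.type for n in iter_preorder(tree, lambda n: n.is_named)]
unfiltered = [n.type for n in iter_preorder(tree)]
```

Note that the named identifier under the unnamed "(" node never appears in visited, because the filter prunes whole subtrees rather than individual nodes.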

Since ts_utils.iter provides pure functions that transform TreeCursors into iterators, their outputs can be arbitrarily composed with map, filter, compose, reduce and itertools.*. This composition allows you to create higher-level helpers, such as a function that finds all nodes of a given set of types:

from typing import Set

from tree_sitter import Tree
from ts_utils.iter import iternodes

def find(node_types: Set[str], tree: Tree):
    """Find all nodes whose type is listed in node_types."""
    return filter(lambda node: node.type in node_types, iternodes(tree.walk()))
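The same idea extends to other standard-library tools. The sketch below uses a hypothetical toy_iternodes generator in place of iternodes(tree.walk()), so that it runs without a parsed tree; only the .type and .is_named attributes are assumed.

```python
from collections import Counter, namedtuple
from itertools import islice

# Hypothetical stand-in for the nodes yielded by iternodes(tree.walk()).
ToyNode = namedtuple("ToyNode", ["type", "is_named"])


def toy_iternodes():
    """Yield a fixed node sequence, playing the role of iternodes(...)."""
    yield ToyNode("module", True)
    yield ToyNode("function_definition", True)
    yield ToyNode("def", False)
    yield ToyNode("identifier", True)
    yield ToyNode("call", True)
    yield ToyNode("identifier", True)


# Count how often each named node type occurs.
type_counts = Counter(n.type for n in toy_iternodes() if n.is_named)

# Lazily take the first two identifiers without materialising the traversal.
identifiers = filter(lambda n: n.type == "identifier", toy_iternodes())
first_identifiers = list(islice(identifiers, 2))
```

Because everything stays lazy until list or Counter consumes it, the traversal stops as soon as islice has what it needs.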

The traversal order of all functions in ts_utils.iter is deterministic for a given tree and traversal_filter, meaning that regardless of which function you use, nodes will be yielded in the same order. This allows you to simply wrap an iterator with enumerate to assign each node a unique id that will be consistent across multiple traversals/iterations.

only_named_nodes = lambda node: node.is_named
a = list(iternodes(tree.walk(), only_named_nodes))
b = [node for node, _parent in iternodes_with_parent(tree.walk(), only_named_nodes)]

assert a == b # OK
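As a small illustration of the enumerate pattern, the sketch below runs two separate passes over a hypothetical toy_iternodes generator (standing in for iternodes(tree.walk())) and joins their results by id. It relies only on the ordering guarantee described above.

```python
from collections import namedtuple

# Hypothetical stand-in for tree_sitter nodes.
ToyNode = namedtuple("ToyNode", ["type", "start_byte"])


def toy_iternodes():
    """Yield the same nodes in the same order on every call."""
    yield ToyNode("module", 0)
    yield ToyNode("function_definition", 0)
    yield ToyNode("identifier", 4)


# Pass 1: record the type of each node under its traversal id.
type_by_id = {node_id: node.type for node_id, node in enumerate(toy_iternodes())}

# Pass 2: record the start offset, again keyed by traversal id.
start_by_id = {node_id: node.start_byte for node_id, node in enumerate(toy_iternodes())}

# The ids line up across passes, so the two tables can be joined safely.
joined = [(i, type_by_id[i], start_by_id[i]) for i in sorted(type_by_id)]
```

Keying by integer id also sidesteps the fact that tree_sitter nodes themselves cannot be used as dictionary keys.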

Hashing nodes

ts_utils.hash_node can hash tree_sitter nodes, which are not hashable by default.

Note: ts_utils.hash_node only works with nodes that do not contain errors (node.has_error == False).
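How hash_node computes its value is not described here. As a rough illustration of why a hash is useful, the sketch below derives a hashable key from a node's type and byte span so that nodes can be deduplicated in a set. ToyNode and node_key are hypothetical, and this keying scheme is an assumption, not ts_utils' method.

```python
class ToyNode:
    """Hypothetical unhashable stand-in exposing tree_sitter-style attributes."""

    def __init__(self, type, start_byte, end_byte):
        self.type = type
        self.start_byte = start_byte
        self.end_byte = end_byte

    # Defining __eq__ without __hash__ makes instances unhashable.
    def __eq__(self, other):
        return node_key(self) == node_key(other)


def node_key(node):
    """Build a hashable key from type and byte span (assumed scheme)."""
    return (node.type, node.start_byte, node.end_byte)


a = ToyNode("identifier", 4, 8)
b = ToyNode("identifier", 4, 8)    # the same span, reached on another traversal
c = ToyNode("identifier", 15, 20)  # a different occurrence

# Sets and dicts need hashable members, so store keys instead of nodes.
seen = {node_key(n) for n in (a, b, c)}
```

A key like this only distinguishes nodes within a single tree; comparing nodes across different sources would need the text or structure folded in as well.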

Converting trees to sparse adjacency matrices

ts_utils.matrix provides utilities to convert trees to sparse adjacency matrices.

import ts_utils.matrix

parent_mask = ts_utils.matrix.parent_mask(...)
child_mask = parent_mask.transpose()

next_sibling_mask = ts_utils.matrix.next_sibling_mask(...)
prev_sibling_mask = ts_utils.matrix.prev_sibling_mask(...)

# Summing the masks combines the four edge sets into one adjacency matrix.
all_edges = parent_mask + child_mask + next_sibling_mask + prev_sibling_mask
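To show what these masks encode, here is a minimal sketch that stores each sparse matrix as a set of (row, column) pairs. The return type of ts_utils.matrix is not specified here, so this representation, the orientation convention (row = child, column = parent) and the toy parents list are all assumptions.

```python
# Toy tree as (node_id, parent_id) pairs in traversal order; the root has no parent.
parents = [(0, None), (1, 0), (2, 0), (3, 1)]

# Entry (i, j) is set when j is the parent of i (an assumed convention).
parent_mask = {(child, parent) for child, parent in parents if parent is not None}

# Transposing swaps rows and columns, giving parent -> child edges.
child_mask = {(col, row) for row, col in parent_mask}

# Group children by parent, preserving traversal order.
children_of = {}
for child, parent in parents:
    if parent is not None:
        children_of.setdefault(parent, []).append(child)

# Next-sibling edges link consecutive children of the same parent.
next_sibling_mask = {
    (left, right)
    for siblings in children_of.values()
    for left, right in zip(siblings, siblings[1:])
}
prev_sibling_mask = {(col, row) for row, col in next_sibling_mask}

# The union of all four masks is the full edge set.
all_edges = parent_mask | child_mask | next_sibling_mask | prev_sibling_mask
```

With coordinate pairs, combining masks is a set union; with numeric sparse matrices the equivalent operation is element-wise addition.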
