Description
Did you check existing issues?
- I have read all the tree-sitter docs if it relates to using the parser
- I have searched the existing issues of tree-sitter-python
Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)
tree-sitter 0.22.3
Describe the bug
When using old_tree.changed_ranges(new_tree), the Python parser does not detect the removal or insertion of an escape_sequence node when switching between a plain string and an r-prefixed string.
Toggling the r prefix of the string changes the string_start node. However, string_content, the parent of escape_sequence, has no change in its text; only its structure changes, depending on whether escape_sequence is detected or ignored.
Note that the equivalent changes to an f-prefixed string appear to be detected as expected.
P.S. Sorry that the example is written in Python; I don't know enough C or CLI scripting to reproduce the bug there. Comment/uncomment the corresponding blocks in the repro to switch between the r-string and f-string tests.
Steps To Reproduce/Bad Parse Tree
- Create a text file with a string containing an escape sequence: "for whom the \x07 {'tolls'}".
- Parse it to get tree A: (module (expression_statement (string (string_start) (string_content (escape_sequence)) (string_end)))).
- Edit the string by adding the prefix r: r"for whom the \x07 {'tolls'}".
- Parse it to get tree B: (module (expression_statement (string (string_start) (string_content) (string_end)))).
- Call A.changed_ranges(B) and receive this output: [<Range ... start_byte=0, end_byte=1>].
- Edit the string by removing the prefix r: "for whom the \x07 {'tolls'}".
- Parse it to get tree C: (module (expression_statement (string (string_start) (string_content (escape_sequence)) (string_end)))).
- Call B.changed_ranges(C) and receive this output: [].
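To make the structural difference between trees A and B explicit, here is a minimal sketch that parses the S-expressions quoted in the steps above and reports where they first diverge. It works on the printed strings only and does not use tree-sitter; the helper names parse_sexp and first_difference are hypothetical, introduced just for this illustration.

```python
from typing import Optional


def parse_sexp(text: str) -> list:
    """Parse an S-expression into nested lists: [name, child, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    return stack[0][0]


def first_difference(a: list, b: list, path: tuple = ()) -> Optional[tuple]:
    """Return the path of node names to the first structural difference, or None."""
    if a[0] != b[0]:
        return path + (a[0],)
    here = path + (a[0],)
    kids_a, kids_b = a[1:], b[1:]
    for child_a, child_b in zip(kids_a, kids_b):
        found = first_difference(child_a, child_b, here)
        if found is not None:
            return found
    # Same names so far; a differing child count is also a structural change.
    if len(kids_a) != len(kids_b):
        return here
    return None


TREE_A = ("(module (expression_statement (string (string_start) "
          "(string_content (escape_sequence)) (string_end))))")
TREE_B = ("(module (expression_statement (string (string_start) "
          "(string_content) (string_end))))")

print(first_difference(parse_sexp(TREE_A), parse_sexp(TREE_B)))
```

For the two trees above, the reported path ends at string_content, which is the node whose children differ even though its text is unchanged.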
Expected Behavior/Parse Tree
A.changed_ranges(B) should have resulted in this output: [<Range ... start_byte=0, end_byte=1>, <Range ... start_byte=15, end_byte=19>].
B.changed_ranges(C) should have resulted in this output (indexes are approximate and should span the same range as the escape sequence): [<Range ... start_byte=14, end_byte=18>].
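As a sanity check on the byte offsets quoted above, the span of the escape sequence can be computed directly from the source bytes. This is a small sketch that does not use tree-sitter; the regular expression only covers \xNN escapes (a simplification for this report), and escape_spans is a hypothetical helper name.

```python
import re

# Matches a \xNN escape only; other Python escape forms are not handled.
HEX_ESCAPE = re.compile(rb"\\x[0-9a-fA-F]{2}")


def escape_spans(src: bytes) -> list:
    """Return (start_byte, end_byte) for each \\xNN escape found in src."""
    return [m.span() for m in HEX_ESCAPE.finditer(src)]


plain = rb'''"for whom the \x07 {'tolls'}"'''
raw = b"r" + plain

print(escape_spans(plain))  # [(14, 18)]
print(escape_spans(raw))    # [(15, 19)]
```

The spans match the expected ranges listed above: 14 to 18 without the prefix and 15 to 19 with it.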
Repro
from tree_sitter import Language, Parser
import tree_sitter_python
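# Build a read callback that feeds the parser one byte at a time
# and echoes every byte the parser asks for.
def make_byte_feeder(src):
    def feeder(pos, point):
        b = src[pos:pos+1]
        print(b.decode('utf-8'), end='')
        return b
    return feeder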
# Empty `text` implies removal of selection.
# Non-empty `text` with `selection_start == selection_end` implies insertion.
# Non-empty `text` with `selection_start != selection_end` implies replacement.
def edit_tree(tree, src, selection_start, selection_end, text):
    new_src = src[:selection_start] + text + src[selection_end:]
    print('<'*10)
    # Only byte offsets are supplied; all points are left at (0, 0).
    tree.edit(
        start_byte=selection_start,
        old_end_byte=selection_end,
        new_end_byte=selection_start + len(text),
        start_point=(0, 0),
        old_end_point=(0, 0),
        new_end_point=(0, 0),
    )
    # Reparse incrementally, reusing the edited old tree.
    new_tree = parser.parse(make_byte_feeder(new_src), tree)
    print()
    print('>'*10)
    print('org:', src)
    print('alt:', new_src, end='\n\n')
    print('org root node:', tree.root_node)
    print('alt root node:', new_tree.root_node, end='\n\n')
    print('changes:', tree.changed_ranges(new_tree))
    return new_tree, new_src
src = r'''"for whom the \x07 {'tolls'}"'''.encode('utf-8')
parser = Parser(Language(tree_sitter_python.language()))
print('<'*10)
tree = parser.parse(make_byte_feeder(src))
print()
print('>'*10)
# TEST R-STRING.
old_tree = tree
tree, src = edit_tree(tree, src, 0, 0, 'r'.encode('utf-8'))
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)
old_tree = tree
tree, src = edit_tree(tree, src, 17, 19, '10'.encode('utf-8'))
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)
old_tree = tree
tree, src = edit_tree(tree, src, 0, 1, b'')
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)
# TEST F-STRING.
# old_tree = tree
# tree, src = edit_tree(tree, src, 0, 0, 'f'.encode('utf-8'))
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)
# old_tree = tree
# tree, src = edit_tree(tree, src, 22, 27, 'rings'.encode('utf-8'))
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)
# old_tree = tree
# tree, src = edit_tree(tree, src, 0, 1, b'')
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)
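The repro passes (0, 0) for every point in tree.edit. In case the point values matter for the comparison, here is a sketch of how they could be derived from the byte offsets. It assumes columns are counted in bytes; the helpers byte_to_point and edit_args are hypothetical names introduced for illustration and are not part of py-tree-sitter.

```python
def byte_to_point(src: bytes, offset: int) -> tuple:
    """Convert a byte offset into a (row, column) pair, column in bytes."""
    row = src.count(b"\n", 0, offset)
    # rfind returns -1 when there is no earlier newline, so line_start is 0.
    line_start = src.rfind(b"\n", 0, offset) + 1
    return row, offset - line_start


def edit_args(src: bytes, start: int, old_end: int, text: bytes) -> dict:
    """Build keyword arguments matching the tree.edit call used in the repro."""
    new_src = src[:start] + text + src[old_end:]
    new_end = start + len(text)
    return {
        "start_byte": start,
        "old_end_byte": old_end,
        "new_end_byte": new_end,
        "start_point": byte_to_point(src, start),
        "old_end_point": byte_to_point(src, old_end),
        "new_end_point": byte_to_point(new_src, new_end),
    }


example_src = rb'''"for whom the \x07 {'tolls'}"'''
print(edit_args(example_src, 0, 0, b"r"))
```

Inside edit_tree the result could be passed as tree.edit(**edit_args(src, selection_start, selection_end, text)). Whether this changes the output of changed_ranges for the r-string case has not been verified here.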