Description
Support for HTML entities was requested in #10, and was mostly addressed in #50, but I think it's reasonable to want entities to be recognized inside of attribute values as well.
This is a trickier request because `attribute_value` is currently a simple node that has no children and isn't designed to be broken up by tokens with special meanings. An entity is roughly equivalent to an `escape_sequence` node in other tree-sitter parsers, but those parsers tend to represent a string's contents as a series of `string_content` and `escape_sequence` nodes.
So the most intuitive solution might be to introduce a `string_content` node (or `attribute_value_content` or something) and make `attribute_value`'s children some combination of `string_content` and `entity` nodes. By and large, I think it wouldn't disrupt existing consumers of `tree-sitter-html`.
<abbr title="American Telephone &amp; Telegraph">AT&amp;T</abbr>
(fragment
  (element
    (start_tag
      (tag_name)
      (attribute
        (attribute_name)
        (quoted_attribute_value
          (attribute_value
            (string_content)
            (entity)
            (string_content)))))
    (text)
    (entity)
    (text)
    (end_tag
      (tag_name))))
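To make the proposed child layout concrete, here is a minimal Python sketch of how an attribute value's text would be divided into `string_content` and `entity` segments. It is only a model of the resulting node sequence, not grammar code: the `ENTITY` pattern and the `split_attribute_value` helper are hypothetical names introduced for illustration, and the regex is an assumption that may not match the grammar's actual `entity` rule.

```python
import re

# Assumed pattern for character references: named (&amp;), decimal (&#38;),
# and hexadecimal (&#x26;). The real grammar rule may differ.
ENTITY = re.compile(r"&(?:#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def split_attribute_value(value):
    """Return (node_type, text) pairs modelling attribute_value's children."""
    children = []
    pos = 0
    for match in ENTITY.finditer(value):
        # Any text before the entity becomes a string_content segment.
        if match.start() > pos:
            children.append(("string_content", value[pos:match.start()]))
        children.append(("entity", match.group()))
        pos = match.end()
    # Trailing text after the last entity, if any.
    if pos < len(value):
        children.append(("string_content", value[pos:]))
    return children


print(split_attribute_value("American Telephone &amp; Telegraph"))
# [('string_content', 'American Telephone '), ('entity', '&amp;'), ('string_content', ' Telegraph')]
```

A value with no entities would come back as a single `string_content` segment, which is why most consumers that only look at `attribute_value` itself should be unaffected.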
The only exception I can think of is injections: since `injection.include-children` is `false` by default, anyone injecting into `attribute_value` nodes would no longer see any content inside them until they change that setting.
Another option would be to do something like what `tree-sitter-javascript` does for template strings: make it so that `attribute_value` can contain `entity` nodes, but don't represent the non-entity text content of `attribute_value` with any sort of node. In this scenario, injections into `attribute_value` would at least still see all the non-entity content of the value when `include-children` is `false`. This might be more surprising behavior because it runs contrary to how we handle entities in tag contents (`entity` nodes break up `text` nodes), but maybe folks might feel it's less disruptive.
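The difference between the two options for injections can be illustrated with a small Python model. It assumes, as a simplification, that an injection with `include-children` set to `false` sees the node's text minus the byte ranges of its child nodes, joined together here for readability. The `injected_text` helper and the hard-coded ranges are hypothetical and only stand in for what the parser would produce.

```python
def injected_text(value, child_ranges):
    """Model the text left for an injection once child node ranges are excluded."""
    parts = []
    pos = 0
    for start, end in sorted(child_ranges):
        parts.append(value[pos:start])  # keep text not covered by a child
        pos = end
    parts.append(value[pos:])
    return "".join(parts)


value = "American Telephone &amp; Telegraph"

# Option 1: string_content and entity children cover the whole value.
option_1_children = [(0, 19), (19, 24), (24, 34)]
# Option 2: only the entity is a child; the surrounding text has no node.
option_2_children = [(19, 24)]

print(repr(injected_text(value, option_1_children)))  # ''
print(repr(injected_text(value, option_2_children)))  # 'American Telephone  Telegraph'
```

Under the first option nothing is left for the injection, while under the second the non-entity text survives with a gap where the entity was.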