Use generic string states in Python lexer #1477
Merged
Python allows for a variety of string literals (formatted, raw, unicode) as well as byte literals. In addition, strings can be delimited by `'`, `"`, `'''` and `"""`. At present, the Python lexer contains a separate state for each supported combination of prefix and delimiter. This approach is duplicative and error-prone, and it doesn't scale.

This PR takes a different approach. A `StringRegister` class is added to the Python lexer to hold the stack of string literals currently being lexed. With this register, it is possible to implement a single series of generic string states and to apply the appropriate tokens by consulting the register.

This PR fixes #937 and fixes #942 (or that's the goal at least).
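To illustrate the idea, here is a minimal sketch in Python of what a register of this kind might look like. It is not the code from this PR: the method names (`register`, `remove`, `is_type`, `is_delim`) and the `string_kind` helper are assumptions made up for the example, and the returned labels are placeholders rather than real token names.

```python
class StringRegister:
    """Hypothetical stack of (prefix, delimiter) pairs for open string literals."""

    def __init__(self):
        self._stack = []

    def register(self, prefix="", delim="'"):
        # Push a literal when its opening delimiter is lexed.
        self._stack.append((prefix.lower(), delim))

    def remove(self):
        # Pop the innermost literal when its closing delimiter is lexed.
        return self._stack.pop()

    def is_type(self, flag):
        # True if the innermost literal's prefix contains the flag, e.g. "f" or "r".
        return flag in self._stack[-1][0]

    def is_delim(self, delim):
        # True if the innermost literal was opened with this delimiter.
        return self._stack[-1][1] == delim


def string_kind(reg):
    """Placeholder classification that a generic string state could consult."""
    if reg.is_type("b"):
        return "bytes"
    if reg.is_type("f"):
        return "formatted"
    return "plain"


# A formatted string containing a nested plain string, as in f"{d['key']}".
reg = StringRegister()
reg.register(prefix="f", delim='"')
reg.register(prefix="", delim="'")
inner_kind = string_kind(reg)   # innermost literal is the plain '...' string
reg.remove()
outer_kind = string_kind(reg)   # back in the enclosing f"..." string
```

Because the prefix and delimiter live on the stack rather than in the state name, one set of generic states can cover every combination, and nested literals inside formatted strings resolve against the top of the stack.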