Fix character escaping in Perl lexer #1549

Merged

pyrmont
Contributor

pyrmont commented Jun 19, 2020

In single-quoted strings, the Perl lexer currently tokenises `\` as an error if it is followed by any character other than `'`. This is a bug: Perl accepts `\` in single-quoted strings.

In the course of fixing this, errors in the handling of character escaping in double-quoted strings were also addressed. There, `$` and `@` are also valid characters to escape. In double-quoted strings, `\` can still produce Error tokens if it is followed by a character that cannot be escaped.
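
As a rough illustration of the single-quoted behaviour described above, here is a minimal sketch using only Ruby's standard `StringScanner`. It is not Rouge's implementation: the method name `lex_single_quoted` and the `:escape`/`:string` token names are hypothetical, and the input is assumed to be the string body without its surrounding quotes.

```ruby
require "strscan"

# Toy tokenizer for the body of a Perl single-quoted string.
# Assumption: names and token types are illustrative, not Rouge's.
def lex_single_quoted(body)
  scanner = StringScanner.new(body)
  tokens = []
  until scanner.eos?
    if scanner.scan(/\\[\\']/)
      # A backslash before ' or \ forms an escape sequence.
      tokens << [:escape, scanner.matched]
    elsif scanner.scan(/\\/)
      # Any other backslash is ordinary string content, not an error.
      tokens << [:string, scanner.matched]
    else
      # Everything up to the next backslash is plain string content.
      tokens << [:string, scanner.scan(/[^\\]+/)]
    end
  end
  tokens
end
```

With this sketch, `lex_single_quoted('C:\dir')` yields three `:string` tokens and no error token, which is the outcome the fix is after.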

This fixes #1163.

pyrmont added the needs-review label Jun 19, 2020
pyrmont self-assigned this Jun 19, 2020
@pyrmont
Contributor Author

pyrmont commented Jun 19, 2020

@miparnisari I am very much out of my depth with Perl, but I experimented with how it handles `\` in double-quoted strings by running `perl -de1` at the command line and playing around with `print`. If I run `print "foo\bar"`, I get the output "fooar", which doesn't make sense to me but makes me think that `\` should be lexed as an error unless it's followed by a valid character to escape. What do you think?

@jeremychan

Do you mean you get "foar"? `\b` is a backspace: http://www.java2s.com/Code/Perl/String/BackslashEscapesinPerl.htm
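
To make the backspace point concrete, here is a small Ruby sketch (Ruby's double-quoted strings also read `\b` as a backspace character). The `render` helper is hypothetical and assumes a simple terminal model in which a backspace moves the cursor back one column and the next character overwrites what was there; real terminals may behave differently.

```ruby
# Simulate how a simple terminal would display a string containing backspaces.
# Assumption: a backspace only moves the cursor left; it does not erase.
def render(text)
  screen = []
  cursor = 0
  text.each_char do |ch|
    if ch == "\b"
      cursor -= 1 if cursor > 0
    else
      screen[cursor] = ch # overwrite whatever is under the cursor
      cursor += 1
    end
  end
  screen.join
end

puts render("foo\bar") # prints foar
```

The second `o` is overwritten by `a`, which is why the visible output is "foar" even though the string still contains six characters.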

@pyrmont
Contributor Author

pyrmont commented Jul 4, 2020

@jeremychan Sorry, right you are. Thanks for the correction! I'll have a closer look at that list and see if I need to make any more changes :)

@pyrmont
Contributor Author

pyrmont commented Jul 5, 2020

It seems surprisingly difficult to determine the valid escape sequences in a Perl string. Using a combination of the lists from the Perl regular expression documentation and the Encode::Escape CPAN module, I've added more robust escaping support.

pyrmont merged commit 9539764 into rouge-ruby:master Jul 5, 2020
pyrmont deleted the bugfix.perl-backslash-single-quote branch July 5, 2020 05:01
pyrmont removed the needs-review label Jul 5, 2020
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 19, 2021
In single- and double-quoted strings, the Perl lexer currently
tokenises `\` as an error if it is not followed by a character that is part
of a recognised escape sequence. This is a bug. Perl accepts the use of
`\` in single- and double-quoted strings even if it is not part of a
valid escape sequence.

This commit permits the use of `\` in single-quoted and double-quoted
strings as well as increasing the range of escape sequences that are
recognised in double-quoted strings.
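
The double-quoted behaviour described in this commit message could be sketched roughly as follows. This is not the code from the commit: `DQ_ESCAPE` covers only a subset of the escapes Perl documents for double-quoted strings, `lex_double_quoted` and the token names are hypothetical, and variable interpolation is ignored entirely.

```ruby
require "strscan"

# Illustrative subset of Perl's double-quoted escapes:
# special characters, single-letter controls, hex, braced hex, octal, control.
DQ_ESCAPE = /\\(?:[\\"$@]|[tnrfbae]|x\h{1,2}|x\{\h+\}|[0-7]{1,3}|c.)/

# Toy tokenizer for the body of a Perl double-quoted string.
# Assumption: names and token types are illustrative, not Rouge's.
def lex_double_quoted(body)
  scanner = StringScanner.new(body)
  tokens = []
  until scanner.eos?
    if scanner.scan(DQ_ESCAPE)
      tokens << [:escape, scanner.matched]
    elsif scanner.scan(/\\/)
      # Unrecognised escape: keep the backslash as string content, not an error.
      tokens << [:string, scanner.matched]
    else
      # Everything up to the next backslash is plain string content.
      tokens << [:string, scanner.scan(/[^\\]+/)]
    end
  end
  tokens
end
```

For example, `lex_double_quoted('a\tb\q\$x')` marks `\t` and `\$` as escapes, while the backslash before `q` comes through as plain string content.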
Successfully merging this pull request may close these issues.

When inside single quotes, Perl tokenizes \ as an error