verilog: support newline and hex escapes in string literals #5192

Open
wants to merge 2 commits into
base: main
from

Conversation

garytwong
Contributor

What are the reasons/motivation for this change?
To add support for multi-line string literals (i.e., those containing a backslash-newline) and for \x hex escapes. This should bring string literal handling into conformance with IEEE 1800-2012 sec 5.9 and 5.9.1.

Explain how this is achieved.
By modifying verilog_lexer.l to match newline characters within strings, and detecting escape sequences within the literal as appropriate.

If applicable, please suggest to reviewers how they can test the change.
make test; see tests/verilog/multiline_strings.ys and tests/verilog/hex_escape.ys.

Handle newline characters in string literals according to
IEEE 1800-2012, sec 5.9:

 - a backslash/newline pair is accepted and ignored;
 - a bare newline (without backslash) is invalid; issue a warning and
   continue.

Add corresponding regression test to check that a string with an
escaped newline produces the correct result, and that a string
with an unescaped newline causes the appropriate warning.

Handle hex escapes ("\x..") in string literals according to
IEEE 1800-2012, sec 5.9.1.

Add minimal unit test to verify one hex escape sequence is
interpreted correctly.
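
The behaviour the two commit messages describe can be illustrated with a small standalone sketch. This is not the code in verilog_lexer.l; it is a Python model written for illustration, and the name `decode_string_body` is hypothetical. It assumes that a bare newline is kept in the result after the warning (the commit message only says "continue") and that a `\x` with no hex digits falls back to a literal "x".

```python
import warnings

HEX_DIGITS = "0123456789abcdefABCDEF"
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"',
                  "v": "\v", "f": "\f", "a": "\a"}


def decode_string_body(body: str) -> str:
    """Decode the text between the quotes of a string literal (sketch only)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt == "\n":
                # Backslash/newline pair: accepted and ignored.
                i += 2
                continue
            if nxt == "x":
                # Collect one or two hex digits after "\x".
                j = i + 2
                digits = ""
                while j < len(body) and len(digits) < 2 and body[j] in HEX_DIGITS:
                    digits += body[j]
                    j += 1
                if digits:
                    out.append(chr(int(digits, 16)))
                    i = j
                    continue
            # Simple escapes; anything unknown (including a digit-less "\x")
            # falls back to the character itself. Octal escapes are omitted.
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
            continue
        if ch == "\n":
            # Bare newline: invalid, so warn and carry on.
            # Assumption: the newline character is kept in the result.
            warnings.warn("unescaped newline in string literal")
        out.append(ch)
        i += 1
    return "".join(out)
```

For example, `decode_string_body("ab\\\ncd")` yields `"abcd"`, and `decode_string_body("\\x41")` yields `"A"`.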
@garytwong garytwong requested a review from zachjs as a code owner June 21, 2025 15:48
@KrystalDelusion KrystalDelusion self-requested a review June 22, 2025 22:00
@KrystalDelusion
Member

> This should bring string literal handling into conformance with 1800-2012 sec 5.9 and 5.9.1.

It would be good to include tests for the other special characters (and replace bug5160.v). Do you have any intention of implementing triple quoted strings?

From reading the standard it seems like the hex and the existing octal parsing is incomplete.

> It shall be illegal for a hex_digit in an escape sequence to be an x_digit or a z_digit

So the parsing should recognise X, x, Z, z, and ? as being part of the number, and then reject it as illegal. Currently "\xx1" is parsed as 24'011110000111100000110001, i.e. "xx1", completely ignoring the escape character. The zero-length hex escape "\x" is similarly parsed as "x", though it should be rejected as illegal.

Interestingly, while Verilator accepts 1-2 digit octal literals, it rejects \x unless it is followed by 2 hex digits. Verilator does struggle with octal digits though, treating \121, \119, and \99 all as the same value (a "Q"). read_verilog, on the other hand, recognises the 9 as a non-octal digit and instead treats it as the character "9", which runs into the same issue of accepting a zero-length escape, such that "\9" is treated as "9".
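
A stricter treatment of `\x`, as suggested here, could look roughly like the following Python sketch. It is only an illustration of the rule quoted from the standard, not a proposal for the actual lexer code; `parse_hex_escape` and `bits_of` are hypothetical names, and raising `ValueError` stands in for whatever error reporting the front end would use.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
XZ_DIGITS = set("xXzZ?")  # x_digit / z_digit characters named in the comment


def parse_hex_escape(text: str, pos: int):
    """Parse the digits of a hex escape; text[pos] follows the '\\x'.

    Returns (character code, position after the escape).
    Raises ValueError for a zero-length escape or an x/z/? digit.
    """
    digits = ""
    # Treat x/z/? as part of the number so they can be rejected explicitly.
    while (pos < len(text) and len(digits) < 2
           and (text[pos] in HEX_DIGITS or text[pos] in XZ_DIGITS)):
        digits += text[pos]
        pos += 1
    if not digits:
        raise ValueError("\\x escape with no hex digits")
    if any(d in XZ_DIGITS for d in digits):
        raise ValueError("x/z digit in hex escape: \\x" + digits)
    return int(digits, 16), pos


def bits_of(s: str) -> str:
    """8 bits per character, as in the 24-bit value quoted above."""
    return "".join(f"{ord(c):08b}" for c in s)
```

With this sketch, the text following `\x` in "\xx1" (that is, `x1`) raises an error instead of decoding to "xx1", whose bit pattern `bits_of("xx1")` is the `011110000111100000110001` quoted above.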

I will however note that Verific appears to handle all of the cases I mentioned the same as read_verilog (including erroring on triple quoted strings), so it's probably not necessary to do anything about it...
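
The read_verilog octal behaviour described above can be modelled with a short Python sketch. It is based only on the description in this comment, not on the lexer source; `decode_octal_escape` is a hypothetical name, and the result for `\119` (a tab followed by "9") is an inference from that description.

```python
OCTAL_DIGITS = "01234567"


def decode_octal_escape(text: str, pos: int):
    """Decode an escape whose first character after the backslash is text[pos].

    Returns (decoded string, position after the escape).
    """
    if pos >= len(text):
        raise ValueError("backslash at end of string")
    digits = ""
    j = pos
    # Take up to three octal digits, stopping at the first non-octal one.
    while j < len(text) and len(digits) < 3 and text[j] in OCTAL_DIGITS:
        digits += text[j]
        j += 1
    if not digits:
        # Zero-length escape: the character is passed through unchanged,
        # which is how "\9" ends up as "9".
        return text[pos], pos + 1
    return chr(int(digits, 8)), j
```

Under these assumptions `\121` decodes to "Q", `\119` decodes to a tab with the "9" left over as an ordinary character, and `\9` decodes to "9". A stricter parser would reject the zero-length case instead of passing the character through.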

@garytwong
Contributor Author

> It would be good to include tests for the other special characters (and replace bug5160.v).

I agree; I'll do that.

> Do you have any intention of implementing triple quoted strings?

I hadn't thought about it... I understand that triple quoted strings were introduced in 1800-2023, but I have access only to 1800-2012. If somebody can send me the 2023 spec (or a summary of it with enough detail to implement triple quoted strings), I'll be happy to investigate further.

> From reading the standard it seems like the hex and the existing octal parsing is incomplete.
>
> > It shall be illegal for a hex_digit in an escape sequence to be an x_digit or a z_digit
>
> So the parsing should recognise X, x, Z, z, and ? as being a part of the number, and then reject it as illegal.

Good point. I'll add logic and test cases to detect illegal escape sequences.

@KrystalDelusion
Member

> I hadn't thought about it... I understand that triple quoted strings were introduced in 1800-2023

Ah no problem, I forgot I was looking at the latest standard instead of 2012, because IEEE only provides free downloads for the latest standard (https://ieeexplore.ieee.org/document/10458102, you'll need to make an account if you don't have one).

2 participants