-
Notifications
You must be signed in to change notification settings - Fork 12.9k
Some tweaks for backtick strings #45099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a2252b7
to
1f968b1
Compare
// Template strings should be preserved as much as possible | ||
const backtickQuoteEscapedCharsRegExp = /[\\`]/g; | ||
// Template strings preserve simple LF newlines, still encode CRLF (or CR) | ||
const backtickQuoteEscapedCharsRegExp = /\r\n|[\\\`\u0000-\u001f\t\v\f\b\r\u2028\u2029\u0085]/g; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this change? I'm unclear on why we wouldn't want to preserve these. backtickQuoteEsacpedCharsRegExp
is only used when the quote character is CharacterCodes.backtick
, which mean's we're preserving the raw text of the template string as much as possible, so it seems like adding extra escapes would break anyone using the raw strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I'll see if I can add a test for this.
-
Re the reason, I listed two reasons in the commit log:
- Not escaping things like a TAB or a FF character can get mangled by editor style settings
- Not escaping a CRLF newline and instead having the template literal have a CRLF will be interpreted as a regular newline, and result in a string with just LF. (It's even more likely to get mangled by your editor, but even if not you'll get that problem.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Not escaping things like a TAB or a FF character can get mangled by editor style settings
Yeah, I can see that being a possible issue, however I'm still not sure "<TAB>foo"
should become `\tfoo`
since it is still semantically different between the raw and cooked forms. There are only two ways to get to this code path: 1) Via a user-defined transformation, in which case we would want to preserve what was written, or 2) via one of our code fixes. If we want to normalize TAB or FF in the code fix, that's fine, but I'm concerned whether this would affect anyone using custom transforms. This is a small enough corner case its probably reasonable however, and we can address it if there's feedback.
- Not escaping a CRLF newline and instead having the template literal have a CRLF will be interpreted as a regular newline, and result in a string with just LF. (It's even more likely to get mangled by your editor, but even if not you'll get that problem.)
StringLiteral can't contain a raw CRLF unless its part of a LineContinuation, which becomes an empty code sequence, per NOTE 2 of https://tc39.es/ecma262/#sec-literals-string-literals and in https://tc39.es/ecma262/#sec-static-semantics-sv, so that wouldn't matter in a StringLiteral → TemplateLiteral conversion. The escape sequence \r\n
in a string literal should remain as the escape sequence \r\n
in the template literal, but we shouldn't just convert a raw CRLF into the escape sequence as that could impact custom transforms (which I could see taking a reliance on the CRLF->LF normalization of template literals).
So, -1 on converting CRLF into the escaped form, +1 on converting other characters (at least until someone raises an issue).
cddfe23
to
5c089c3
Compare
@rbuckton, I added a test. The only place I saw this code used is when printing types, so that's what the test is using. (And the use of CRLF is to stress that this was actually broken before this change.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% clear on how the test added tests the changes to the RegExp, since that test relies on an existing source file, from which we would pull the source text and wouldn't use backtickQuoteEscapedCharsRegExp
. You would need to add a unit test that exercises the NodeFactory
API to create and print a node with custom text
/rawText
to prove out this change.
// Template strings should be preserved as much as possible | ||
const backtickQuoteEscapedCharsRegExp = /[\\`]/g; | ||
// Template strings preserve simple LF newlines, still encode CRLF (or CR) | ||
const backtickQuoteEscapedCharsRegExp = /\r\n|[\\\`\u0000-\u001f\t\v\f\b\r\u2028\u2029\u0085]/g; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Not escaping things like a TAB or a FF character can get mangled by editor style settings
Yeah, I can see that being a possible issue, however I'm still not sure "<TAB>foo"
should become `\tfoo`
since it is still semantically different between the raw and cooked forms. There are only two ways to get to this code path: 1) Via a user-defined transformation, in which case we would want to preserve what was written, or 2) via one of our code fixes. If we want to normalize TAB or FF in the code fix, that's fine, but I'm concerned whether this would affect anyone using custom transforms. This is a small enough corner case its probably reasonable however, and we can address it if there's feedback.
- Not escaping a CRLF newline and instead having the template literal have a CRLF will be interpreted as a regular newline, and result in a string with just LF. (It's even more likely to get mangled by your editor, but even if not you'll get that problem.)
StringLiteral can't contain a raw CRLF unless its part of a LineContinuation, which becomes an empty code sequence, per NOTE 2 of https://tc39.es/ecma262/#sec-literals-string-literals and in https://tc39.es/ecma262/#sec-static-semantics-sv, so that wouldn't matter in a StringLiteral → TemplateLiteral conversion. The escape sequence \r\n
in a string literal should remain as the escape sequence \r\n
in the template literal, but we shouldn't just convert a raw CRLF into the escape sequence as that could impact custom transforms (which I could see taking a reliance on the CRLF->LF normalization of template literals).
So, -1 on converting CRLF into the escaped form, +1 on converting other characters (at least until someone raises an issue).
I think that there's some confusion here. This is no longer related to the string literal code fix, since I made it hand out a In any case, if you run the same test on
If this is indeed the only way to exercise this code, then maybe escape plain LFs (as |
1. `getRawLiteral()`: barf if `currentSourceFile` is missing, since if it is, then the following `getSourceTextOfNodeFromSourceFile` will return a bogus `""`. 2. One `||` -> `??` change. 3. `backtickQuoteEscapedCharsRegExp`: escape the usual control characters except for a simple LF. This code does get used to generate backtick strings when `rawText` is not given, and not escaping things like TAB characters can get mangled by editor settings. Worse, not escaping a CRLF and putting it verbatim in sthe string source will interpret it as LF, so add a special case for escaping these as `\r\n`. Added test. Related to microsoft#44313 and microsoft#40625.
5c089c3
to
0e9b8f4
Compare
Do we not normalize CRLF in template literal types? If so, that seems like a big disparity with template literal expressions (but perhaps unavoidable?). |
How does this break the output? Do we not handle CRLF when parsing template literal types? |
I'm not sure what normalization you're talking about... The actual type has
The original type has
so the CRLF from the type leads to a declaration with just a LF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to add a breaking change notice for API consumers that only passing text
to one of the factory methods to create template literals will have different escaping rules, and they can supply their own rawText
if our escaping doesn't match what they need.
getRawLiteral()
: barf ifcurrentSourceFile
is missing, since ifit is, then the following
getSourceTextOfNodeFromSourceFile
willreturn a bogus
""
.One
||
->??
change.backtickQuoteEscapedCharsRegExp
: escape the usual controlcharacters except for a simple LF. This code does get used to
generate backtick strings when
rawText
is not given, and notescaping things like TAB characters can get mangled by editor
settings. Worse, not escaping a CRLF and putting it verbatim in sthe
string source will interpret it as LF, so add a special case for
escaping these as
\r\n
.Added test.
Related to #44313 and #40625.