Better escaping of String literals #126

Merged
merged 5 commits into from
Apr 3, 2018

Conversation

kevmoo
Collaborator

@kevmoo kevmoo commented Apr 3, 2018

No description provided.

@kevmoo kevmoo requested a review from natebosch April 3, 2018 02:11
@kevmoo
Collaborator Author

kevmoo commented Apr 3, 2018

CC @lrhn

if (value.contains('\n')) {
  return "r'''\n$value'''";
}
value = value.replaceAll('\\', r'\\');

Maybe use r'\' for the pattern.

Collaborator Author


this is for an old commit, I think...


/// A [RegExp] that matches whitespace characters that should be escaped.
final _escapeRegExp = new RegExp(
    '[\\x00-\\x07\\x0E-\\x1F${_escapeMap.keys.map(_getHexLiteral).join()}]');

That seems a little too clever for what is just: new RegExp(r'[\x00-\x1f\x7f]').
I can't promise that it generates as efficient a RegExp (it might, if we recognize that it actually is a contiguous range, but I can't promise it).


@lrhn lrhn Apr 3, 2018


Never mind; you extended the map in a later change, so it does make sense. I'd still probably start with just `\x00-\x1f` and not mind the overlap.
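
As a rough illustration of the simpler pattern suggested above, the sketch below checks a few strings against a single contiguous control-character range plus DEL. The variable name `controlChars` is hypothetical and not taken from the PR.

```dart
/// Matches ASCII control characters (0x00-0x1F) and DEL (0x7F).
/// The name `controlChars` is an assumption for illustration only.
final controlChars = RegExp(r'[\x00-\x1f\x7f]');

void main() {
  print(controlChars.hasMatch('tab\there')); // true
  print(controlChars.hasMatch('plain text')); // false
  print(controlChars.hasMatch('del\x7f')); // true
}
```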

return value;
}

canBeRaw = false;
var mapped = _escapeMap[match[0]];
if (mapped != null) return mapped;
return _getHexLiteral(match[0]);

Those three lines could be just:

return _escapeMap[value] ?? _getHexLiteral(value);

I'd even do:

return _escapeMap.putIfAbsent(value, _getHexLiteral);

(and make the map not constant, then you will cache the hex literals the first time you compute them instead of creating a new string each time).
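
A minimal sketch of the caching idea, assuming a non-constant map and a stand-in `_getHexLiteral` that formats one character as `\xNN` (the PR's actual helper and map contents may differ). Note that `Map.putIfAbsent` expects a zero-argument callback, so the helper is wrapped in a closure here rather than passed directly.

```dart
// Hypothetical stand-ins for the PR's private members; the contents are
// assumptions made for illustration.
final Map<String, String> _escapeMap = {
  '\n': r'\n',
  '\r': r'\r',
  '\t': r'\t',
};

/// Formats the first code unit of [input] as a `\xNN` escape.
String _getHexLiteral(String input) {
  final hex = input.codeUnitAt(0).toRadixString(16).toUpperCase();
  final padded = hex.padLeft(2, '0');
  return '\\x$padded';
}

/// Returns the escape for [value], storing computed hex literals in the map
/// so each one is built only once.
String escapeFor(String value) =>
    _escapeMap.putIfAbsent(value, () => _getHexLiteral(value));

void main() {
  print(escapeFor('\n')); // \n
  print(escapeFor('\x01')); // \x01
  print(_escapeMap.containsKey('\x01')); // true
}
```

One trade-off of this variant is that the map grows as new characters are seen, which is bounded here because only a small set of control characters ever reaches it.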

Collaborator Author


Ooo...not a bad idea...


// The only safe way to wrap the content is to escape all of the
// problematic characters - `$`, `'`, and `"`
var string = value.replaceAll(new RegExp(r"""(?=[$'"])"""), r'\');

Cache the regexp in a final variable, like for _escapeRegExp.
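
A sketch of what caching that pattern could look like, assuming a top-level final named `_dollarQuoteRegexp` and a wrapper function `escapeDollarsAndQuotes` (both names are hypothetical). The lookahead matches the empty position before each `$`, `'`, or `"`, so `replaceAll` inserts a backslash there without consuming the character.

```dart
/// Zero-width lookahead: matches the position before `$`, `'`, or `"`.
/// Built once and reused instead of being constructed on every call.
final _dollarQuoteRegexp = RegExp(r"""(?=[$'"])""");

/// Inserts a backslash before every `$`, `'`, and `"` in [value].
String escapeDollarsAndQuotes(String value) =>
    value.replaceAll(_dollarQuoteRegexp, r'\');

void main() {
  print(escapeDollarsAndQuotes(r"it's $5")); // it\'s \$5
}
```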

} else if (value == '"') {
  hasDoubleQuote = true;
  return value;
} else if (value == r'$') {

You could treat \ the same as $ wrt. "canBeRaw", a string with content "$\ can be raw as r'"$\'.

I can see that it doesn't work with the way this operation currently works - you can't decide later that the string can't be raw, because then you have to go back and fix previous \ characters, so I guess this is a good enough trade-off since backslashes are rare.

The alternative is to scan the string once first to figure out what it contains, and then fix it up in one second operation.
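
The two-pass alternative described above might be sketched as follows. This is not the PR's implementation: `toLiteral`, `_escaped`, and both regular expressions are hypothetical, and control characters are always emitted as `\xNN` for simplicity. The first pass only gathers facts about the content; the second picks a raw or escaped form, so a backslash never has to be revisited.

```dart
// All names below are assumptions for illustration.
final _controlChars = RegExp(r'[\x00-\x1f\x7f]');
final _needsEscape = RegExp(r"[\x00-\x1f\x7f\\$']");

/// Fallback: a single-quoted literal with every problematic character escaped.
String _escaped(String value) {
  final body = value.replaceAllMapped(_needsEscape, (m) {
    final ch = m[0]!;
    final code = ch.codeUnitAt(0);
    if (code < 0x20 || code == 0x7f) {
      final hex = code.toRadixString(16).padLeft(2, '0');
      return '\\x$hex';
    }
    return '\\$ch';
  });
  return "'$body'";
}

String toLiteral(String value) {
  // Pass 1: inspect the content without modifying it.
  final hasControl = value.contains(_controlChars);
  final hasSingle = value.contains("'");
  final hasDouble = value.contains('"');
  final wantsRaw = value.contains(r'$') || value.contains(r'\');

  // Pass 2: escape only when no quote style is free or when control
  // characters are present; otherwise wrap the content unchanged.
  if (hasControl || (hasSingle && hasDouble)) return _escaped(value);
  final quote = hasSingle ? '"' : "'";
  final prefix = wantsRaw ? 'r' : '';
  return '$prefix$quote$value$quote';
}

void main() {
  print(toLiteral('abc')); // 'abc'
  print(toLiteral(r'"$\')); // r'"$\'
  print(toLiteral('a\nb')); // 'a\x0ab'
}
```

The cost is a few extra scans of the string, which is likely negligible for the short literals a code generator typically emits.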

}
}

// The only safe way to wrap the content is to escape all of the

I take offense at the word "only" :)
You could potentially use triple quotes in some cases (you just don't have the information here to know when).

@kevmoo kevmoo merged commit daff97d into master Apr 3, 2018
@kevmoo kevmoo deleted the string_literals branch April 3, 2018 18:39
3 participants