Description
See the discussion here. The salient conclusion is this:
Escapes continue to work the way they do now: \x always inserts a single byte and \u always inserts a sequence of bytes encoding a unicode character. Literals are turned into String objects according to the following simple check:
- ASCIIString if all bytes are < 0x80;
- UTF8String if any bytes are ≥ 0x80.
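The check above can be modeled at the byte level. The following is only a rough sketch in Python, not the actual implementation: `literal_type` is a hypothetical helper name, and Python `bytes` objects stand in for the literal's data after escapes have been processed.

```python
def literal_type(data: bytes) -> str:
    """Return the proposed string type name for a literal's bytes.

    Hypothetical helper: models the rule "ASCIIString if all bytes
    are < 0x80, UTF8String otherwise".
    """
    return "UTF8String" if any(b >= 0x80 for b in data) else "ASCIIString"


# \x inserts exactly one byte; \u inserts the UTF-8 encoding of a code point.
x_escape = b"\xe9"                   # one raw byte, 0xE9
u_escape = "\u00e9".encode("utf-8")  # two bytes, 0xC3 0xA9

print(literal_type(b"hello"))  # ASCIIString
print(literal_type(u_escape))  # UTF8String
print(literal_type(x_escape))  # UTF8String (even though it is not valid UTF-8)
```

Note that the rule looks only at byte values, never at validity, which is why a lone `\xe9` still yields a UTF8String.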
If you want to use \x escapes with values at or above 0x80 to generate invalid UTF-8, that's your business. We can also introduce a Latin1"..." form that uses the Latin-1 encoding to store code points up to U+FF in an efficient character-per-byte form. Finally, the b"..." macro-defined string form can let you use characters and escapes (both \x and \u) to generate byte arrays.
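To make the size and validity trade-offs concrete, here is a small Python illustration using Python's own codecs as a stand-in for the proposed string forms. The helper `is_valid_utf8` is a hypothetical name introduced just for this sketch.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "café ÿ"  # six characters, every code point is <= U+FF

latin1 = text.encode("latin-1")  # one byte per character
utf8 = text.encode("utf-8")      # é and ÿ take two bytes each

print(len(latin1), len(utf8))    # 6 8

# Models a literal like "caf\xe9": the \x escape inserts a lone 0xE9 byte,
# which is a UTF-8 lead byte with no continuation bytes after it.
raw = b"caf\xe9"
print(is_valid_utf8(raw))        # False
print(is_valid_utf8(utf8))       # True
```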
We can safely and quickly concatenate ASCIIStrings with each other, with UTF8Strings, or with Latin1Strings. Mixing UTF8Strings and Latin1Strings, however, requires transcoding the Latin1Strings to UTF-8. This will not occur with string literals, since they will always be ASCIIStrings or UTF8Strings.
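The concatenation rules can be sketched as follows. This is an illustrative Python model under stated assumptions, not the proposed implementation: strings are represented as hypothetical `(kind, bytes)` pairs, where `kind` is one of `"ascii"`, `"utf8"`, or `"latin1"`, and `concat` is a name invented for this example.

```python
def concat(a, b):
    """Concatenate two (kind, bytes) pairs, transcoding only when required."""
    (ka, da), (kb, db) = a, b
    # Same encoding, or the right side is ASCII: plain byte append.
    if ka == kb or kb == "ascii":
        return (ka, da + db)
    # Left side is ASCII: its bytes mean the same thing in UTF-8 and Latin-1.
    if ka == "ascii":
        return (kb, da + db)
    # One side is UTF-8 and the other Latin-1: transcode the Latin-1 side.
    if ka == "latin1":
        da = da.decode("latin-1").encode("utf-8")
    else:
        db = db.decode("latin-1").encode("utf-8")
    return ("utf8", da + db)


print(concat(("ascii", b"abc"), ("latin1", b"\xe9")))
# ('latin1', b'abc\xe9')
print(concat(("utf8", "é".encode("utf-8")), ("latin1", b"\xe9")))
# ('utf8', b'\xc3\xa9\xc3\xa9')
```

The fast paths work because bytes below 0x80 encode the same characters in ASCII, UTF-8, and Latin-1, so appending them never changes the meaning of either operand.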