Description
See the discussion here. The salient conclusion is this:
Escapes continue to work the way they do now: \x always inserts a single byte and \u always inserts a sequence of bytes encoding a Unicode character. Literals are turned into String objects according to the following simple check:
ASCIIString if all bytes are < 0x80;
UTF8String if any bytes are ≥ 0x80.
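
The check above can be sketched in a few lines. This is a minimal Python model of the rule, not Julia's implementation: the function name classify_literal is hypothetical, and the returned strings merely stand in for the two proposed types.

```python
def classify_literal(data: bytes) -> str:
    """Return the name of the type a literal with these bytes would get.

    Hypothetical helper: models the proposed rule only.
    """
    # Every byte below 0x80 means the literal is pure ASCII.
    if all(b < 0x80 for b in data):
        return "ASCIIString"
    # Any byte at or above 0x80 selects the UTF-8 type, whether or not
    # the bytes form valid UTF-8.
    return "UTF8String"


print(classify_literal(b"hello"))
print(classify_literal("h\u00e9llo".encode("utf-8")))
print(classify_literal(b"\xff"))
```

Note that the check looks only at byte values; it does not validate the bytes as UTF-8, which matches the "that's your business" stance on \x escapes.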
If you want to use \x escapes with values at or above 0x80 to generate invalid UTF-8, that's your business. We can also introduce a Latin1"..." form that uses the Latin-1 encoding to store code points up to U+FF in an efficient character-per-byte form. Finally, the b"..." macro-defined string form can let you use characters and escapes (both \x and \u) to generate byte arrays.
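
To make the byte-level difference between the escapes concrete, here is a small Python illustration. It is an analogy under stated assumptions rather than Julia code: Python bytes literals do not accept \u, so the \u case is modelled by encoding a one-character string, and is_valid_utf8 is a hypothetical helper.

```python
# \x-style escape: exactly one byte, whatever its value.
x_escape = b"\xe9"

# \u-style escape: the UTF-8 encoding of code point U+00E9 (two bytes).
u_escape = "\u00e9".encode("utf-8")

# Latin-1 storage of the same code point: one byte per character.
latin1 = "\u00e9".encode("latin-1")

# Model of a byte-array literal mixing characters, \x, and \u escapes.
mixed = b"a\xff" + "\u00e9".encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(x_escape), len(u_escape), len(latin1))
print(is_valid_utf8(x_escape), is_valid_utf8(u_escape))
```

A lone 0xE9 byte is the start of a multi-byte UTF-8 sequence with nothing after it, so the \x form yields invalid UTF-8 here, while the same byte is a complete character in Latin-1.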
We can safely and quickly concatenate ASCIIStrings with each other, with UTF8Strings, or with Latin1Strings. Mixing UTF8Strings and Latin1Strings, however, requires transcoding the Latin1Strings to UTF-8. This will not occur with string literals, though, since they will always be ASCIIStrings or UTF8Strings.
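
The concatenation rules follow from the fact that bytes below 0x80 mean the same thing in ASCII, UTF-8, and Latin-1. The Python sketch below models this with raw byte strings; the helper names latin1_to_utf8 and concat_utf8_latin1 are hypothetical and only illustrate the transcoding step.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode Latin-1 bytes to UTF-8 (hypothetical helper)."""
    # Each Latin-1 byte maps to the code point of the same value.
    return data.decode("latin-1").encode("utf-8")


def concat_utf8_latin1(utf8_part: bytes, latin1_part: bytes) -> bytes:
    """Concatenate UTF-8 and Latin-1 data, producing UTF-8."""
    return utf8_part + latin1_to_utf8(latin1_part)


ascii_part = b"abc"
utf8_part = "\u00e9".encode("utf-8")      # two bytes
latin1_part = "\u00e9".encode("latin-1")  # one byte

# ASCII with either encoding: plain byte concatenation stays valid.
ascii_utf8 = ascii_part + utf8_part
ascii_latin1 = ascii_part + latin1_part

# UTF-8 with Latin-1: the Latin-1 side must be transcoded first.
combined = concat_utf8_latin1(utf8_part, latin1_part)
print(combined.decode("utf-8"))
```

Appending latin1_part to utf8_part without transcoding would leave a stray 0xE9 byte at the end, which is not valid UTF-8.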