Description
The existing utf8.EncodeRune API is not user friendly.
Analyzing all usages in the module proxy shows that they fall into roughly two patterns.
Pattern 1: encode and increment the total length
var n int
b := make([]byte, ...) // make with a length that is hopefully large enough
for _, r := range ... {
	n += utf8.EncodeRune(b[n:], r)
}
return b[:n]
This pattern starts with a buffer that is large enough and then calls utf8.EncodeRune
to directly write into the buffer and then increment the total known length.
This pattern is dangerous, as it is especially prone to panicking. It assumes that b[n:]
is always large enough (most code does not check that at least utf8.UTFMax (4) bytes remain available). Even worse, this is a bug that rarely manifests: it is unlikely to occur unless a unit test happens to write a large number of multi-byte runes.
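To make the failure mode concrete, here is a minimal, hypothetical reproduction of pattern 1 (the function name and inputs are illustrative, not taken from any real codebase): the buffer is sized by rune count, which works for ASCII but panics as soon as a multi-byte rune pushes the encoding past the buffer's length.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRunes is a hypothetical instance of pattern 1. It sizes the output
// buffer by rune count, which is too small once any rune needs a multi-byte
// UTF-8 encoding.
func encodeRunes(rs []rune) []byte {
	var n int
	b := make([]byte, len(rs)) // too small for multi-byte runes
	for _, r := range rs {
		n += utf8.EncodeRune(b[n:], r)
	}
	return b[:n]
}

func main() {
	// All-ASCII input: every rune encodes to 1 byte, so the buffer fits.
	fmt.Printf("%q\n", encodeRunes([]rune("abc")))

	defer func() {
		if rec := recover(); rec != nil {
			fmt.Println("recovered:", rec)
		}
	}()
	// 'é' encodes to 2 bytes, so the final rune overruns the buffer
	// and EncodeRune panics with an index-out-of-range error.
	encodeRunes([]rune("héllo"))
}
```

Note how easily the bug hides: the ASCII case passes, and only the multi-byte input panics.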
Pattern 2: appending to a slice through intermediate array
b := make([]byte, 0, ...) // optionally provide some capacity
for _, r := range ... {
	var arr [utf8.UTFMax]byte
	n := utf8.EncodeRune(arr[:], r)
	b = append(b, arr[:n]...)
}
This pattern is much safer than pattern 1 in that it never panics. However, it 1) incurs a performance penalty by encoding into an intermediate array that is then appended to the primary buffer, and 2) requires 3 lines of code instead of just 1.
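For contrast with pattern 1, here is a complete, runnable sketch of pattern 2 (again with an illustrative function name): because each rune is encoded into a fixed utf8.UTFMax-sized array before being appended, the output buffer grows as needed and no panic is possible.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRunesSafe is a sketch of pattern 2. Each rune is encoded into a
// fixed-size intermediate array and then appended, so append handles all
// growth and the code cannot panic on multi-byte runes.
func encodeRunesSafe(rs []rune) []byte {
	b := make([]byte, 0, len(rs)) // capacity is only a hint; append grows b
	for _, r := range rs {
		var arr [utf8.UTFMax]byte
		n := utf8.EncodeRune(arr[:], r)
		b = append(b, arr[:n]...)
	}
	return b
}

func main() {
	// Multi-byte input that panicked under pattern 1 is handled correctly.
	fmt.Printf("%q\n", encodeRunesSafe([]rune("héllo")))
}
```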
Prevalence
Between the two common patterns:
- at least ~25% are pattern 1,
- at least ~15% are pattern 2, and
- the remaining ~60% seem to be mostly either pattern 1 or pattern 2, but unfortunately my simple pattern matcher failed to classify them.
Proposal
Since patterns 1 and 2 are both ultimately concerned with appending to a slice, I propose the addition of:
// AppendRune appends the UTF-8 encoding of r into p.
func AppendRune(p []byte, r rune) []byte
The signature matches many other append-like APIs in the standard library (e.g., strconv.AppendFloat).
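A minimal sketch of how the proposed function could behave, written here in terms of the existing EncodeRune (a real standard-library implementation could encode directly into the grown slice instead of going through an intermediate array):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// AppendRune appends the UTF-8 encoding of r to p and returns the extended
// slice. This is only a sketch of the proposed API, layered on EncodeRune.
func AppendRune(p []byte, r rune) []byte {
	var arr [utf8.UTFMax]byte
	n := utf8.EncodeRune(arr[:], r)
	return append(p, arr[:n]...)
}

func main() {
	// With AppendRune, both patterns collapse to a single line per rune,
	// with no sizing arithmetic and no possibility of a panic.
	var b []byte
	for _, r := range "héllo" {
		b = AppendRune(b, r)
	}
	fmt.Printf("%q\n", b)
}
```

Following the append convention also means callers keep control of allocation: a caller can pass a slice with preallocated capacity and AppendRune will fill it without further allocation, just like strconv.AppendFloat.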