unicode/utf8: add AppendRune #47609

Closed
@dsnet

Description

The existing utf8.EncodeRune API is not user-friendly.

After analyzing all usages in the module proxy, I found that they fall into approximately two patterns.

Pattern 1: encode and increment the total length

var n int
b := make([]byte, ...) // make with a length that is hopefully large enough
for _, r := range ... {
    n += utf8.EncodeRune(b[n:], r)
}
return b[:n]

This pattern starts with a buffer that is presumably large enough, calls utf8.EncodeRune to write directly into the buffer, and then increments the total known length.

This pattern is dangerous because it is especially prone to panicking. It assumes that b[n:] is always large enough (most code does not check that at least 4 bytes, i.e. utf8.UTFMax, are available). Even worse, this bug rarely manifests and is unlikely to be caught unless a unit test writes a large number of multi-byte runes.
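To make the failure mode concrete, here is a small runnable sketch. encodeAll is a hypothetical helper that runs the pattern 1 loop and, purely for demonstration, converts the panic into an error with recover. A buffer sized at one byte per rune works for ASCII input but panics on two-byte runes, while reserving utf8.UTFMax bytes per rune is always sufficient.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeAll (hypothetical) follows pattern 1: it writes each rune directly
// into b with utf8.EncodeRune and tracks the running length n. For this demo
// it recovers the panic raised when b[n:] is too short and returns it as err.
func encodeAll(b []byte, runes []rune) (out []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("pattern 1 panicked: %v", r)
		}
	}()
	var n int
	for _, r := range runes {
		n += utf8.EncodeRune(b[n:], r)
	}
	return b[:n], nil
}

func main() {
	ascii := []rune("abc")
	multi := []rune("\u00e4\u00f6\u00fc") // each rune encodes to 2 bytes

	// One byte per rune is enough for ASCII input...
	out, err := encodeAll(make([]byte, len(ascii)), ascii)
	fmt.Println(string(out), err) // abc <nil>

	// ...but the same sizing panics once multi-byte runes appear.
	_, err = encodeAll(make([]byte, len(multi)), multi)
	fmt.Println(err != nil) // true

	// utf8.UTFMax bytes per rune is always large enough.
	out, err = encodeAll(make([]byte, utf8.UTFMax*len(multi)), multi)
	fmt.Println(string(out), err) // äöü <nil>
}
```

Note that a test suite using only ASCII strings would never trigger the second case.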

Pattern 2: appending to a slice through intermediate array

b := make([]byte, 0, ...) // optionally provide some capacity
for _, r := range ... {
    var arr [utf8.UTFMax]byte
    n := utf8.EncodeRune(arr[:], r)
    b = append(b, arr[:n]...)
}

This pattern is much safer than pattern 1 in that it never panics. However, it (1) incurs a performance penalty from encoding into an intermediate array that is then appended to the primary buffer, and (2) requires 3 lines of code instead of just 1.

Prevalence

Between the two common patterns:

  • at least ~25% are pattern 1,
  • at least ~15% are pattern 2, and
  • the remaining ~60% seem to be mostly either pattern 1 or pattern 2, but unfortunately my simple pattern matcher failed to classify them.

Proposal

Since patterns 1 and 2 are both ultimately concerned with appending to a slice, I propose adding:

// AppendRune appends the UTF-8 encoding of r to the end of p and returns the extended buffer.
func AppendRune(p []byte, r rune) []byte

The signature matches many other append-like APIs in the standard library (e.g., strconv.AppendFloat).
