Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode/utf8: add AppendRune #47609

Closed
dsnet opened this issue Aug 9, 2021 · 6 comments
Closed

unicode/utf8: add AppendRune #47609

dsnet opened this issue Aug 9, 2021 · 6 comments

Comments

@dsnet
Copy link
Member

dsnet commented Aug 9, 2021

The existing utf8.EncodeRune is not user friendly.

Analyzing all usages in the module proxy, usages fall into approximately two patterns.

Pattern 1: encode and increment the total length

var n int
b := make([]byte, ...) // make with a length that is hopefully large enough
for _, r := range ... {
    n += utf8.EncodeRune(b[n:], r)
}
return b[:n]

This pattern starts with a buffer that is large enough and then calls utf8.EncodeRune to directly write into the buffer and then increment the total known length.

This pattern is dangerous as it is especially prone to panicking. It assumes that b[n:] is always large enough (most code do not check to make sure that at least 4B available). Even worse, this is a bug that rarely manifests and is unlikely to occur unless the unit test writes a large number of multi-byte runes.

Pattern 2: appending to a slice through intermediate array

b := make(]byte, 0, ...) // optionally provide some capacity 
for _, r := range ... {
    var arr [utf8.UTFMax]
    n := utf8.EncodeRune(arr[:], r)
    b = append(b, arr[:n]...)
}

This pattern is much safer than pattern 1 in that it never panics. However, it 1) incurs a performance penalty encoding into an intermediate array that is then appended to the primary buffer, and 2) requires 3 lines of code instead of just 1.

Prevalence

Between the two common patterns:

  • at least ~25% are pattern 1,
  • at least ~15% are pattern 2, and
  • the remaining ~60% seem to be mostly either pattern 1 or pattern 2, but unfortunately my simple pattern matcher failed to classify them.

Proposal

Since both pattern 1 and 2 are both ultimately concerned with appending into a slice, I propose the addition of:

// AppendRune appends the UTF-8 encoding of r into p.
func AppendRune(p []byte, r rune) []byte

The signature matches many other append-like APIs in the standard library (e.g., strconv.AppendFloat).

@gopherbot gopherbot added this to the Proposal milestone Aug 9, 2021
@rsc
Copy link
Contributor

rsc commented Aug 11, 2021

/cc @robpike

@rsc
Copy link
Contributor

rsc commented Aug 11, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@robpike
Copy link
Contributor

robpike commented Aug 11, 2021

This does seem reasonable. Thanks @dsnet for doing the scan of the libraries: that makes the case.

@rsc
Copy link
Contributor

rsc commented Aug 18, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Aug 25, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: unicode/utf8: add AppendRune unicode/utf8: add AppendRune Aug 25, 2021
@rsc rsc modified the milestones: Proposal, Backlog Aug 25, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/345571 mentions this issue: unicode/utf8: add AppendRune

@rsc rsc moved this to Accepted in Proposals Aug 10, 2022
@rsc rsc added this to Proposals Aug 10, 2022
@dmitshur dmitshur modified the milestones: Backlog, Go1.18 Aug 22, 2022
@rsc rsc removed this from Proposals Oct 19, 2022
@golang golang locked and limited conversation to collaborators Aug 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants