Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: make byte array types ordered #61004

Closed
pascaldekloe opened this issue Jun 26, 2023 · 20 comments
Closed

proposal: spec: make byte array types ordered #61004

pascaldekloe opened this issue Jun 26, 2023 · 20 comments
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Milestone

Comments

@pascaldekloe
Copy link
Contributor

For two byte arrays with the same size, a and b, string(a[:]) < string(b[:]) works, yet a < b does not.

The logic for comparing byte arrays is fully intuitive.

Use-Case

B-tree implementations now need redundant code/types to make fixed-width keys work efficiently, i.e., type Map[K Orderable, V any] and type ArrayKeyMap[K Arrays, V any]. Interface overhead would kill the performance.

Workaround

Suboptimal instructions remain even with the risks of unsafe.String(&array[0], len(array)) as the size context gets lost.

History

Bytes only, rather than the full type range, prevents quite a few implementation issues.

Extra

Array inclusion may play nicely in the new cmp package. Generics does not support matching on any-size arrays yet, so it may start with a long list of ~[2]byte | ~[3]byte | … instead.

@gopherbot gopherbot added this to the Proposal milestone Jun 26, 2023
@apparentlymart
Copy link

The standard library already has bytes.Compare for comparing two []byte values lexically, which is compatible with the signature of cmp.Compare.

Would adding a new operator allow anything that can't already be achieved using that function with full-coverage slices of both arrays? Would the answer change if there were also a bytes.Less implemented as Compare(a, b) < 0, returning bool, thereby matching the signature of cmp.Less?

(I understand that these functions take byte slices rather than byte arrays. If that is the crucial difference that you're motivated by then I'd love to hear more about why that distinction is important!)

@ianlancetaylor ianlancetaylor added LanguageChange Suggested changes to the Go language v2 An incompatible library change labels Jun 26, 2023
@ianlancetaylor
Copy link
Contributor

Looks like a dup of #39355.

@pascaldekloe
Copy link
Contributor Author

Looks like a dup of #39355.

A (very specific) subset, as the proposal seems held down by what about NaN or interface{} questions.

@pascaldekloe
Copy link
Contributor Author

“B-tree implementations now need redundant code/types to make fixed-width keys work efficiently, i.e., type Map[K Orderable, V any] and type ArrayKeyMap[K Arrays, V any].” @apparentlymart

@ianlancetaylor
Copy link
Contributor

If I understand that quote correctly, I don't agree with it. The answer is that a B-tree implementation should always use a comparison function. If you want to use a type that is already ordered, then the comparison function is cmp.Compare.

@pascaldekloe
Copy link
Contributor Author

The problem is that we want to use a type that is not already ordered @ianlancetaylor.
Proposal: spec: make byte array types ordered.

Is there any reason why this needs to be v2?

@ianlancetaylor
Copy link
Contributor

We generally mark all language changes as v2.

@ianlancetaylor
Copy link
Contributor

The problem is that we want to use a type that is not already ordered

Sorry, I made a mistake. I should have said that for a byte slice the comparison function you should use is bytes.Compare. For a byte array of known size, the comparison function could be something like

    func(a, b [N]byte) int { return bytes.Compare(a[:], b[:]) }

@pascaldekloe
Copy link
Contributor Author

I understand that we can make a comparison function for arrays, and another one for ordered types, yet we can't do one for both now, which results in duplicate code. The alternative is an interface, or a lambda. Both kill the performance.

@apparentlymart
Copy link

Sorry @pascaldekloe .... I did read the writeup several times but for some reason I had a mental block on the statement about the B-tree use-case.

If you have a specific example of some code that is currently inconvenient to write that would be improved by this proposal, it would help to include a source snippet in the proposal to help make the need clearer.

It sounds like you are trying to make something that is generic over cmp.Ordered, but that type set doesn't include all of the types you need. In other discussions I've seen folks propose including the comparison function as part of the collection itself, perhaps something like this:

type Map[K, V any] struct {
    less func (i, j K) bool
    // (etc)
}

func NewMapOrdered[K cmp.Ordered, V any]() Map[K, V] {
    return Map[K, V]{
        less: cmp.Less[K],
    }
}

func NewMapByteSlice[V any]() Map[[]byte, V] {
    return Map[K, V]{
        less: bytes.Less, // (assuming there were a bytes.Less function)
    }
}

However, I would worry that the indirect call overhead for calling the less function would be similar to the interface method call overhead you were worried about. I assume (but haven't checked) that the compiler would not be able to devirtualize that call.

That aside, I think the main question left in my mind is whether comparing byte arrays is common enough to warrant the additional language complexity of a weird exception for one particular array element type. Most Go programmers don't start by reading the specification, and so I wonder how they would discover that byte arrays in particular are ordered while no other array type is, and if they were to find that out via experimentation would they correctly deduce what the rule is? I think it's fair to say that most Go programmers aren't regularly writing b-tree implementations, and so it might help to explore what other use-cases this could potentially meet that might be more general.

@pascaldekloe
Copy link
Contributor Author

Pick a piece of code which compares generics @apparentlymart, say this map, and then try to make it work for both var perNumber btree.Map[int, string] and var perArray btree.Map[[5]byte, string]. Spoiler: you can not. You'll have to write another implementation just to support the array types. I can think of several more use-cases right now. The issue is clear without though.

@ianlancetaylor
Copy link
Contributor

The alternative is an interface, or a lambda. Both kill the performance.

I don't see why that would be. The inliner gets better with every release, and seems fully capable of handling the lambda case. If performance is the main argument in favor of this language change, then we should invest effort into improving performance.

@bcmills
Copy link
Contributor

bcmills commented Jun 27, 2023

Looks like a dup of #39355.

Note that #39355 was retracted, not declined — as far as I can tell, the discussion there never reached a final consensus.

@apparentlymart
Copy link

apparentlymart commented Jun 27, 2023

Out of curiosity I took the Map implementation linked above and adapted it to have a function pointer for comparison embedded inside it, similar to what I described in an earlier comment: Source code and assembly in Compiler Explorer.

I'm not familiar enough with this code to know where the hot paths are but from casual inspection of the search method it seems that the compiler has noticed that it can inline the lambda and has generated an inline CMPL instruction instead of a virtual call. Of course this overall program is pretty contrived -- I just copied one of the test cases into the main source file -- so I might be making the compiler's job too easy.

@pascaldekloe
Copy link
Contributor Author

I don't see why that would be. The inliner gets better with every release, and seems fully capable of handling the lambda case.

goos: darwin
goarch: arm64
pkg: golang.org/explain/ian
BenchmarkSlicesEqual/generics-8         	45447064	        26.03 ns/op
BenchmarkSlicesEqual/func-8             	34762455	        34.70 ns/op
PASS
ok  	golang.org/explain/ian	3.323s
var tokens = []string{"the", "quick", "brown", "fox", "jumps", "over", "the", "…"}

func BenchmarkSlicesEqual(b *testing.B) {
	same := make([]string, len(tokens))
	copy(same, tokens)

	b.Run("generics", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			slices.Equal(tokens, same)
		}
	})
	b.Run("func", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			slices.EqualFunc(tokens, same, func(a, b string) bool {
				return a == b
			})
		}
	})
}

Looking at the assembler is the way to go @apparentlymart. 👍

@ianlancetaylor
Copy link
Contributor

We aren't going to specialize the language for byte arrays (or byte slices; it's not immediately clear what this proposal is after). Any language extension we make here should apply to all arrays (or slices) with ordered element types.

If this is about slices, note that we've always decided that slices are not comparable, meaning that you can't use == with slice types. That is because == on a slice is ambiguous: it might mean that both slices have the same underlying array (and the same length), or it might mean that both slices have the same elements in the underlying array. Because of this ambiguity, we don't define == on slices. It would be very strange to define < on slice types but not ==. We aren't going to do that.

Generic containers should always use an explicit comparison function, so we don't need ordered arrays in order to use containers.

If it's important to make string(a) < string(b) fast, where a and b are byte slices, we could make the compiler recognize this case and implement it efficiently using bytes.Compare. I don't know if that would help or how often this comes up.

Therefore, this is a likely decline. Leaving open for three weeks for final comments.

@randall77
Copy link
Contributor

If it's important to make string(a) < string(b) fast, where a and b are byte slices, we could make the compiler recognize this case and implement it efficiently using bytes.Compare. I don't know if that would help or how often this comes up.

We do already do this. (Not by using bytes.Compare but by doing an copy-less cast to string and then using runtime.cmpstring.)

@pascaldekloe
Copy link
Contributor Author

Thanks for the clear expectation management Ian. That helps. 🙂

I have pointed out 3 reasons in this issue: (1) performance, (2) code duplication, and (3) consistency.

That is because == on a slice is ambiguous: it might mean that both slices have the same underlying array (and the same length), or it might mean that both slices have the same elements in the underlying array.

“ The logic for comparing byte arrays is fully intuitive.” The irony (3) is that a string conversion does compare—with an underlaying array. That slice reasoning is pretty lame anyway. Go can absolutely declare how it compares/orders slices and get it done so. Channels are comparable now, and their comparison is way more "ambiguous" than any slices version could.

Generic containers should always use an explicit comparison function

I have clearly demonstrated how they are too slow (1). More generally, all Func occurences in the core library should be used for convenience only. Big improvements on lambda speed is unlikely as the issue is well known, and the people working on the matter are not stupid.

If the Go team puts its priority somewhere else then that is OK of course. Just own it. All those tricks to delay, confuse and obscure are not doing anyone a favour.

@ianlancetaylor
Copy link
Contributor

An argument based on performance is only useful if we think that the performance can't be improved. We certainly think that the performance here can be improved and in fact there is an effort right now to improve the inliner.

@ianlancetaylor
Copy link
Contributor

No change in consensus.

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale Aug 2, 2023
@golang golang locked and limited conversation to collaborators Aug 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests

6 participants