Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slices: new package to provide generic slice functions #45955

Closed
ianlancetaylor opened this issue May 5, 2021 · 285 comments
Closed

slices: new package to provide generic slice functions #45955

ianlancetaylor opened this issue May 5, 2021 · 285 comments

Comments

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented May 5, 2021

Note: Discussion is now at #47203.

This proposal is for use with #43651. We propose defining a new package, slices, that will provide functions that may be used with slices of any type. If this proposal is accepted, the new package will be included with the first release of Go that implements #43651 (we currently expect that that will be Go 1.18).

This description below is focused on the API, not the implementation. In general the implementation will be straightforward. The description has been updated from the original proposal several times based on the discussion.

// Package slices defines various functions useful with slices of any type.
// Unless otherwise specified, these functions all apply to the elements
// of a slice at index 0 <= i < len(s).
package slices

import "constraints" // See #45458 

// Equal reports whether two slices are equal: the same length and all
// elements equal. If the lengths are different, Equal returns false.
// Otherwise, the elements are compared in index order, and the
// comparison stops at the first unequal pair.
// Floating point NaNs are not considered equal.
func Equal[T comparable](s1, s2 []T) bool

// EqualFunc reports whether two slices are equal using a comparison
// function on each pair of elements. If the lengths are different,
// EqualFunc returns false. Otherwise, the elements are compared in
// index order, and the comparison stops at the first index for which
// eq returns false.
func EqualFunc[T1, T2 any](s1 []T1, s2 []T2, eq func(T1, T2) bool) bool

// Compare compares the elements of s1 and s2.
// The elements are compared sequentially starting at index 0,
// until one element is not equal to the other. The result of comparing
// the first non-matching elements is the result of the comparison.
// If both slices are equal until one of them ends, the shorter slice is
// considered less than the longer one
// The result will be 0 if s1==s2, -1 if s1 < s2, and +1 if s1 > s2.
func Compare[T constraints.Ordered](s1, s2 []T) int

// CompareFunc is like Compare, but uses a comparison function
// on each pair of elements. The elements are compared in index order,
// and the comparisons stop after the first time cmp returns non-zero.
// The result will be the first non-zero result of cmp; if cmp always
// returns 0 the result is 0 if len(s1) == len(s2), -1 if len(s1) < len(s2),
// and +1 if len(s1) > len(s2).
func CompareFunc[T any](s1, s2 []T, cmp func(T, T) int) int

// Index returns the index of the first occurrence of v in s, or -1 if not present.
func Index[T comparable](s []T, v T) int

// IndexFunc returns the index into s of the first element
// satisfying f(c), or -1 if none do.
func IndexFunc[T any](s []T, f func(T) bool) int

// Contains reports whether v is present in s.
func Contains[T comparable](s []T, v T) bool

// Insert inserts the values v... into s at index i, returning the modified slice.
// In the returned slice r, r[i] == the first v.  Insert panics if i is out of range.
// This function is O(len(s) + len(v)).
func Insert[S constraints.Slice[T], T any](s S, i int, v ...T) S

// Delete removes the elements s[i:j] from s, returning the modified slice.
// Delete panics if s[i:j] is not a valid slice of s.
// Delete modifies the contents of the slice s; it does not create a new slice.
// Delete is O(len(s)-(j-i)), so if many items must be deleted, it is better to
// make a single call deleting them all together than to delete one at a time.
func Delete[S constraints.Slice[T], T any](s S, i, j int) S

// Clone returns a copy of the slice.
// The elements are copied using assignment, so this is a shallow clone.
func Clone[S constraints.Slice[T], T any](s S) S

// Compact replaces consecutive runs of equal elements with a single copy.
// This is like the uniq command found on Unix.
// Compact modifies the contents of the slice s; it does not create a new slice.
func Compact[S constraints.Slice[T], T comparable](s S) S

// CompactFunc is like Compact, but uses a comparison function.
func CompactFunc[S constraints.Slice[T], T any](s S, cmp func(T, T) bool) S

// Grow grows the slice's capacity, if necessary, to guarantee space for
// another n elements. After Grow(n), at least n elements can be appended
// to the slice without another allocation. If n is negative or too large to
// allocate the memory, Grow will panic.
func Grow[S constraints.Slice[T], T any](s S, n int) S

// Clip removes unused capacity from the slice, returning s[:len(s):len(s)].
func Clip[S constraints.Slice[T], T any](s S) S
@gopherbot gopherbot added this to the Proposal milestone May 5, 2021
@ianlancetaylor ianlancetaylor added the generics Issue is related to generics label May 5, 2021
@randall77

This comment has been minimized.

@cespare

This comment has been minimized.

@cespare
Copy link
Contributor

cespare commented May 5, 2021

In general I think a slices package will be necessary. Functions like Equal, Insert, and Remove will be especially welcome improvements over the current boilerplate.

I have two concerns about the list of functions here:

  1. Too much bloat to start with. I think we should err on the side of being too minimal and add new functions in future releases as they prove worthwhile. For instance, I think that CompareFunc will be rarely used and doesn't obviously pull its weight here.

  2. Too much emphasis on a functional programming style using higher-order functions. My experience with languages like JS and Java (and quite a few others which were more C-like in their initial incarnations and added a lot of higher-order function stuff later) is that there is an unfortunate dichotomy between writing for loops and using higher-order functions.

    Code which heavily uses the latter looks very different on the page from the former, impeding readability.

    The higher-order function style also tends to be much slower, so when writing a loop there's a decision burden every time between saving a line or two and writing fast code. (TIMTOWTDI is not really the Go ethos.) And programmers usually prefer concision, so the net result is often slow, HOF-heavy code.

    The HOF style of code is also a bit clunky with Go's current function literals; adding many more HOFs will create pressure to add new syntax (proposal: spec: lightweight anonymous function syntax #21498).

    In summary, I feel like this nudges the language in a non-Go-like direction. A thing that I really like about Go is that, for many tasks, I just write a plain, clear for loop which gets compiled to the obvious machine code. I don't look forward to a future where the prevailing style is to replace each for loop with a sufficiently clever slices.Reduce.

With these ideas in mind, I'd divide this list of functions into three groups.

Clearly worthwhile in a v1 of slices:

  • Equal
  • Compare
  • Index
  • Contains
  • LastIndex
  • Insert
  • InsertSlice
  • Remove
  • RemoveSlice
  • Resize

Borderline:

  • EqualFunc
  • ContainsFunc
  • Map
  • Filter
  • FilterInPlace

Not worth it:

  • CompareFunc
  • IndexFunc
  • LastIndexFunc
  • Reduce
  • MapInPlace

@ianlancetaylor

This comment has been minimized.

@ianlancetaylor

This comment has been minimized.

@mnasruul

This comment has been minimized.

@IceWreck
Copy link

IceWreck commented May 5, 2021

Why not container/slices and later if needed container/maps ?

Anyways this package would be a welcome addition because its a pain (boilerplate wise) to implement common data structures in Go compared to C++ and this would slightly reduce that boilerplate

@sfllaw
Copy link
Contributor

sfllaw commented May 5, 2021

Should this new package mention https://github.com/golang/go/wiki/SliceTricks in its documentation? Should SliceTricks document the new package? How does this wiki page fit into the revised ecosystem?

@sfllaw

This comment has been minimized.

@mvdan

This comment has been minimized.

@sfllaw

This comment has been minimized.

@fzipp

This comment has been minimized.

@komuw

This comment has been minimized.

@Merovius
Copy link
Contributor

Merovius commented May 5, 2021

One think to point out is that the Map and MapInPlace could be unified under MapAppend[D, S any](dst []D, src []S, f func(S) D), which appends to dst - Map would be equivalent to passing nil for dst and MapInPlace would be passing dst[:0]. Similar for Filter. Not saying we should, just that we could. And I find InPlace a slightly weird bikeshed color for Map, where D and S can be different, thus the mapping is never "in place".

Personally, I'm a bit concerned about Index and especially Contains. IMO they suggest that unordered slices are a suitable data structure for sets. Anecdotally, using x in y for a Python list was the single most common mistake I had to deduct points for [edit] in an algorithms and datastructure class. Really need to hire a copy-editor [/edit]

I'd be less concerned if there was the equivalent thing for sorted slices as well, so that people at least be made to think about it. Though that would probably live in sort.

@extrasalt
Copy link

Would flatten and flatMap also be included?

I can think of cases where a map would produce a slice of slices and that might need flattening

@DylanMeeus
Copy link

DylanMeeus commented May 5, 2021

I think this is a good idea, and it opens the door to including more functions by default in the future. Functions like these are going to either be provided by the standard library, or we'll end up with multiple "third party" libraries which do these things all in their own way, which will be used by a ton of projects anyway. Especially when people migrate from other languages (Java, C#, ..) they will look for such functions.

Providing this out of the box would be in-line with how Go provides other frequently used constructs natively (like handling json data).

@sluongng
Copy link

sluongng commented May 5, 2021

// Resize returns s with length c. If the length increases, the new trailing
// elements will be set to the zero value of T.  The capacity of the returned
// slice is not specified.
func Resize[T any](s []T, c int) []T

Not obvious what would be the behavior in case where len(s) > c. Would the returning slice be a truncated s, taking the first c elements or will s be returned as-is?

// Map turns a []T1 to a []T2 using a mapping function.
func Map[T1, T2 any](s []T1, f func(T1) T2) []T2

// Filter returns a new slice containing all elements e in s for which keep(e) is true.
func Filter[T any](s []T, keep func(T) bool) []T

// MapInPlace copies the elements in src to dst, using a mapping function
// to convert their values. This panics if len(dst) < len(src).
// (Or we could return min(len(dst), len(src)) if that seems better.)
func MapInPlace[type D, S](dst []D, src []S, f func(S) D)

// FilterInPlace modifies s to contain only elements for which keep(e) is true.
// It returns the modified slice.
func FilterInPlace[T any](s []T, keep func(T) bool) []T

I wonder it's might worth to provide some interface that would return a lazily-evaluated value here? As the chaining of Map and Filter is very common in Java Stream API.
So I wonder if we gona need a separate CL for something like Promise[T any].

// Insert inserts v into s at index i, returning the modified slice.
// In the returned slice r, r[i] == v.  This panics if !(i >= 0 && i <= len(s)).
// This function is O(len(s)).
func Insert[T any](s []T, i int, v T) []T

// InsertSlice inserts si into s at index i, returning the modified slice.
// In the returned slice r, r[i] == si[0] (unless si is empty).
// This panics if !(i >= 0 && i <= len(s)).
func InsertSlice[T any](s, si []T, i int) []T

// Remove removes the element at index i from s, returning the modified slice.
// This panics if !(i >= 0 && i < len(s)).  This function is O(len(s)).
// This modifies the contents of the slice s; it does not create a new slice.
func Remove[T any](s []T, i int) []T

// RemoveSlice removes j-i elements from s starting at index i, returning the modified slice.
// This can be thought of as a reverse slice expression: it removes a subslice.
// This panics if i < 0 || j < i || j > len(s).
// This modifies the contents of the slice s; it does not create a new slice.
func RemoveSlice[T any](s []T, i, j int) []T

I don't agree with usage of panic here.
Perhaps returning an err or there should be an Optional type/struct that can be either a value or an error.
Might be worth implementing these in a separate CL instead of the initial CL.

@sanggonlee
Copy link

These seem like they would make our lives slightly easier, but I thought one of the philosophies of Go was that operations not in O(1) runtime shouldn't hide their complexity? I thought that was the main reason there were no simple commonly used abstractions like map, filter, etc in Go.
Although I suppose the runtime of these functions are quite universally understood by most developers...

@bcmills
Copy link
Contributor

bcmills commented May 5, 2021

One slice operation I've found myself writing frequently is “return the same slice with its cap reduced to length”.

#38081 was declined partially on the grounds that it could be written more clearly as a generic function (#38081 (comment)). I think such a function belongs in the slices package.

@bcmills
Copy link
Contributor

bcmills commented May 5, 2021

I agree with @cespare that some of the functional APIs here seem premature, especially given the lack of a concise lambda.

I would add that functional languages tend to rely heavily on covariance, which Go lacks. That makes functions like Reduce and even Map much less useful than the corresponding functions in functional programming languages, especially when compared to the for loops we can already write. (I looked at Map in particular https://github.com/bcmills/go2go/blob/master/map.go2, but found it quite awkward compared to the typical functional map.)

On the other hand, Map and Reduce would be much more clearly useful if we had some way to express “assignable to” as a constraint, which I believe would be a coherent addition to the existing design. It's not obvious to me changing those functions to use such a constraint would be backward-compatible, so I think they should be omitted until we have more hands-on experience with generic functional programming in Go.

@empath-75
Copy link

empath-75 commented May 5, 2021

If you're going to do map/filter/reduce doesn't it make more sense to design a more generic interface first, and then implement it for slices? There are a lot more datastructures than slices that could benefit from such a thing.

@bcmills
Copy link
Contributor

bcmills commented May 5, 2021

I agree with @sluongng that the behavior of Resize seems unclear. I also don't really understand its purpose — can't we already resize a slice using the built-in slice operator?

I think a Clone method that returns a copy of a slice (up to its length) without accepting a length parameter — analogous to the bytes.Clone proposed in #45038 — would be much clearer for decreasing a length.

For increasing a length, I wonder if it would be clearer to provide a function that increases the capacity instead, analogous to (*bytes.Buffer).Grow:

// Grow returns s with capacity at least c.
// If cap(s) >= c already, Grow returns s as-is.
func Grow[T any](s []T, c int) []T

Then, increasing the length of the slice is trivial:

	s := slices.Grow(s, n)[:n]

That might also address @mdempsky's use case in proposal #24204 (see #24204 (comment)), since “allocate a new []T with capacity at least c” could be written as:

	s := slices.Grow([]T{}, c)

@earthboundkid
Copy link
Contributor

earthboundkid commented May 5, 2021

The code I was writing yesterday would have benefited from the existence of slices.Grow.

I am also strongly in favor of the Append idiom rather than InPlace. Append is already used throughout the standard library, whereas InPlace has no current uses.

@arroo
Copy link

arroo commented May 5, 2021

one thing I have found useful on more than one occasion from JS's Reduce implementation is including the index as an argument to the provided function.
so it would be:

func Reduce[T1, T2 any](s []T1, initializer T2, f func(T2, T1, int) T2) T2

@bcmills
Copy link
Contributor

bcmills commented May 5, 2021

The Slice variant of Insert seems like too much duplication to me. We have only one append in the language, not separate append variants for one element and multiple elements — why should Insert be any different?

If we make Insert analogous to append, that gives the signature:

func Insert[T any](s []T, i int, v ...T) []T

which seems like it handily covers the use-cases for both the proposed Insert and InsertSlice.

@bcmills
Copy link
Contributor

bcmills commented May 5, 2021

I think the signature for RemoveSlice may be surprising. I would intuitively expect it to accept a length instead of a second index.

That being the case, I think a function named RemoveN that explicitly accepts a length would be clearer — it's a bit more awkward for the “two indices” case, but it would be much more natural for many cases and also a lot less likely to be misread:

// RemoveN removes n elements from s starting at index i, returning the modified slice.
// This panics if i < 0 || n < 0 || i+n > len(s).
// This modifies the contents of the slice s; it does not create a new slice.
func RemoveN[T any][s []T, i, n int) []T

(The N suffix already has a precedent in the standard library in strings.SplitN.)

@eacp
Copy link

eacp commented Mar 22, 2022

It appears the generic Sort function uses the QuickSort algorithm, which is O(n^2).

Are there plans for moving to an O(nlogn) algorithm, like Merge Sort or IntroSort?

@DeedleFake
Copy link

DeedleFake commented Mar 22, 2022

Despite the function name, it actually uses a hybrid sort that looks pretty similar to introsort at a glance, though I'm not too familiar with introsort. And while quick sort is worst case O(n^2), the average case is better. It's true that merge sort is always O(n*log(n)), but it's very difficult, if not impossible, to do in-place, so it does a bunch of allocations and copies which can more than make up the difference performance-wise in practical usage. As such, it's rarely used outside of academic settings as far as I know.

@eacp
Copy link

eacp commented Mar 22, 2022

Despite the function name, it actually uses a hybrid sort that looks pretty similar to introsort at a glance, though I'm not too familiar with introsort. And while quick sort is worst case O(n^2), the average case is better. It's true that merge sort is always O(n*log(n)), but it's very difficult, if not impossible, to do in-place, so it does a bunch of allocations and copies which can more than make up the difference performance-wise in practical usage. As such, it's rarely used outside of academic settings as far as I know.

Just to confirm, is the time complexity of the generic sort still O(nlogn)?

@tmaxmax
Copy link

tmaxmax commented Mar 28, 2022

This may have been discussed before, but I'm wondering why the slices package doesn't include a Find/FindFunc helper? While finding by equality wouldn't make much sense (as you already have the exact same value), I often find myself looking for an element that satisfies a given predicate, like this:

var slice []T

i := slices.IndexFunc(slice, func(t T) bool {
  // a predicate
})
if i == -1 {
  // not found
  return
}

// found, do whatever
elem := slice[i]

I believe it would be useful to have a Find function that returns the first element satisfying the given predicate. It would have the following function signature:

func Find[E any](s []E, f func(E) bool) (E, bool)

The boolean value indicates whether the element was found or not.

@ianlancetaylor
Copy link
Contributor Author

@tmaxmax Want to write a proposal for that? See also #50340.

@earthboundkid
Copy link
Contributor

earthboundkid commented Mar 31, 2022

slices.Map has been left out (for now?), but I wonder if []T → []any comes up enough to be worth adding separately from slices.Map(s, func(t T) any { return t }). It's in the FAQ for example: https://go.dev/doc/faq#convert_slice_of_interface The signature could be slices.ToAny[T any](s []T) []any.

@eacp
Copy link

eacp commented Apr 1, 2022

slices.Map has been left out (for now?), but I wonder if []T → []any comes up enough to be worth adding separately from slices.Map(s, func(t T) any { return t }). It's in the FAQ for example: https://go.dev/doc/faq#convert_slice_of_interface The signature could be slices.ToAny[T any](s []T) []any.

I think the https://github.com/luraim/fun#map library has a good Map implementation.

@mrkschan
Copy link

slices.Map has been left out (for now?), but I wonder if []T → []any comes up enough to be worth adding separately from slices.Map(s, func(t T) any { return t }). It's in the FAQ for example: https://go.dev/doc/faq#convert_slice_of_interface The signature could be slices.ToAny[T any](s []T) []any.

A search of slices.Map on this repo yields Map/Reduce/Filter :)

func Map[T1, T2 any](s []T1, f func(T1) T2) []T2 {
r := make([]T2, len(s))
for i, v := range s {
r[i] = f(v)
}
return r
}
// Reduce reduces a []T1 to a single value using a reduction function.
func Reduce[T1, T2 any](s []T1, initializer T2, f func(T2, T1) T2) T2 {
r := initializer
for _, v := range s {
r = f(r, v)
}
return r
}
// Filter filters values from a slice using a filter function.
func Filter[T any](s []T, f func(T) bool) []T {
var r []T
for _, v := range s {
if f(v) {
r = append(r, v)
}
}
return r
}

@Merovius
Copy link
Contributor

I wonder if []T → []any comes up enough to be worth adding separately from slices.Map(s, func(t T) any { return t }).

I think this is a prime usecase for a generic iterator API. Because you ~never want to actually do that conversion, you want to pass a []T to a function currently using ...any (most likely a fmt-like function) which only reads from it.

If I'm wrong and there are significant use-cases beyond that, I would retract that concern. Otherwise, I'd prefer if we solved generic iteration for the stdlib and made it easy to use that instead.

@deefdragon
Copy link

I have had need of both map and filter functions a number of times in the last week. Its easy enough to write your own, but I would like to voice my support of adding map and filter to the slices package.

@DeedleFake
Copy link

DeedleFake commented Sep 6, 2022

Map and filter functions will be much, much more useful on more general iterators. I think that it's worth waiting for #54245 rather than adding only semi-useful functions to the slices package that are likely to just become deprecated later. In most cases, it's usually easy enough to just use a for loop. Map in particular will probably also need #49085 or else the ergonomics will be quite awkward.

That being said, this package is in exp at the moment. Maybe it is worth adding them temporarily before the package gets a backwards compatability guarantee to see if it's possible to make them work well in this context.

@earthboundkid
Copy link
Contributor

Compare #54768 (slices.DeleteFunc) which I think would be useful even without having iterator support.

@DeedleFake
Copy link

DeleteFunc() is useful because it's more efficient. It's technically the same effect as Filter() would be, but clearer that it's in-place. A slice-specific Map() could also be more efficient, but if efficiency is so important in a given case than a for loop is likely to be more efficient anyways. DeleteFunc() handles some trickiness for you, but Map() would probably not really do anything useful.

@earthboundkid
Copy link
Contributor

I think there's a lot to be said for just counting lines/tokens/characters saved when evaluating a potential abstraction:

	ns = slices.Map(ns, func(n int) int {
		return n * n
	})

3 lines, 22 tokens, 56 characters

	for i, n := range ns {
		ns[i] = n * n
	}

3 lines, 17 tokens, 41 characters

An in-place map just doesn't pay for itself. You do get a savings of characters when you compare to making a new map, but then the question of efficiency vs an iterator comes into play.

@Merovius
Copy link
Contributor

Merovius commented Sep 6, 2022

@carlmjohnson Not that I think these are useful metrics, but

strs := slices.Map(ints, strconv.Itoa)

1 line, 12 tokens, 38 characters

strs := make([]string, len(ints))
for i, n := range ns {
    strs[i] = strconv.Itoa(n)
}

4 lines, 32 tokens, 89 characters.

We can always cherry-pick examples that make our favorites look favorable.

[edit] I just realized that you specifically tried to talk about an in-place Map. In which case your example is a bit confusing, because why would that Map function return the slice? In any case. I agree that an in-place Map isn't as useful, generally.

@ianlancetaylor
Copy link
Contributor Author

I have opened a new proposal #57433 to move golang.org/x/exp/slices into the standard library. Closing this proposal as completed. Thanks for all the discussion.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests