| 1 | +- Feature Name: group_by |
| 2 | +- Start Date: 2018-06-15 |
| 3 | +- RFC PR: |
| 4 | +- Rust Issue: |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Provide an `Iterator` over a slice that produces non-overlapping runs of elements, with run boundaries decided by a given predicate. |
| 10 | + |
| 11 | +# Motivation |
| 12 | +[motivation]: #motivation |
| 13 | + |
| 14 | +Adding this `Iterator` to the standard library will help people split slices using a custom predicate. |
| 15 | +This `Iterator` is implemented on generic slices for performance and flexibility: `GroupBy` implements `DoubleEndedIterator` without any overhead and does not need any allocation. |
| 16 | + |
| 17 | +A similar method, [`split`](https://doc.rust-lang.org/std/primitive.slice.html#method.split), already exists in the standard library, but it removes the element that does the separation. |
| 18 | +This behavior is not always wanted, and it could be reproduced with `group_by` by skipping the first element of every group except the first. |
| 19 | + |
| 20 | +In short, it is a more generic `split` method that covers more use cases. |
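
To make the comparison with `split` concrete, here is a minimal sketch. `group_by_eager` and `split_via_groups` are hypothetical helpers written only for this illustration; unlike the proposed iterator, the helper collects into a `Vec`. The sketch assumes `0` is the separator and that the input does not start with a separator, since a leading separator makes `split` yield an extra empty slice.

```rust
/// Hypothetical eager stand-in for the proposed `group_by`, for illustration only:
/// cuts `slice` between every adjacent pair for which `same_group` returns `false`.
fn group_by_eager<'a, T>(
    slice: &'a [T],
    mut same_group: impl FnMut(&T, &T) -> bool,
) -> Vec<&'a [T]> {
    let mut groups = Vec::new();
    let mut start = 0;
    for i in 1..slice.len() {
        if !same_group(&slice[i - 1], &slice[i]) {
            groups.push(&slice[start..i]);
            start = i;
        }
    }
    // Push the trailing group, unless the slice was empty.
    if start < slice.len() {
        groups.push(&slice[start..]);
    }
    groups
}

/// Rebuilds the result of `slice.split(|x| *x == 0)`: start a new group at every
/// separator, then drop the leading separator of every group except the first.
fn split_via_groups(slice: &[i32]) -> Vec<&[i32]> {
    group_by_eager(slice, |_, next| *next != 0)
        .into_iter()
        .enumerate()
        .map(|(i, group)| if i == 0 { group } else { &group[1..] })
        .collect()
}
```

For `[1, 2, 0, 3, 4]`, grouping keeps the separator inside the second group (`[1, 2]` and `[0, 3, 4]`), while `split` and `split_via_groups` both produce `[1, 2]` and `[3, 4]`.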
| 21 | + |
| 22 | +Here is a loop that visits the first element of each group, based on the equality predicate: |
| 23 | + |
| 24 | +```rust |
| 25 | +let mut previous = None; |
| 26 | +let mut iter = slice.iter(); |
| 27 | +while let Some(elem) = iter.next() { |
| 28 | +    if previous.is_none() || previous != Some(elem) { |
| 29 | +        previous = Some(elem); |
| 30 | + |
| 31 | +        // do something here with `elem`: the first element of each group |
| 32 | +    } |
| 33 | +} |
| 34 | +``` |
| 35 | + |
| 36 | +The `GroupBy` `Iterator`, by contrast, returns all the elements that belong to the same group: it yields a slice covering the complete group, with less boilerplate: |
| 37 | + |
| 38 | +```rust |
| 39 | +for group in slice.group_by(|a, b| a == b) { |
| 40 | + // do something here with the `group` slice |
| 41 | +} |
| 42 | +``` |
| 43 | + |
| 44 | +# Guide-level explanation |
| 45 | +[guide-level-explanation]: #guide-level-explanation |
| 46 | + |
| 47 | +If you want to split a slice into groups of elements, you can use the `GroupBy` `Iterator`. It lets you specify whether two adjacent elements belong to the same group: when the predicate you provide returns `false`, the slice is split at that point and a new group begins. A group is nothing more than a subslice of the base slice. |
| 48 | + |
| 49 | +```rust |
| 50 | +struct Human { |
| 51 | +    age: u32, |
| 52 | +    is_cool: bool, |
| 53 | +} |
| 54 | + |
| 55 | +let slice = /* a slice of humans */; |
| 56 | + |
| 57 | +// we first group humans by coolness |
| 58 | +for coolness_group in slice.group_by(|a, b| a.is_cool == b.is_cool) { |
| 59 | +    // and we then group humans by age |
| 60 | +    for age_group in coolness_group.group_by(|a, b| a.age == b.age) { |
| 61 | +        // ... |
| 62 | +    } |
| 63 | +} |
| 64 | +``` |
| 65 | + |
| 66 | +# Reference-level explanation |
| 67 | +[reference-level-explanation]: #reference-level-explanation |
| 68 | + |
| 69 | +[A basic implementation is available](http://github.com/Kerollmops/group-by). Note that it implements `DoubleEndedIterator` and therefore provides the `next_back` and `rev` methods. |
| 70 | + |
| 71 | +The implementation specified here is only available on slices. Doing the same on an arbitrary `Iterator` is less efficient, because far fewer optimizations are available with a plain `Iterator`, and implementing `DoubleEndedIterator` on it would probably be painful. |
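
As a rough illustration of why a slice-based design can support `DoubleEndedIterator` without allocating, here is a minimal sketch. It is not the code from the linked repository: the field layout is an assumption made for this example, and the `new(slice, predicate)` constructor only mirrors the call shape used elsewhere in this RFC.

```rust
/// Illustrative sketch only: an iterator over runs of adjacent elements
/// for which `predicate` returns `true`.
pub struct GroupBy<'a, T, P> {
    slice: &'a [T], // the part of the slice not yet yielded
    predicate: P,
}

impl<'a, T, P> GroupBy<'a, T, P>
where
    P: FnMut(&T, &T) -> bool,
{
    pub fn new(slice: &'a [T], predicate: P) -> Self {
        GroupBy { slice, predicate }
    }
}

impl<'a, T, P> Iterator for GroupBy<'a, T, P>
where
    P: FnMut(&T, &T) -> bool,
{
    type Item = &'a [T];

    fn next(&mut self) -> Option<Self::Item> {
        let slice = self.slice;
        if slice.is_empty() {
            return None;
        }
        // Extend the leading run until the predicate rejects an adjacent pair.
        let mut len = 1;
        for pair in slice.windows(2) {
            if (self.predicate)(&pair[0], &pair[1]) {
                len += 1;
            } else {
                break;
            }
        }
        let (head, tail) = slice.split_at(len);
        self.slice = tail;
        Some(head)
    }
}

impl<'a, T, P> DoubleEndedIterator for GroupBy<'a, T, P>
where
    P: FnMut(&T, &T) -> bool,
{
    fn next_back(&mut self) -> Option<Self::Item> {
        let slice = self.slice;
        if slice.is_empty() {
            return None;
        }
        // Walk backwards from the last element while adjacent pairs stay grouped.
        let mut start = slice.len() - 1;
        while start > 0 && (self.predicate)(&slice[start - 1], &slice[start]) {
            start -= 1;
        }
        let (head, tail) = slice.split_at(start);
        self.slice = head;
        Some(tail)
    }
}
```

Because every group is a subslice of the remaining input, `next` and `next_back` only shrink that remaining slice from one end or the other. A generic `Iterator` adaptor cannot hand out borrowed runs of its source this way, which is the limitation the paragraph above refers to.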
| 72 | + |
| 73 | +# Drawbacks |
| 74 | +[drawbacks]: #drawbacks |
| 75 | + |
| 76 | +It adds a new type to the slice module and makes the standard library grow. |
| 77 | + |
| 78 | +# Rationale and alternatives |
| 79 | +[alternatives]: #alternatives |
| 80 | + |
| 81 | +The current design has no real overhead compared to one based only on generic `Iterator`s, and it does not need any allocation. The `GroupBy` `Iterator` will have a companion named `GroupByMut`, and both will provide a `remainder` method ([following the same borrowing rules as `ExactChunks`/`ExactChunksMut`](https://github.com/rust-lang/rust/pull/51339)) that gives the remaining elements. |
| 82 | + |
| 83 | +[The generic implementation on `Iterator` has been tested](https://git.phaazon.net/phaazon/group-by-rs/src/commit/3d3c6d80c02f1813ecc001b110a90392899d0f68), and its performance falls short of the slice-based one. |
| 84 | + |
| 85 | +# Prior art |
| 86 | +[prior-art]: #prior-art |
| 87 | + |
| 88 | +This is a useful function that is already present in the libraries of most other languages (e.g. [Haskell has `groupBy`](http://hackage.haskell.org/package/base-4.11.1.0/docs/Data-List.html#v:groupBy)). |
| 89 | + |
| 90 | +Alongside `groupBy`, Haskell also provides a `group` function for elements that implement `Eq`. The same behavior can be achieved in Rust: |
| 91 | + |
| 92 | +```rust |
| 93 | +fn group_by_eq<T: Eq>(slice: &[T]) -> impl Iterator<Item=&[T]> { |
| 94 | +    GroupBy::new(slice, PartialEq::eq) |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +# Unresolved questions |
| 99 | +[unresolved]: #unresolved-questions |
| 100 | + |
| 101 | +In the standard library, macros are used to remove code duplication when two implementations are nearly identical. We will need to declare a macro for `GroupBy` and `GroupByMut` that is generic over the pointer type used (e.g. `*const T` and `*mut T`). |