Function for generating RLE-like groups #686

@arunsrinivasan

I've seen this scenario come up quite a few times on SO:

require(data.table)
set.seed(2L)
DT <- data.table(x=sample(3,10,TRUE), y=1:10)
#     x  y
#  1: 1  1
#  2: 3  2
#  3: 2  3
#  4: 1  4
#  5: 3  5
#  6: 3  6
#  7: 1  7
#  8: 3  8
#  9: 2  9
#10: 2 10

The task is to add a column z, based on column x, that starts at 1 and keeps the same value (group id) for as long as successive values of x are equal, incrementing by one whenever x changes. In this case, z is:

z <- as.integer(c(1, 2, 3, 4, 5, 5, 6, 7, 8, 8))
#  [1] 1 2 3 4 5 5 6 7 8 8

This can be accomplished quite easily with data.table's internal utility functions uniqlist and uniqlengths. Here's a preliminary illustration:

rle_index <- function(vec) {
    ulist = data.table:::uniqlist(list(vec)) ## no copy in R 3.1.0+
    ulen = data.table:::uniqlengths(ulist, length(vec))
    rep(seq_along(ulist), ulen)
}
rle_index(DT$x)
#  [1] 1 2 3 4 5 5 6 7 8 8
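
For comparison, the same run indexing can be sketched in base R with rle(), which avoids relying on unexported data.table internals. This is only an illustration: the name rle_index_base is hypothetical, and x is typed in by hand from the table above so the result does not depend on the random number generator.

```r
# Base-R sketch of the run index (hypothetical helper, not part of data.table).
rle_index_base <- function(vec) {
    runs <- rle(vec)                            # lengths of runs of consecutive equal values
    rep(seq_along(runs$lengths), runs$lengths)  # repeat each run's number by its length
}

# Column x from the example above, entered explicitly.
x <- c(1L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 2L)
rle_index_base(x)
#  [1] 1 2 3 4 5 5 6 7 8 8
```

This version will likely be slower on large vectors than an implementation built on uniqlist, which is part of the motivation for having it in data.table itself.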

Typical usage would then be:

DT[, z := rle_index(x)]
## or to use with grouping
DT[, sum(y), by=list(rle_index(x))]
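
To make the grouped result concrete, here is a small sketch of the per-run sums for the example data. It assumes a data.table version that exports rleid(), which provides this kind of run id; the column names z and total are chosen here purely for illustration, and x is entered by hand rather than sampled.

```r
library(data.table)

# Example data from above, with x entered explicitly.
DT <- data.table(x = c(1L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 2L), y = 1:10)

# Run id: increments whenever x differs from the previous row.
DT[, z := rleid(x)]

# One row per run: the run's x value and the sum of y within it.
res <- DT[, .(x = x[1L], total = sum(y)), by = z]
res$total
# [1]  1  2  3  4 11  7  8 19
```

Rows 5-6 and rows 9-10 are the only runs longer than one, so they are the only places where y values are combined (5 + 6 = 11 and 9 + 10 = 19).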

Here's a SO post and another.
