Closed
Description
I've seen this scenario come up quite a few times on SO:
require(data.table)
set.seed(2L)
DT <- data.table(x=sample(3,10,TRUE), y=1:10)
# x y
# 1: 1 1
# 2: 3 2
# 3: 2 3
# 4: 1 4
# 5: 3 5
# 6: 3 6
# 7: 1 7
# 8: 3 8
# 9: 2 9
#10: 2 10
Now add a column z, based on column x, that starts from 1 and retains the same value (or group) as long as successive values are the same. That is, in this case, z is:
z <- as.integer(c(1, 2, 3, 4, 5, 5, 6, 7, 8, 8))
# [1] 1 2 3 4 5 5 6 7 8 8
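To make the rule concrete, here is a minimal base R sketch of the same computation. It is only an illustration and not the proposed implementation: it assumes a non-empty vector without NA values, and the name changed is made up for the example.

```r
# Values of column x from the example above
x <- c(1L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 2L)

# TRUE wherever a value differs from its predecessor; the first element
# always starts a new run (assumes length(x) >= 1 and no NAs)
changed <- c(TRUE, x[-1L] != x[-length(x)])

# The running count of run starts is the run index
z <- cumsum(changed)
z
# [1] 1 2 3 4 5 5 6 7 8 8
```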
This can be accomplished quite easily with data.table's internal utility functions uniqlist and uniqlengths. Here's a preliminary illustration:
rle_index <- function(vec) {
  ulist = data.table:::uniqlist(list(vec)) ## no copy in R 3.1.0+
  ulen = data.table:::uniqlengths(ulist, length(vec))
  rep(seq_along(ulist), ulen)
}
rle_index(DT$x)
# [1] 1 2 3 4 5 5 6 7 8 8
So typical usage would be:
DT[, z := rle_index(x)]
## or to use with grouping
DT[, sum(y), by=list(rle_index(x))]
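For comparison, the grouped sum can be checked without data.table internals. The sketch below uses base R's rle() and tapply(); the helper name rle_index_base is hypothetical and exists only for this example, and the data are the x and y columns shown at the top.

```r
# Hypothetical base R counterpart of rle_index(), built on rle()
rle_index_base <- function(vec) {
  lens <- rle(vec)$lengths      # length of each run of equal values
  rep(seq_along(lens), lens)    # repeat each run's index over its length
}

x <- c(1L, 3L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 2L)
y <- 1:10

# Sum y within each run of identical consecutive x values
run_id <- rle_index_base(x)
sums <- tapply(y, run_id, sum)
as.vector(sums)
# [1]  1  2  3  4 11  7  8 19
```

Rows 5-6 and rows 9-10 each fall into one run, which is why their y values are summed to 11 and 19.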