Description
Consider this example:
library(data.table) # 1.12.6
foo <- data.table(
  x = as.character(runif(n = 10^7)),
  y = as.character(runif(n = 10^7)),
  z = as.character(runif(n = 10^7))
)
system.time(foo[like(x, "123") & like(y, "123") & like(z, "123")])
user system elapsed
11.141 0.057 11.231
system.time(foo[like(x, "123")][like(y, "123")][like(z, "123")])
user system elapsed
3.773 0.021 3.799
As shown, chaining the conditions as successive subsets is roughly three times faster (3.8 s vs. 11.2 s elapsed) than ANDing them together into a single logical index. This suggests that data.table evaluates every condition on every row, rather than evaluating later conditions only on the rows that passed the earlier ones. Is this by design? It looks like an opportunity for a nice speedup.
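To make the idea concrete, here is a minimal sketch of the lazy evaluation I have in mind. `filter_successively()` is a hypothetical helper written for illustration, not part of data.table, and the table is smaller than in the timing above so that it runs quickly. R's `&` is vectorized and evaluates both operands in full, so the combined form runs `like()` on all rows for every column, while the helper runs it only on the rows still in play.

```r
library(data.table)

# Hypothetical helper (not part of data.table): test each column only on the
# rows that matched all previous columns.
filter_successively <- function(dt, cols, pattern) {
  keep <- seq_len(nrow(dt))            # row indices that are still candidates
  for (col in cols) {
    if (length(keep) == 0L) break      # nothing left to test
    hit <- like(dt[[col]][keep], pattern)
    keep <- keep[hit]                  # narrow the candidate set
  }
  dt[keep]
}

set.seed(1)
n <- 1e5
foo <- data.table(
  x = as.character(runif(n)),
  y = as.character(runif(n)),
  z = as.character(runif(n))
)

# Three ways of expressing the same filter
combined <- foo[like(x, "123") & like(y, "123") & like(z, "123")]
chained  <- foo[like(x, "123")][like(y, "123")][like(z, "123")]
lazy     <- filter_successively(foo, c("x", "y", "z"), "123")
```

All three forms should return the same rows; only the amount of work differs. The gain depends on how selective the first condition is: if it keeps most rows, later conditions still run on nearly the whole table.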