Speed up logical indexing with multiple conditions #4105

Open
@ben519

Have a look at this example:

library(data.table)  # 1.12.6

foo <- data.table(
  x = as.character(runif(n = 10^7)),
  y = as.character(runif(n = 10^7)),
  z = as.character(runif(n = 10^7))
)

system.time(foo[like(x, "123") & like(y, "123") & like(z, "123")])
   user  system elapsed 
 11.141   0.057  11.231 

system.time(foo[like(x, "123")][like(y, "123")][like(z, "123")])
   user  system elapsed 
  3.773   0.021   3.799

As shown, chaining the conditions as successive subsets is roughly three times faster (3.8 s vs. 11.2 s elapsed) than ANDing them together into a single logical index. This implies that data.table checks every condition for every row, rather than evaluating conditions lazily on only the rows that passed the earlier ones. Is this by design? It seems like an opportunity for a nice speedup.
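To make the idea concrete, here is a minimal sketch of what "lazy" evaluation could look like in user code. It assumes the cost comes from R's vectorized `&`, which evaluates both operands over all rows before combining them. `lazy_like()` is a hypothetical helper written for illustration, not part of data.table, and the table is smaller than the one above so it runs quickly.

```r
library(data.table)

# Hypothetical helper (NOT a data.table function): test each column only on
# the rows that survived the previous columns, instead of on every row.
lazy_like <- function(dt, cols, pattern) {
  idx <- seq_len(nrow(dt))               # candidate row numbers
  for (col in cols) {
    if (length(idx) == 0L) break         # nothing left to test
    idx <- idx[like(dt[[col]][idx], pattern)]
  }
  dt[idx]
}

set.seed(1)
n <- 1e5  # smaller than the 10^7 rows in the example above
foo <- data.table(
  x = as.character(runif(n)),
  y = as.character(runif(n)),
  z = as.character(runif(n))
)

# Eager form: each like() call scans all n rows, then the results are ANDed.
eager <- foo[like(x, "123") & like(y, "123") & like(z, "123")]

# Lazy form: later columns are scanned only where earlier ones matched.
lazy <- lazy_like(foo, c("x", "y", "z"), "123")

# Both forms should select the same rows.
stopifnot(isTRUE(all.equal(eager, lazy)))
```

This does the same work as the chained `foo[...][...][...]` form but avoids materializing the intermediate tables. Whether data.table could apply such a rewrite automatically is a separate question, since skipping rows is only safe when the conditions have no side effects.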
