implicit use of future with "by=" #3962

@r2evans

Is it possible to support implicit use of future plans, perhaps utilizing future.apply::future_lapply?

For example:

library(data.table)
data(iris)
setDT(iris)
iris[, lapply(.SD, mean), by = Species]
#       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1:     setosa        5.006       3.428        1.462       0.246
# 2: versicolor        5.936       2.770        4.260       1.326
# 3:  virginica        6.588       2.974        5.552       2.026

If I want to parallelize this with future, then I could:

library(future)
library(future.apply)
plan(multisession)  # plan(multiprocess) is deprecated in newer releases of future
rbindlist(future_lapply(split(iris, iris$Species), function(x) x[, lapply(.SD,mean), by=Species]))
#       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1:     setosa        5.006       3.428        1.462       0.246
# 2: versicolor        5.936       2.770        4.260       1.326
# 3:  virginica        6.588       2.974        5.552       2.026

But if there were a way to enable internal use of future.apply::future_lapply in said action (or some other future-friendly operation), it could be much easier to code and read.

Thoughts for implementation:

  1. options(datatable.futureDT=TRUE) (default FALSE)
  2. iris[, lapply(.SD, mean), by = Species, futureDT = TRUE]
  3. with_futureDT(iris[, lapply(.SD, mean), by = Species])

This is premised on the applicability of the future package. I know that parallelizing anything with data.table has ramifications (e.g., reference semantics), so I do not believe this is a trivial suggestion. (For instance, using data.table within future has not been without hurdles; see https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html under "Missing packages (false negatives)".)
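
As a rough sketch of what such an option or wrapper might do internally, the split/apply/combine pattern above can be folded into a small helper. The name future_by and its arguments are hypothetical (nothing like it exists in data.table or future today). It assumes a multisession plan with two workers, and it passes future.packages = "data.table" as a guard against the "missing packages" problem mentioned above.

```r
library(data.table)
library(future)
library(future.apply)

# Hypothetical helper, for illustration only: apply `fun` to each "by" group
# of `dt` under whatever future plan is currently active.
future_by <- function(dt, by, fun) {
  # split.data.table returns one data.table per group, in order of appearance
  pieces <- split(dt, by = by)
  # Explicitly request data.table on the workers so `[` dispatches correctly
  res <- future_lapply(pieces, fun, future.packages = "data.table")
  rbindlist(res)
}

plan(multisession, workers = 2)  # assumption: any other plan() should work too
dt <- as.data.table(iris)        # copy, so the built-in iris is left untouched
out <- future_by(dt, "Species", function(x) x[, lapply(.SD, mean), by = Species])
plan(sequential)                 # shut the workers down again
print(out)
```

Each group is serialized and shipped to a worker, so for cheap functions like mean() this will almost certainly be slower than the plain by= call. It only illustrates the shape of the idea, and it does nothing about := or other by-reference operations inside the groups.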

Session Info
sessionInfo()
# R version 3.5.3 (2019-03-11)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 18362)
# Matrix products: default
# locale:
# [1] LC_COLLATE=English_United States.1252 
# [2] LC_CTYPE=English_United States.1252   
# [3] LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C                          
# [5] LC_TIME=English_United States.1252    
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# other attached packages:
# [1] future.apply_1.2.0 future_1.12.0      data.table_1.12.2
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.1       codetools_0.2-16 listenv_0.7.0    digest_0.6.18   
#  [5] crayon_1.3.4     dplyr_0.8.1      assertthat_0.2.1 R6_2.4.0        
#  [9] magrittr_1.5     evaluate_0.14    pillar_1.3.1     rlang_0.4.0     
# [13] rmarkdown_1.13   tools_3.5.3      glue_1.3.1       purrr_0.2.5     
# [17] parallel_3.5.3   compiler_3.5.3   xfun_0.8         pkgconfig_2.0.2 
# [21] globals_0.12.4   htmltools_0.3.6  knitr_1.23       tidyselect_0.2.5
# [25] tibble_2.1.3    
