Description
Is it possible to support implicit use of future plans, perhaps utilizing future.apply::future_lapply?
For example:
library(data.table)
data(iris)
setDT(iris)
iris[, lapply(.SD, mean), by = Species]
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa 5.006 3.428 1.462 0.246
# 2: versicolor 5.936 2.770 4.260 1.326
# 3: virginica 6.588 2.974 5.552 2.026
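If I want to parallelize this with future, then I could:
library(future)
library(future.apply)
plan(multisession)  # background R sessions; plan(multiprocess) is deprecated in recent versions of future
rbindlist(future_lapply(split(iris, iris$Species),
                        function(x) x[, lapply(.SD, mean), by = Species],
                        future.packages = "data.table"))  # attach data.table on each worker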
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa 5.006 3.428 1.462 0.246
# 2: versicolor 5.936 2.770 4.260 1.326
# 3: virginica 6.588 2.974 5.552 2.026
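Until something like this exists inside data.table, the split/apply/combine pattern above can at least be wrapped once and reused. Below is a minimal sketch of a hypothetical helper, future_by(), which is not part of data.table or future.apply. It assumes data.table, future and future.apply are installed, and it uses the `by` argument of data.table's split() method so each group arrives at the worker as its own data.table.

```r
library(data.table)
library(future)
library(future.apply)

# Hypothetical helper (not part of data.table): apply `j_fun` to each group
# of `DT` as a future, then stack the per-group results into one data.table.
future_by <- function(DT, by, j_fun) {
  groups <- split(DT, by = by)  # named list of data.tables, one per group, in order of appearance
  res <- future_lapply(groups, j_fun,
                       future.packages = "data.table")  # make sure workers attach data.table
  rbindlist(res)
}

plan(sequential)  # swap for plan(multisession) to evaluate groups in background R sessions

dt <- as.data.table(iris)
out <- future_by(dt, "Species", function(x) x[, lapply(.SD, mean), by = Species])
print(out)
```

The result should match the serial dt[, lapply(.SD, mean), by = Species]; only the evaluation strategy changes with plan(). Note that this pays the cost of splitting and shipping every group, so it only helps when the per-group work is expensive relative to that overhead.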
But if there were a way to enable internal use of future.apply::future_lapply in such an operation (or some other future-friendly mechanism), the code would be much easier to write and read.
Thoughts for implementation:
- options(datatable.futureDT=TRUE) (default FALSE)
- iris[, lapply(.SD, mean), by = Species, futureDT = TRUE]
- with_futureDT(iris[, lapply(.SD, mean), by = Species])
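The first and third proposals can be approximated in user code to see how they might behave. This is only a sketch under stated assumptions: datatable.futureDT is the option name proposed here and is not read by data.table itself, and group_apply() and with_futureDT() are hypothetical helpers that gate a split/future_lapply/rbindlist pipeline on that option rather than changing `[.data.table`.

```r
library(data.table)
library(future)
library(future.apply)

# Hypothetical: run `j_fun` per group, using futures only when the proposed
# option "datatable.futureDT" is TRUE (it defaults to FALSE here).
group_apply <- function(DT, by, j_fun) {
  groups <- split(DT, by = by)
  use_future <- isTRUE(getOption("datatable.futureDT", FALSE))
  res <- if (use_future) {
    future_lapply(groups, j_fun, future.packages = "data.table")
  } else {
    lapply(groups, j_fun)
  }
  rbindlist(res)
}

# Hypothetical with_futureDT(): switch the option on for a single expression.
# `expr` is a lazily evaluated promise, so it runs while the option is TRUE,
# and on.exit() restores the previous value afterwards.
with_futureDT <- function(expr) {
  old <- options(datatable.futureDT = TRUE)
  on.exit(options(old), add = TRUE)
  expr
}

plan(sequential)  # any future plan can be used here

dt <- as.data.table(iris)
f <- function(x) x[, lapply(.SD, mean), by = Species]
serial <- group_apply(dt, "Species", f)
parallel <- with_futureDT(group_apply(dt, "Species", f))
```

A real implementation inside `[.data.table` would have to decide which j expressions are safe to farm out (for example, anything using := or relying on .GRP/.N across groups), which a wrapper like this sidesteps by only handling whole-group functions.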
This is premised on the applicability of the future package. I know there are ramifications to using any kind of parallelism with data.table (e.g., reference semantics), so I recognize this is not a trivial suggestion. (For instance, using data.table within future has not been without some hurdles; see https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html under "Missing packages (false negatives)".)
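One concrete reference-semantics pitfall: under a multisession plan each worker receives a serialized copy of the data, so := inside a future modifies that copy, not the table in the calling session. A small sketch, assuming data.table, future and future.apply are installed and that two background R sessions can be started; future.packages = "data.table" addresses the "missing packages" problem from the vignette linked above.

```r
library(data.table)
library(future)
library(future.apply)

plan(multisession, workers = 2)

dt <- data.table(g = c("a", "a", "b"), v = 1:3)

# The worker gets a serialized copy of dt; `:=` adds column w to that copy only.
res <- future_lapply(list(dt), function(x) {
  x[, w := v * 2L]
  x
}, future.packages = "data.table")

plan(sequential)  # switching plans shuts down the background workers

print(names(dt))        # dt in this session is unchanged
print(names(res[[1]]))  # the returned copy carries the new column
```

Any built-in future support would therefore need to either forbid := in parallelized j expressions or copy results back explicitly.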
Session Info
sessionInfo()
# R version 3.5.3 (2019-03-11)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 18362)
# Matrix products: default
# locale:
# [1] LC_COLLATE=English_United States.1252
# [2] LC_CTYPE=English_United States.1252
# [3] LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C
# [5] LC_TIME=English_United States.1252
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
# other attached packages:
# [1] future.apply_1.2.0 future_1.12.0 data.table_1.12.2
# loaded via a namespace (and not attached):
# [1] Rcpp_1.0.1 codetools_0.2-16 listenv_0.7.0 digest_0.6.18
# [5] crayon_1.3.4 dplyr_0.8.1 assertthat_0.2.1 R6_2.4.0
# [9] magrittr_1.5 evaluate_0.14 pillar_1.3.1 rlang_0.4.0
# [13] rmarkdown_1.13 tools_3.5.3 glue_1.3.1 purrr_0.2.5
# [17] parallel_3.5.3 compiler_3.5.3 xfun_0.8 pkgconfig_2.0.2
# [21] globals_0.12.4 htmltools_0.3.6 knitr_1.23 tidyselect_0.2.5
# [25] tibble_2.1.3