Skip to content

Eliminate vctrs::vec_cast_list() #137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 26, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ Depends:
R (>= 3.4.0)
Imports:
callr (>= 3.5.1),
cli,
crayon,
dplyr (>= 1.0.0),
magrittr,
Expand Down
3 changes: 3 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
# multidplyr (development version)

* `cluster_call()` gains a `simplify` argument - use this to request that
the result should be simplified (#136).

# multidplyr 0.1.1

* Fixed problems identified as part of working on dplyr 1.0.8.
Expand Down
78 changes: 71 additions & 7 deletions R/cluster-call.R
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,15 @@
#'
#' @param cluster A cluster.
#' @param code An expression to execute on each worker.
#' @param ptype Determines the output type. The default returns a list,
#' which will always succeed. Set to a narrower type to simplify the output.
#' @param simplify Should the results be simplified from a list?
#' * `TRUE`: simplify or die trying.
#' * `NA`: simplify if possible.
#' * `FALSE`: never try to simplify, always leaving as a list.
#'
#' `code` must return a vector of length one in order for simplification
#' to succeed.
#' @param ptype If `simplify` is `TRUE`, use `ptype` to enforce the desired
#' output type.
#' @export
#' @return A list of results with one element for each worker in `cluster`.
#' @examples
Expand All @@ -18,16 +25,25 @@
#' cluster_call(cl, runif(1))
#'
#' # use ptype to simplify
#' cluster_call(cl, runif(1), ptype = double())
#' cluster_call(cl, runif(1), simplify = TRUE)
#'
#' # use cluster_send() to ignore results
#' cluster_send(cl, x <- runif(1))
#' cluster_call(cl, x, ptype = double())
cluster_call <- function(cluster, code, ptype = list()) {
#' cluster_call(cl, x, simplify = TRUE)
cluster_call <- function(cluster, code, simplify = FALSE, ptype = NULL) {
stopifnot(is_cluster(cluster))
code <- enexpr(code)
to_rm <- attr(cluster, "cleaner")$reset()

if (length(simplify) > 1 || !is.logical(simplify)) {
cli::cli_abort("{.arg simplify} must be `TRUE`, `FALSE`, or `NA`.")
}
if (!isTRUE(simplify) && !is.null(ptype)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe take simplify = NULL and use a different default depending on whether ptype is supplied? If not supplied, defaults to FALSE. If supplied, defaults to TRUE.

Do we want to generally support simplify = NA for consistency with purrr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to move this interface towards list_simplify() so I'd prefer some warning here.

But you're right that I should I support simplify = NA in order to match the interface exactly.

# 0.1.2
warn("Must now set `simplify = TRUE` when supplying ptype")
simplify <- TRUE
}

# nocov start
f <- function(code, to_rm) {
rm(list = to_rm, envir = globalenv())
Expand All @@ -49,8 +65,8 @@ cluster_call <- function(cluster, code, ptype = list()) {
}

out <- lapply(results, "[[", "result")
if (!identical(ptype, list())) {
out <- vctrs::vec_list_cast(out, ptype)
if (!isFALSE(simplify)) {
out <- simplify_impl(out, strict = !is.na(simplify), ptype = ptype)
}
out
}
Expand All @@ -64,3 +80,51 @@ cluster_send <- function(cluster, code) {

invisible(cluster)
}

# TODO: replace with purrr::list_simplify() when purrr 1.0.0 is out
simplify_impl <- function(x,
strict = TRUE,
ptype = NULL,
error_arg = caller_arg(x),
error_call = caller_env()) {
vctrs::vec_check_list(x, arg = error_arg, call = error_call)

# Handle the cases where we definitely can't simplify
if (strict) {
vctrs::list_check_all_vectors(x, arg = error_arg, call = error_call)
size_one <- vctrs::list_sizes(x) == 1L
can_simplify <- all(size_one)

if (!can_simplify) {
bad <- which(!size_one)[[1]]
cli::cli_abort(
c(
"All elements must be size 1.",
i = "`{error_arg}[[{bad}]]` is size {vec_size(x[[bad]])}."
),
call = error_call
)
}
} else {
can_simplify <- vctrs::list_all_vectors(x) && all(vctrs::list_sizes(x) == 1L)

if (!can_simplify) {
return(x)
}
}

names <- vctrs::vec_names(x)
x <- vctrs::vec_set_names(x, NULL)

out <- tryCatch(
vctrs::vec_c(!!!x, ptype = ptype),
vctrs_error_incompatible_type = function(err) {
if (strict || !is.null(ptype)) {
cnd_signal(err)
} else {
x
}
}
)
vctrs::vec_set_names(out, names)
}
18 changes: 13 additions & 5 deletions man/cluster_call.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 16 additions & 0 deletions tests/testthat/_snaps/cluster-call.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,19 @@
# validates inputs

Code
cluster_call(cl, 1, simplify = "x")
Condition
Error in `cluster_call()`:
! `simplify` must be `TRUE`, `FALSE`, or `NA`.

# old ptype interface works with warning

Code
out <- cluster_call(cl, 1, ptype = double())
Condition
Warning:
Must now set `simplify = TRUE` when supplying ptype

# errors are propagated

Code
Expand Down
19 changes: 17 additions & 2 deletions tests/testthat/test-cluster-call.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,24 @@ test_that("calls submitted to each node", {
expect_equal(length(unique(pid)), length(cl))
})

test_that("can collapse results", {
test_that("can simplify results", {
cl <- default_cluster()
out <- cluster_call(cl, 1, ptype = double())
out <- cluster_call(cl, 1, simplify = TRUE)
expect_identical(out, c(1, 1))
})

test_that("validates inputs", {
cl <- default_cluster()
expect_snapshot(error = TRUE, {
cluster_call(cl, 1, simplify = "x")
})
})

test_that("old ptype interface works with warning", {
cl <- default_cluster()
expect_snapshot({
out <- cluster_call(cl, 1, ptype = double())
})
expect_identical(out, c(1, 1))
})

Expand Down