Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement pick() #397

Merged
merged 6 commits into from
Nov 14, 2022
Merged

Implement pick() #397

merged 6 commits into from
Nov 14, 2022

Conversation

markfairbanks
Copy link
Collaborator

@markfairbanks markfairbanks commented Nov 10, 2022

Closes #341

Overview: Translate pick() and across() calls to a data.table() call.

devtools::load_all(".")
#> ℹ Loading dtplyr

df <- lazy_dt(tibble(x = 1:3, y = 1:3, z = c("a", "a", "b")))

df %>%
  mutate(row_sum = rowSums(pick(x, y)))
#> Source: local data table [3 x 4]
#> Call:   copy(`_DT1`)[, `:=`(row_sum = rowSums(data.table(x = x, y = y)))]
#> 
#>       x     y z     row_sum
#>   <int> <int> <chr>   <dbl>
#> 1     1     1 a           2
#> 2     2     2 a           4
#> 3     3     3 b           6
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

df %>%
  mutate(new = rowSums(across(c(x, y), ~ .x + 1)))
#> Source: local data table [3 x 4]
#> Call:   copy(`_DT1`)[, `:=`(new = rowSums(data.table(x = x + 1, y = y + 
#>     1)))]
#> 
#>       x     y z       new
#>   <int> <int> <chr> <dbl>
#> 1     1     1 a         4
#> 2     2     2 a         6
#> 3     3     3 b         8
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

@eutwt
Copy link
Collaborator

eutwt commented Nov 11, 2022

Top level pick calls aren't expanded in this PR's current state. For example this fails

test_that("top level across() and pick() are expanded", {
  df <- lazy_dt(tibble(x = 1:3, y = 1:3, z = c("a", "a", "b")))
  expect_identical(group_by(df, x, y), group_by(df, pick(x, y)))
  expect_identical(group_by(df, x, y), group_by(df, across(c(x, y))))
})

My first thought was to just change the top_across definition in capture_dots() so that it's TRUE for pick() calls too.

top_across <- map(dots, quo_is_call, "across")

But I think the info we need is just whether it's a top-level expression; we could avoid checking for across/pick calls at that stage and just set is_top = TRUE. More concretely, this seems to work:

  • Rename is_top_across to just is_top everywhere it appears in tidyeval.R
  • In capture_dots() and capture_new_vars(), set dt_squash(..., is_top = TRUE) for all dots, rather than checking dots for across calls
  • Change the dt_squash call below to use is_top = FALSE (arguments inside calls being squashed aren't top level)

    dtplyr/R/tidyeval.R

    Lines 250 to 253 in 5c80dc3

    } else {
    x[-1] <- lapply(x[-1], dt_squash, env, data, j = j, is_top_across)
    x
    }

I'd lean towards the second option, but I don't think there are real issues with either, just a style difference.

@markfairbanks
Copy link
Collaborator Author

Ready for review again. Simplified per the suggestion - the logic seems much better this way.

Copy link
Collaborator

@eutwt eutwt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'm pretty happy to have this feature

@markfairbanks markfairbanks merged commit 5750999 into main Nov 14, 2022
@markfairbanks markfairbanks deleted the implement-pick branch November 14, 2022 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement pick() and use across() output as a data frame
2 participants