Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement pick() and use across() output as a data frame #341

Closed
markfairbanks opened this issue Feb 11, 2022 · 6 comments · Fixed by #397
Closed

Implement pick() and use across() output as a data frame #341

markfairbanks opened this issue Feb 11, 2022 · 6 comments · Fixed by #397
Labels
feature a feature request or enhancement

Comments

@markfairbanks
Copy link
Collaborator

library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = c("a", "a", "b"), y = 1:3, z = 1:3)

df %>%
  mutate(row_sum = rowSums(across(c(y, z))))
#> # A tibble: 3 × 4
#>   x         y     z row_sum
#>   <chr> <int> <int>   <dbl>
#> 1 a         1     1       2
#> 2 a         2     2       4
#> 3 b         3     3       6
@guersam
Copy link

guersam commented Apr 6, 2022

Using lazy_dt produces this error:

Error in rowSums(list(y = y, z = z)) : 
  'x' must be an array of at least two dimensions

@markfairbanks
Copy link
Collaborator Author

markfairbanks commented Jun 21, 2022

Not sure the best way to handle this one. At first I thought the way to solve this would be by checking if an input dot was an across call. If not, capture_dots() would convert the across() to a call to a data.table() call.

Simple example:

dots_fn <- function(...) {
  dots <- enquos(...)
  is_top_across <- map(dots, quo_is_call, "across")
  is_top_across
}

This seems reasonable and works well in the following situations:

df %>%
  mutate(
    # "top" across call treated normally
    across(c(x, y), ~ .x + 1),
    # nested within another call - treated as a `data.table()` call
    row_sum = rowSums(across(c(y, z)))
  )

However I'm finding this approach can fail when dealing with operations applied to list-columns:

library(tidyverse)

list_df <- tibble(a = 1:3, b = 1:3)

df <- tibble(x = c("a", "b"), list_col = list(list_df, list_df))

df %>%
  mutate(
    list_col = map(list_col, ~ .x %>% mutate(across(c(a, b), ~ .x + 1)))
  )

Here we would want the map(mutate()) to treat the across() as a "top" across(). But the outermost mutate() would have already converted it to a call to data.table().

I haven't figured out a good workaround yet.

@markfairbanks markfairbanks added the feature a feature request or enhancement label Jun 21, 2022
@markfairbanks
Copy link
Collaborator Author

markfairbanks commented Jun 22, 2022

Actually all we would need to do is catch if arrange/filter/mutate/slice/summarise/summarize are called inside of the dot and then return that part of the call unaltered.

@markfairbanks
Copy link
Collaborator Author

markfairbanks commented Nov 3, 2022

Updates from dplyr

Implement pick(): tidyverse/dplyr#6492
Deprecate across(.fns = NULL): tidyverse/dplyr#6523

So we need to implement pick(). Should we just skip implementing the across() version since it's being deprecated? Or do we allow it for now since this will probably work in dplyr for a while?

@markfairbanks markfairbanks changed the title Use across() output as a data frame Implement pick() and use across() output as a data frame Nov 3, 2022
@eutwt
Copy link
Collaborator

eutwt commented Nov 4, 2022

I think implementing pick() is a separate issue from this one, because across output can still be used as a data frame in dplyr even with a non-NULL .fns

library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = c("a", "a", "b"), y = 1:3, z = 1:3)

df %>%
  mutate(row_sum = rowSums(across(c(y, z), sin)))
#> # A tibble: 3 × 4
#>   x         y     z row_sum
#>   <chr> <int> <int>   <dbl>
#> 1 a         1     1   1.68 
#> 2 a         2     2   1.82 
#> 3 b         3     3   0.282

Created on 2022-11-03 with reprex v2.0.2

But, I don't see a good way to get this behavior in dtplyr. I'd say this issue should be closed unless you feel differently (and open a new one for pick())

@eutwt
Copy link
Collaborator

eutwt commented Nov 4, 2022

Oh, didn't see your earlier comments. Seems you have an idea for how to implement, so disregard my comment :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants