Skip to content

Slide improvements pass #477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 115 commits into from
Sep 27, 2024
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
115 commits
Select commit Hold shift + click to select a range
775c0fc
Clean up unused assignment
brookslogan Jun 12, 2024
3d4e498
fix(epix_slide): don't serialize object in error message
brookslogan Jun 19, 2024
e0188ce
Mention trailing commas as possible cause of slide ... error
brookslogan Jun 22, 2024
607e8e7
WIP (to rebase): match data-masking outputs, deprecate competing parms
brookslogan Jun 26, 2024
0f876c3
docs: document (GHA)
brookslogan Jun 26, 2024
7a62004
style: styler (GHA)
brookslogan Jun 26, 2024
b2e4e61
as_slide_computation doesn't take new_col_name in this iteration
brookslogan Jun 26, 2024
0a433dc
Fix some of the failing epi_slide tests
brookslogan Jun 26, 2024
3ee20e8
Update remaining epi_slide tests
brookslogan Jun 27, 2024
d69c5af
style: styler (GHA)
brookslogan Jun 27, 2024
02211a4
Shift to new computation output format scheme in epix_slide()
brookslogan Jul 1, 2024
127c1f1
docs: document (GHA)
brookslogan Jul 1, 2024
c17b67e
Fix and update comp_value validation
brookslogan Jul 30, 2024
0fa282a
style: styler (GHA)
brookslogan Jul 30, 2024
b128abb
Fix `as_tibble` on grouped `epi_df`s w/ current `tsibble` version
brookslogan Jul 30, 2024
8d7ea7d
Fix tidyeval unnamed tibble names access errors
brookslogan Jul 30, 2024
c66ef54
Update archive slide tests using deprecated features
brookslogan Jul 30, 2024
30a2166
Actually allow multiple quosures to be passed into `epi_slide`
brookslogan Jul 30, 2024
e9b1b1b
Update `epi_slide` tests with expanded data-masking support
brookslogan Jul 30, 2024
9b73649
Avoid warning in slides on `binding = NULL` if binding doesn't exist
brookslogan Jul 30, 2024
a1c7020
Make advanced slide tidyeval tests realistically work around recycling
brookslogan Jul 30, 2024
ec77184
Test that epi_slide balks at bad computation outputs
brookslogan Jul 30, 2024
7b5bd4e
Merge remote-tracking branch 'upstream/dev' into lcb/slide-improvemen…
brookslogan Jul 31, 2024
e089d42
Document new slide tidyeval features
brookslogan Jul 31, 2024
4027f06
docs: more on tidyeval features, as_list_col + names_sep deprecations
brookslogan Jul 31, 2024
2160921
Check deprecations earlier in fns, correct the `when` args
brookslogan Aug 1, 2024
90b7e2f
Fix tidyeval breaking on dot-prefixed column names
brookslogan Aug 1, 2024
1f410fc
fix: Actually complete workaround for tsibble:::as_tibble.grouped_df
brookslogan Aug 2, 2024
512c6ba
refactor(slide.R): move slide_one_grp definition after input munging
brookslogan Aug 2, 2024
ba536cd
Fix deprecations referring to wrong functions
brookslogan Aug 2, 2024
00c0d85
refactor(epix_slide): don't double-convert tidyeval + better code ord…
brookslogan Aug 2, 2024
70e86a3
Soften as_list_col and names_sep deprecations
brookslogan Aug 2, 2024
5452bba
Fix some deprecation message naming and details
brookslogan Aug 2, 2024
dfb249a
Fix softer deprecation of epi[x]_slide(<nm> = <df>, as_list_col = TRUE)
brookslogan Aug 2, 2024
7a7e781
fix(epi[x]_slide): again, don't reject (row)named data frames
brookslogan Aug 2, 2024
418a65a
docs(epi[x]_slide): update NEWS.md, vignettes for breaking changes
brookslogan Aug 3, 2024
0d794c6
Linting, nolinting, and nonolinting
brookslogan Aug 5, 2024
f53f331
refactor: explain how tidyeval column redefinition actually works out
brookslogan Aug 15, 2024
5cfa121
refactor(epi[x]_slide): rearrange new_col_name logic branches
brookslogan Aug 21, 2024
7277b81
tests: refactor a few slide tests (#516)
dshemetov Aug 21, 2024
4496512
Merge branch 'dev' into lcb/slide-improvements-2024-06
dshemetov Aug 21, 2024
4fe94da
docs: document (GHA)
dshemetov Aug 21, 2024
6b7944e
tests: enable parallel tests
dshemetov Aug 21, 2024
6a70274
tests: fix a few tests
dshemetov Aug 21, 2024
902bf61
docs: document (GHA)
dshemetov Aug 21, 2024
0faa438
Fix misfiled NEWS.md entry, combine two breaking changes sections
brookslogan Aug 21, 2024
d88ec2d
tests: even more slide test consolidation
dshemetov Aug 22, 2024
4553b89
doc: document
dshemetov Aug 22, 2024
971da03
tests: even more slide test refactors
dshemetov Aug 22, 2024
871946f
lint: lint
dshemetov Aug 22, 2024
de948b5
wip: rework slide window args (#513)
dshemetov Aug 23, 2024
b393491
lint: lint
dshemetov Aug 23, 2024
9c4c49d
doc: doc
dshemetov Aug 23, 2024
57cda93
feat: other_keys as arg in epi_df, epi_archive (#512)
dajmcdon Aug 23, 2024
f479acf
docs: version and news
dshemetov Aug 23, 2024
2382af0
refactor!: hard deprecate names_sep and as_list_col in epix_slide
dshemetov Aug 23, 2024
5f9ffc8
doc: doc
dshemetov Aug 23, 2024
2c043d5
refactor: dot prefix epix_slide args as well
dshemetov Aug 23, 2024
c0f6499
doc: improve the .f documentation for epi_slide and epix_slide
dshemetov Aug 24, 2024
2f91a90
doc: switch centre to center
dshemetov Aug 24, 2024
9f33e1b
tests(assert_sufficient_f_args): test vs. mean, sum, slice; use expec…
brookslogan Aug 1, 2024
14cd736
BREAKING CHANGE(epix_slide): output `version` column, other re/dual-n…
brookslogan Aug 2, 2024
3110a7f
Fix vignette re. old clobberable version default + use versions_end
brookslogan Aug 6, 2024
aa73944
Rename max_version -> version in epix_as_of
brookslogan Aug 6, 2024
547d156
fix(epix_slide): partial time_value -> version output col rename
brookslogan Aug 20, 2024
4290363
Add group_vars.grouped_epi_archive
brookslogan Aug 21, 2024
4b112e5
refactor+tweak: add+use `format_class_vec` helper for messages
brookslogan Aug 22, 2024
8838c71
WIP Add docs for de-dupe approach, part of the required validation
brookslogan Aug 22, 2024
9127952
Style and fix&improve some .new_col_name validation
brookslogan Aug 22, 2024
7af57cc
style: styler (GHA)
brookslogan Aug 26, 2024
1181b97
Fix slide rebase issues, other partial renames, dotprefix internalfn
brookslogan Aug 29, 2024
1ae0ef5
Dot-prefix more args, locals; finish some incomplete renames
brookslogan Aug 29, 2024
6854c07
docs: document (GHA)
brookslogan Sep 4, 2024
ef2639e
Dot-prefix `f` args in helper functions (& their messages)
brookslogan Sep 4, 2024
9aafabb
WIP output column de-duping in epix_slide
brookslogan Sep 4, 2024
e463714
style: styler (GHA)
brookslogan Sep 4, 2024
8973edd
Add missing dot prefixes in some messages
brookslogan Sep 5, 2024
e039a40
Fix empty-version grouped slide behavior change, tweak warning
brookslogan Sep 5, 2024
dd5c769
feat: refactor epi_slide
dshemetov Aug 24, 2024
4444a6c
Fix typo
brookslogan Sep 10, 2024
b2dfa09
Allow dupe & dedupe cols in epi_slide; needs&helps future .keep=TRUE
brookslogan Sep 12, 2024
9b4f10a
style: styler (GHA)
brookslogan Sep 12, 2024
a110a42
Move to .keep = TRUE in `epi_slide` + fix other dedupe issues
brookslogan Sep 12, 2024
dfd49f5
fix: aggregate is now sum_groups_epi_df and other review changes
dshemetov Sep 6, 2024
53005ff
Correct commentary regarding slide output type edge cases
brookslogan Sep 13, 2024
e1d300d
feat(epi_slide): catch common operations x forgotten colname
brookslogan Sep 13, 2024
129ad9f
Test epi_df forbidden methods + fix error message
brookslogan Sep 13, 2024
fd04433
style: styler (GHA)
brookslogan Sep 13, 2024
16cf8d7
Use `cnd_class = TRUE` when snapshot is primary test
brookslogan Sep 13, 2024
8eca321
improve is null or na
dshemetov Sep 13, 2024
9ab9731
Merge pull request #521 from cmu-delphi/lcb/slide-unnest-dedupe-cols
brookslogan Sep 16, 2024
b1ab47d
refactor: move group_epi_df
dshemetov Sep 16, 2024
6b93d79
refactor: add arrange_[col/row]_canonical
dshemetov Sep 16, 2024
926f845
refactor: improve and slim down epi_slide tests
dshemetov Sep 17, 2024
c167ddf
lint: line breaks
dshemetov Sep 17, 2024
cccd6e3
Merge branch 'lcb/slide-improvements-2024-06' into ds/epi-slide-group
dshemetov Sep 17, 2024
f1abe17
fix: leftover merge issue and test fix
dshemetov Sep 17, 2024
409dcac
docs: document (GHA)
dshemetov Sep 17, 2024
214100d
Merge pull request #519 from cmu-delphi/ds/epi-slide-group
dshemetov Sep 18, 2024
d2372f0
Better detection, messaging, docs on epi_slide output clashes
brookslogan Sep 18, 2024
166b8c3
Fix partial rename
brookslogan Sep 18, 2024
82047a4
lint: lint
dshemetov Sep 18, 2024
46af707
Make format_tibble_row look better on time classes
brookslogan Sep 19, 2024
177e7cc
Improve slide output column conflict messages
brookslogan Sep 19, 2024
bde98ec
Fix data$ typo, avoid `.data$` in tidyselections
brookslogan Sep 19, 2024
48735c5
lint: more lint
dshemetov Sep 19, 2024
227e133
fix: .ref_time_values -> .versions in epix_slide and some doc fixes
dshemetov Sep 19, 2024
0eba732
docs: document (GHA)
dshemetov Sep 19, 2024
61de9ec
fix: fix the fix
dshemetov Sep 19, 2024
25aa1b2
Also check for `slide_value` name conflict in `epix_slide`
brookslogan Sep 20, 2024
4f8e8d5
Remove todo comment suggesting bad path
brookslogan Sep 20, 2024
d2dc2bf
Add some epix_slide output col dedupe tests
brookslogan Sep 20, 2024
dd19428
style: styler (GHA)
brookslogan Sep 20, 2024
8202c0e
refactor+doc: key_colnames and vignettes
dshemetov Sep 20, 2024
187bd5d
doc: update NEWS
dshemetov Sep 26, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ Collate:
'correlation.R'
'data.R'
'epi_df.R'
'epi_df_forbidden_methods.R'
'epiprocess.R'
'group_by_epi_df_methods.R'
'methods-epi_archive.R'
Expand Down
3 changes: 3 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

S3method("[",epi_df)
S3method("names<-",epi_df)
S3method(Summary,epi_df)
S3method(arrange_canonical,default)
S3method(arrange_canonical,epi_df)
S3method(as_epi_df,data.frame)
Expand Down Expand Up @@ -36,6 +37,7 @@ S3method(key_colnames,data.frame)
S3method(key_colnames,default)
S3method(key_colnames,epi_archive)
S3method(key_colnames,epi_df)
S3method(mean,epi_df)
S3method(next_after,Date)
S3method(next_after,integer)
S3method(print,epi_archive)
Expand Down Expand Up @@ -148,6 +150,7 @@ importFrom(dplyr,everything)
importFrom(dplyr,filter)
importFrom(dplyr,group_by)
importFrom(dplyr,group_by_drop_default)
importFrom(dplyr,group_map)
importFrom(dplyr,group_modify)
importFrom(dplyr,group_vars)
importFrom(dplyr,groups)
Expand Down
8 changes: 4 additions & 4 deletions R/archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -693,20 +693,20 @@ group_by.epi_archive <- function(.data, ..., .add = FALSE,
grouping_cols <- as.list(detailed_mutate[["archive"]][["DT"]])[detailed_mutate[["request_names"]]]
grouping_col_is_factor <- purrr::map_lgl(grouping_cols, is.factor)
# ^ Use `as.list` to try to avoid any possibility of a deep copy.
if (!any(grouping_col_is_factor)) {
if (length(grouping_cols) != 0L && !any(grouping_col_is_factor)) {
cli_warn(
"`.drop=FALSE` but there are no factor grouping columns;
"`.drop=FALSE` but none of the grouping columns are factors;
did you mean to convert one of the columns to a factor beforehand?",
class = "epiprocess__group_by_epi_archive__drop_FALSE_no_factors"
)
} else if (any(diff(grouping_col_is_factor) == -1L)) {
cli_warn(
"`.drop=FALSE` but there are one or more non-factor grouping columns listed
after a factor grouping column; this may produce groups with `NA`s for these
columns; see https://github.com/tidyverse/dplyr/issues/5369#issuecomment-683762553;
non-factor columns; see https://github.com/tidyverse/dplyr/issues/5369#issuecomment-683762553;
depending on how you want completion to work, you might instead want to convert all
grouping columns to factors beforehand, specify the non-factor grouping columns first,
or use `.drop=TRUE` and add a call to `tidyr::complete`.",
or use `.drop=TRUE` and add a call to `tidyr::complete()`.",
class = "epiprocess__group_by_epi_archive__drop_FALSE_nonfactor_after_factor"
)
}
Expand Down
48 changes: 48 additions & 0 deletions R/epi_df_forbidden_methods.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Methods in this file are used to
# * Disable problematic inherited behavior (e.g., mean on epi_dfs)
# * Provide better error messaging if possible for things that already abort
# when they should (e.g., sum on epi_dfs)


# Disable mean on epi_dfs, to prevent `epi_slide(~ mean(.x), ....)` bad output:

#' @export
mean.epi_df <- function(x, ...) {
cli_abort(c(
"`mean` shouldn't be used on entire `epi_df`s",
"x" = "{rlang::caller_arg(x)} was an `epi_df`",
"i" = "If you encountered this while trying to take a rolling mean
of a column using `epi_slide`, you probably forgot to
specify the column name (e.g., ~ mean(.x$colname)). You may
also prefer to use the specialized `epi_slide_mean` method."
), class = "epiprocess__summarizer_on_entire_epi_df")
}

# Similarly, provide better error messages for some other potentially-common
# slide operations (sum, prod, min, max, all, any, range):

#' @export
Summary.epi_df <- function(..., na.rm = FALSE) {

Check warning on line 25 in R/epi_df_forbidden_methods.R

View workflow job for this annotation

GitHub Actions / lint

file=R/epi_df_forbidden_methods.R,line=25,col=33,[object_name_linter] Variable and function name style should match snake_case or symbols.
# cli uses dot prefixes for special purpose; use alias to avoid confusion during interpolation
generic <- .Generic

Check warning on line 27 in R/epi_df_forbidden_methods.R

View workflow job for this annotation

GitHub Actions / lint

file=R/epi_df_forbidden_methods.R,line=27,col=3,[object_usage_linter] local variable 'generic' assigned but may not be used

Check warning on line 27 in R/epi_df_forbidden_methods.R

View workflow job for this annotation

GitHub Actions / lint

file=R/epi_df_forbidden_methods.R,line=27,col=14,[object_usage_linter] no visible binding for global variable '.Generic'
opt_pointer <- switch(.Generic,

Check warning on line 28 in R/epi_df_forbidden_methods.R

View workflow job for this annotation

GitHub Actions / lint

file=R/epi_df_forbidden_methods.R,line=28,col=3,[object_usage_linter] local variable 'opt_pointer' assigned but may not be used

Check warning on line 28 in R/epi_df_forbidden_methods.R

View workflow job for this annotation

GitHub Actions / lint

file=R/epi_df_forbidden_methods.R,line=28,col=25,[object_usage_linter] no visible binding for global variable '.Generic'
sum = "You may also prefer to use the specialized `epi_slide_sum` method.",
prod = ,
min = ,
max = ,
all = ,
any = "You may also prefer to use the specialized `epi_slide_opt` method.",
range = "",
cli_abort("Unrecognized .Generic: {generic}")
)
cli_abort(c(
"`{generic}` shouldn't be used on entire `epi_df`s",
# We'd like to quote user input in the error message, but `caller_arg(..1)` is
# just "..1" and (eagerness/S4/unnamedness?) issues thwart some alternatives; just
# use something generic:
"x" = "`{generic}`'s first argument was an `epi_df`",
"i" = "If you encountered this while trying to take a rolling {generic}
of a column using `epi_slide`, you probably forgot to
specify the column name (e.g., ~ {generic}(.x$colname)). {opt_pointer}"
), class = "epiprocess__summarizer_on_entire_epi_df")
}
94 changes: 70 additions & 24 deletions R/grouped_epi_archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -341,30 +341,75 @@
", class = "epiprocess__invalid_slide_comp_value")
}

.group_key_label <- if (nrow(.group_key) == 0L) {
# Edge case: we'll get here if a requested `.version` had 0 rows and we
# grouped by a nonzero number of columns using the default `.drop = TRUE`
# (or on all non-factor columns with `.drop = FALSE` for some reason,
# probably a user bug). Mimicking `dplyr`, we'll let `.group_key` provided
# to the computation be 0 rows, but then label it using NAs. (In the
# bizarre situation of grouping by a mix of factor and non-factor with
# `.drop = FALSE`, `.group_key` will already have 1 row. For ungrouped
# epix_slides and 0-variable-grouped epix_slides with either `.drop`
# setting, we will have a 1x0 .group_key, although perhaps for the latter
# this should be 0x0.)
vctrs::vec_cast(NA, .group_key)
} else {
.group_key
}

# Construct result first as list, then tibble-ify, to try to avoid some
# redundant work. `group_modify()` provides the group key, we provide the
# ref time value (appropriately recycled) and comp_value (appropriately
# named / unpacked, for quick feedback)
res <- list(version = vctrs::vec_rep(.version, vctrs::vec_size(comp_value)))
# redundant work. However, we will sacrifice some performance here doing
# checks here in the inner loop, in order to provide immediate feedback on
# some formatting errors.
res <- c(
list(), # get list output; a bit faster than `as.list()`-ing `.group_key_label`
.group_key_label,
list(version = .version)
)
res <- vctrs::vec_recycle_common(!!!res, .size = vctrs::vec_size(comp_value))

if (is.null(.new_col_name)) {
if (inherits(comp_value, "data.frame")) {
# unpack into separate columns (without name prefix):
res <- c(res, comp_value)
# Sometimes comp_value can parrot back columns already in `res`; allow
# this, but balk if a column has the same name as one in `res` but a
# different value:
comp_nms <- names(comp_value)
overlaps_label_names <- comp_nms %in% names(res)
for (comp_i in which(overlaps_label_names)) {
if (!identical(comp_value[[comp_i]], res[[comp_nms[[comp_i]]]])) {
lines <- c(
cli::format_error(c(
"conflict detected between slide value computation labels and output:",
"i" = "we are labeling slide computations with the following columns: {syms(names(res))}",
"x" = "a slide computation output included a column {syms(comp_nms[[comp_i]])} that didn't match the label"

Check warning on line 384 in R/grouped_epi_archive.R

View workflow job for this annotation

GitHub Actions / lint

file=R/grouped_epi_archive.R,line=384,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 123 characters.
)),
capture.output(print(waldo::compare(res[[comp_nms[[comp_i]]]], comp_value[[comp_i]], x_arg = "label", y_arg = "comp output"))),

Check warning on line 386 in R/grouped_epi_archive.R

View workflow job for this annotation

GitHub Actions / lint

file=R/grouped_epi_archive.R,line=386,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 141 characters.
cli::format_message(c("You likely want to rename or remove this column in your output, or debug why it has a different value."))

Check warning on line 387 in R/grouped_epi_archive.R

View workflow job for this annotation

GitHub Actions / lint

file=R/grouped_epi_archive.R,line=387,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 142 characters.
)
rlang::abort(paste(collapse = "\n", lines),
class = "epiprocess__epix_slide_label_vs_output_column_conflict"
)
}
}
# Unpack into separate columns (without name prefix). If there are
# columns duplicating label columns, de-dupe and order them as if they
# didn't exist in comp_value.
res <- c(res, comp_value[!overlaps_label_names])
} else {
# apply default name (to vector or packed data.frame-type column):
# Apply default name (to vector or packed data.frame-type column):
res[["slide_value"]] <- comp_value
# TODO check for bizarre conflicting `slide_value` label col name.
# Either here or on entry to `epix_slide` (even if there we don't know
# whether vecs will be output). Or just turn this into a special case of
# the preceding branch and let the checking code there generate a
# complaint.
}
} else {
# vector or packed data.frame-type column (note: .new_col_name of
# "version" is disallowed):
# vector or packed data.frame-type column (note: overlaps with label
# column names should already be forbidden by earlier validation):
res[[.new_col_name]] <- comp_value
}

# Stop on naming conflicts (names() fine here, non-NULL). Not the
# friendliest error messages though.
vctrs::vec_as_names(names(res), repair = "check_unique")

# Fast conversion:
return(validate_tibble(new_tibble(res)))
}
Expand All @@ -380,18 +425,19 @@

# Set:
# * `as_of_df`, the data.frame/tibble/epi_df/etc. that we will
# `group_modify` as the `.data` argument. Might or might not
# `group_map` as the `.data` argument. Might or might not
# include version column.
# * `group_modify_fn`, the corresponding `.f` argument
# * `group_map_fn`, the corresponding `.f` argument for `group_map`
# (not our `.f`)
if (!.all_versions) {
as_of_df <- as_of_raw
group_modify_fn <- comp_one_grp
group_map_fn <- comp_one_grp
} else {
as_of_archive <- as_of_raw
# We essentially want to `group_modify` the archive, but
# haven't implemented this method yet. Next best would be
# `group_modify` on its `$DT`, but that has different
# behavior based on whether or not `dtplyr` is loaded.
# behavior based on whether or not `dtplyr` < 1.3.0 is loaded.
# Instead, go through an ordinary data frame, trying to avoid
# copies.
if (address(as_of_archive$DT) == address(.x$private$ungrouped$DT)) {
Expand All @@ -408,10 +454,10 @@
data.table::setDF(as_of_df)

# Convert each subgroup chunk to an archive before running the calculation.
group_modify_fn <- function(.data_group, .group_key,
.slide_comp, ...,
.version,
.new_col_name) {
group_map_fn <- function(.data_group, .group_key,
.slide_comp, ...,
.version,
.new_col_name) {
# .data_group is coming from as_of_df as a tibble, but we
# want to feed `comp_one_grp` an `epi_archive` backed by a
# DT; convert and wrap:
Expand All @@ -428,14 +474,14 @@
}

return(
dplyr::group_modify(
dplyr::bind_rows(dplyr::group_map( # note: output will be ungrouped
dplyr::group_by(as_of_df, !!!syms(.x$private$vars), .drop = .x$private$drop),
group_modify_fn,
group_map_fn,
.slide_comp = .slide_comp, ...,
.version = .version,
.new_col_name = .new_col_name,
.keep = TRUE
)
))
)
})
# Combine output into a single tibble (allowing for packed columns)
Expand Down
Loading
Loading