Skip to content

Climatological #436

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 60 commits into from
Mar 12, 2025
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
60 commits
Select commit Hold shift + click to select a range
463734c
add lubridate
dajmcdon Dec 3, 2024
a64b6db
style on short line
dajmcdon Dec 3, 2024
3f2c1c1
add a hidden column to track the time_value of the outcome
dajmcdon Dec 3, 2024
4af6a8a
typo in documentation and error msg
dajmcdon Dec 3, 2024
301e651
adding predictor drafted
dajmcdon Dec 3, 2024
6cb7897
remove ability for `...` to be empty in adjust_latency
dajmcdon Dec 3, 2024
426f0c7
add/remove hidden target_time_value column
dajmcdon Dec 3, 2024
fef636c
finish climate predictor draft
dajmcdon Dec 3, 2024
3c2c9b7
redocument
dajmcdon Dec 3, 2024
031b605
update rproj
dajmcdon Jan 17, 2025
c598840
drafted, needs refactor
dajmcdon Jan 17, 2025
83e3f8e
Merge branch 'dev' into climatological
dajmcdon Jan 31, 2025
26461c8
revert addition of hidden .target_time_value
dajmcdon Feb 4, 2025
a66d07f
climate prep/bake steps work as planned. does not adjust for aheads
dajmcdon Feb 4, 2025
10cd85f
tests pass
dajmcdon Feb 4, 2025
cd18833
rename objects
dajmcdon Feb 5, 2025
bcbda6d
climate predictor works as desired
dajmcdon Feb 6, 2025
2fabf45
rename test
dajmcdon Feb 6, 2025
ced85b4
more detailed documentation
dajmcdon Feb 6, 2025
01182d7
add another example
dajmcdon Feb 6, 2025
83a6ec8
use all_of in tidyselect
dajmcdon Feb 7, 2025
3d3c2b4
simplify docs, quantiles
dajmcdon Feb 8, 2025
950f363
merge dev
dajmcdon Feb 10, 2025
dab8d43
merge dev epi_recipe
dajmcdon Feb 10, 2025
d6725a8
refactor the forecaster
dajmcdon Feb 10, 2025
0be6acf
missing autoplot branch
dajmcdon Feb 10, 2025
d464c19
test args list
dajmcdon Feb 11, 2025
7afbde8
add reasonable examples
dajmcdon Feb 11, 2025
151d8ea
tests pass
dajmcdon Feb 11, 2025
381a298
add snapshot tests
dajmcdon Feb 11, 2025
83e0ad7
redocument, add news
dajmcdon Feb 11, 2025
d56d488
prefix with recipes, aside to rename object
dajmcdon Feb 11, 2025
c1cf5ba
muffle dplyr select warnings
dajmcdon Feb 11, 2025
4a348c8
shifting must happen before joining to avoid duplicated rows
dajmcdon Feb 11, 2025
46b2a2e
ignore renviron
dajmcdon Feb 11, 2025
5f76be3
months is in base, so we ensure we use lubridate
dajmcdon Feb 11, 2025
77dbcc4
rebuild readme
dajmcdon Feb 11, 2025
c98b47c
satisfy styler
dajmcdon Feb 11, 2025
48ea5ca
add a print method
dajmcdon Feb 12, 2025
7089c2f
David's suggestions
dsweber2 Feb 12, 2025
60d6aeb
time_type defaulting to the epi_df's time_type
dsweber2 Feb 13, 2025
759a3b8
unneeded else branch
dsweber2 Feb 14, 2025
4477403
Merge pull request #441 from cmu-delphi/climate_suggestions
dsweber2 Feb 14, 2025
d072839
Merge branch 'dev' into climatological
dajmcdon Feb 15, 2025
5d062db
fu styler!!
dajmcdon Feb 16, 2025
21c67cd
deletion from news
dajmcdon Feb 16, 2025
05f0490
redocument, update snapshots
dajmcdon Feb 16, 2025
cfad1f1
update snaps for upstream package
dajmcdon Feb 16, 2025
bece1fe
compute the .pred_distn using residuals. document the default
dajmcdon Feb 19, 2025
5e3ed56
fix rlang bug, more detailed examples
dajmcdon Feb 19, 2025
8238703
pass checks
dajmcdon Feb 19, 2025
08fcd01
once more, FU styler
dajmcdon Feb 19, 2025
a5dea72
step_climate leap weeks/days
dsweber2 Feb 27, 2025
b3db007
same for climatological_forecaster
dsweber2 Feb 27, 2025
fda832f
feb 29 is day 60, so the boundary is 59-60
dsweber2 Feb 27, 2025
49882f2
Dan's feedback, < a full year, test monthly agg
dsweber2 Mar 3, 2025
6424f39
pipe R dependency
dsweber2 Mar 5, 2025
7c9d1c6
-1 -> 999
dsweber2 Mar 5, 2025
7e10b52
Merge pull request #445 from cmu-delphi/leap_time_handling
dsweber2 Mar 11, 2025
03d50b2
version bump, minor doc update
dsweber2 Mar 11, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,4 @@ inst/doc
.Rprofile
renv.lock
renv/
.Renviron
5 changes: 2 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: epipredict
Title: Basic epidemiology forecasting methods
Version: 0.1.8
Version: 0.1.9
Authors@R: c(
person("Daniel J.", "McDonald", , "daniel@stat.ubc.ca", role = c("aut", "cre")),
person("Ryan", "Tibshirani", , "ryantibs@cmu.edu", role = "aut"),
Expand Down Expand Up @@ -38,6 +38,7 @@ Imports:
glue,
hardhat (>= 1.3.0),
lifecycle,
lubridate,
magrittr,
recipes (>= 1.0.4),
rlang (>= 1.1.0),
Expand All @@ -54,7 +55,6 @@ Suggests:
fs,
grf,
knitr,
lubridate,
poissonreg,
purrr,
quantreg,
Expand All @@ -73,7 +73,6 @@ Remotes:
cmu-delphi/epidatasets,
cmu-delphi/epidatr,
cmu-delphi/epiprocess,
cmu-delphi/epidatasets,
dajmcdon/smoothqr
Config/Needs/website: cmu-delphi/delphidocs
Config/testthat/edition: 3
Expand Down
6 changes: 6 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ S3method(autoplot,epi_workflow)
S3method(bake,check_enough_train_data)
S3method(bake,epi_recipe)
S3method(bake,step_adjust_latency)
S3method(bake,step_climate)
S3method(bake,step_epi_ahead)
S3method(bake,step_epi_lag)
S3method(bake,step_epi_slide)
Expand Down Expand Up @@ -59,6 +60,7 @@ S3method(predict,flatline)
S3method(prep,check_enough_train_data)
S3method(prep,epi_recipe)
S3method(prep,step_adjust_latency)
S3method(prep,step_climate)
S3method(prep,step_epi_ahead)
S3method(prep,step_epi_lag)
S3method(prep,step_epi_slide)
Expand Down Expand Up @@ -89,6 +91,7 @@ S3method(print,layer_residual_quantiles)
S3method(print,layer_threshold)
S3method(print,layer_unnest)
S3method(print,step_adjust_latency)
S3method(print,step_climate)
S3method(print,step_epi_ahead)
S3method(print,step_epi_lag)
S3method(print,step_epi_slide)
Expand Down Expand Up @@ -150,6 +153,8 @@ export(cdc_baseline_args_list)
export(cdc_baseline_forecaster)
export(check_enough_train_data)
export(clean_f_name)
export(climate_args_list)
export(climatological_forecaster)
export(default_epi_recipe_blueprint)
export(detect_layer)
export(dist_quantiles)
Expand Down Expand Up @@ -198,6 +203,7 @@ export(remove_model)
export(slather)
export(smooth_quantile_reg)
export(step_adjust_latency)
export(step_climate)
export(step_epi_ahead)
export(step_epi_lag)
export(step_epi_naomit)
Expand Down
5 changes: 4 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicat
`data(<dataset name>)`, but can be accessed with
`data(<dataset name>, package = "epidatasets")`, `epidatasets::<dataset name>`
or, after loading the package, the name of the dataset alone (#382).
- `step_adjust_latency()` no longer allows empty column selection.

## Improvements

Expand All @@ -19,9 +20,11 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicat
- Make key column inference more consistent within the package and with current `epiprocess`.
- Fix `quantile_reg()` producing error when asked to output just median-level predictions.
- (temporary) ahead negative is allowed for `step_epi_ahead` until we have `step_epi_shift`
- Add `reference_date` as an argument to `epi_recipe()`
- Add `step_climate()` to create "climate" predictor in forecast workflows
- Add `climatological_forecaster()` to automatically create climate baselines

## Bug fixes

- Shifting no columns results in no error for either `step_epi_ahead` and `step_epi_lag`
- Quantiles produced by `grf` were sometimes out of order.
- dist_quantiles can have all `NA` values without causing unrelated errors
Expand Down
3 changes: 2 additions & 1 deletion R/arx_forecaster.R
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ arx_forecaster <- function(
#'
#' @return An unfitted `epi_workflow`.
#' @export
#' @seealso [arx_forecaster()]
#' @seealso [arx_forecaster()], [arx_args_list()]
#'
#' @examples
#' library(dplyr)
Expand Down Expand Up @@ -283,6 +283,7 @@ arx_fcast_epi_workflow <- function(
#'
#' @return A list containing updated parameter choices with class `arx_flist`.
#' @export
#' @seealso [arx_forecaster()]
#'
#' @examples
#' arx_args_list()
Expand Down
9 changes: 5 additions & 4 deletions R/autoplot.R
Original file line number Diff line number Diff line change
Expand Up @@ -120,12 +120,13 @@ autoplot.epi_workflow <- function(
shift <- -as.numeric(old_name_y[2])
new_name_y <- paste(old_name_y[-c(1:2)], collapse = "_")
edf <- rename(edf, !!new_name_y := !!names(y))
} else {
new_name_y <- names(y)
shift <- 0
}

if (!is.null(shift)) {
edf <- mutate(edf, time_value = time_value + shift)
}
other_keys <- setdiff(key_colnames(object), c("geo_value", "time_value"))
edf <- mutate(edf, time_value = time_value + shift)
other_keys <- key_colnames(object, exclude = c("geo_value", "time_value"))
edf <- as_epi_df(edf,
as_of = object$fit$meta$as_of,
other_keys = other_keys
Expand Down
242 changes: 242 additions & 0 deletions R/climatological_forecaster.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,242 @@
#' Climatological forecaster
#'
#' This is another "baseline" type forecaster, but it is especially appropriate
#' for strongly seasonal diseases (e.g., influenza). The idea is to predict
#' the "typical season" by summarizing over all available history in the
#' `epi_data`. This is analogous to a "climate" forecast rather than a "weather"
#' forecast, essentially predicting "typical January" behavior by relying on a
#' long history of such periods rather than heavily using recent data.
#'
#' The forecast is taken as the quantiles of the `outcome` in a small window
#' around the target period, computed over the entire available history.
#'
#' @inheritParams flatline_forecaster
#' @param args_list A list of additional arguments as created by the
#' [climate_args_list()] constructor function.
#'
#' @return A data frame of point and interval) forecasts at a all horizons
#' for each unique combination of `key_vars`.
#' @export
#' @seealso [step_climate()]
#'
#' @examples
#' rates <- cases_deaths_subset
#' # set as_of to the last day in the data
#' attr(rates, "metadata")$as_of <- as.Date("2021-12-31")
#' fcast <- climatological_forecaster(rates, "case_rate_7d_av")
#' autoplot(fcast)
#'
#' # Compute quantiles separately by location, and a backcast
#' backcast <- climatological_forecaster(
#' rates, "case_rate_7d_av",
#' climate_args_list(
#' quantile_by_key = "geo_value",
#' forecast_date = as.Date("2021-06-01")
#' )
#' )
#' autoplot(backcast)
#'
#' # compute the climate "daily" rather than "weekly"
#' # use a two week window (on both sides)
#' daily_fcast <- climatological_forecaster(
#' rates, "case_rate_7d_av",
#' climate_args_list(
#' time_type = "day", window_size = 14L, forecast_horizon = 0:30
#' )
#' )
#' autoplot(daily_fcast) +
#' ggplot2::coord_cartesian(xlim = c(as.Date("2021-10-01"), NA))
climatological_forecaster <- function(epi_data,
outcome,
args_list = climate_args_list()) {
if (!is_epi_df(epi_data)) {
cli_abort(
"`epi_data` must be an {.cls epi_df}, not a {.cls {class(epi_data)}}."
)
}
edf_time_type <- attr(epi_data, "metadata")$time_type
if (edf_time_type == "custom") {
cli_abort("This forecaster only works with daily, weekly, or yearmonth data.")
}
if (!inherits(args_list, c("climate_fcast", "alist"))) {
cli_abort("`args_list` was not created using `climate_args_list()`.")
}
arg_is_chr_scalar(outcome)
hardhat::check_column_names(epi_data, c(outcome, args_list$quantile_by_key))
forecast_date <- args_list$forecast_date %||% attr(epi_data, "metadata")$as_of
horizon <- args_list$forecast_horizon
window_size <- args_list$window_size
time_type <- args_list$time_type
ttype_ord <- match(time_type, c("day", "epiweek", "week", "month"))
ttype_ord <- ttype_ord - as.integer(ttype_ord > 2)
edf_ttype_ord <- match(edf_time_type, c("day", "week", "yearmonth"))
if (ttype_ord < edf_ttype_ord) {
cli_abort(c("Climate forecasts for more granular time types are not possible
if the `epi_data` has a higher level of aggregation",
i = "Here, the data is in {.val {edf_time_type}}s while
`time_type` is {.val {time_type}}."
))
}
# process the time types
sym_outcome <- sym(outcome)
epi_data <- epi_data %>%
filter(!is.na(!!outcome)) %>%
select(all_of(c(key_colnames(epi_data), outcome)))
if (time_type %in% c("week", "epiweek")) {
ttype_dur <- lubridate::weeks
time_aggr <- ifelse(time_type == "week", lubridate::epiweek, lubridate::isoweek)
modulus <- 53L
} else if (time_type == "month") {
ttype_dur <- function(x) lubridate::period(month = x)
time_aggr <- lubridate::month
modulus <- 12L
} else if (time_type == "day") {
ttype_dur <- lubridate::days
time_aggr <- lubridate::yday
modulus <- 365L
}
center_fn <- switch(args_list$center_method,
mean = function(x, w) mean(x, na.rm = TRUE),
median = function(x, w) stats::median(x, na.rm = TRUE)
)
# get the point predictions
keys <- key_colnames(epi_data, exclude = "time_value")
epi_data <- epi_data %>% mutate(.idx = time_aggr(time_value), .weights = 1)
climate_center <- epi_data %>%
select(.idx, .weights, all_of(c(outcome, keys))) %>%
dplyr::reframe(
roll_modular_multivec(
!!sym_outcome, .idx, .weights, center_fn, window_size,
modulus
),
.by = all_of(keys)
) %>%
rename(.pred = climate_pred)
# get the quantiles
Quantile <- function(x, w) {
x <- x - stats::median(x, na.rm = TRUE)
if (args_list$symmetrize) x <- c(x, -x)
list(unname(quantile(
x,
probs = args_list$quantile_levels, na.rm = TRUE, type = 8
)))
}
climate_quantiles <- epi_data %>%
select(.idx, .weights, all_of(c(outcome, args_list$quantile_by_key))) %>%
dplyr::reframe(
roll_modular_multivec(
!!sym_outcome, .idx, .weights, Quantile, window_size,
modulus
),
.by = all_of(args_list$quantile_by_key)
) %>%
rename(.pred_distn = climate_pred) %>%
mutate(.pred_distn = dist_quantiles(.pred_distn, args_list$quantile_levels))
# combine them together
climate_table <- climate_center %>%
left_join(climate_quantiles, by = c(".idx", args_list$quantile_by_key)) %>%
mutate(.pred_distn = .pred_distn + .pred)
# create the predictions
predictions <- epi_data %>%
select(all_of(keys)) %>%
dplyr::distinct() %>%
mutate(forecast_date = forecast_date, .idx = time_aggr(forecast_date))
predictions <- map(horizon, ~ {
predictions %>%
mutate(.idx = .idx + .x, target_date = forecast_date + ttype_dur(.x))
}) %>%
purrr::list_rbind() %>%
mutate(
.idx = .idx %% modulus,
.idx = dplyr::case_when(.idx == 0 ~ modulus, TRUE ~ .idx)
) %>%
left_join(climate_table, by = c(".idx", keys)) %>%
select(-.idx)
if (args_list$nonneg) {
predictions <- mutate(
predictions,
.pred = snap(.pred, 0, Inf),
.pred_distn = snap(.pred_distn, 0, Inf)
)
}

# fill in some extras for plotting methods, etc.
ewf <- epi_workflow()
ewf$trained <- TRUE
ewf$pre <- list(mold = list(
outcomes = select(epi_data, !!sym_outcome),
extras = list(roles = list(
geo_value = select(epi_data, geo_value),
time_value = select(epi_data, time_value)
))
))
other_keys <- key_colnames(epi_data, exclude = c("time_value", "geo_value"))
if (length(other_keys) > 0) {
ewf$pre$mold$extras$roles$key <- epi_data %>% select(all_of(other_keys))
}

structure(
list(
predictions = predictions,
epi_workflow = ewf,
metadata = list(
training = attr(epi_data, "metadata"),
forecast_created = Sys.time()
)
),
class = c("climate_fcast", "canned_epipred")
)
}

#' Climatological forecaster argument constructor
#'
#' @inheritParams epi_recipe
#' @param forecast_horizon Vector of integers giving the number of time steps,
#' in units of the `time_type`,
#' from the `reference_date` for which predictions should be produced.
#' @inheritParams step_climate
#' @inheritParams flatline_args_list
#'
#' @return A list containing updated parameter choices with class `climate_alist`.
#' @export
#' @seealso [climatological_forecaster()], [step_climate()]
#'
#' @examples
#' climate_args_list()
#' climate_args_list(
#' forecast_horizon = 0:10,
#' quantile_levels = c(.01, .025, 1:19 / 20, .975, .99)
#' )
#'
climate_args_list <- function(
forecast_date = NULL,
forecast_horizon = 0:4,
time_type = c("epiweek", "week", "month", "day"),
center_method = c("median", "mean"),
window_size = 3L,
quantile_levels = c(.05, .1, .25, .5, .75, .9, .95),
symmetrize = FALSE,
nonneg = TRUE,
quantile_by_key = character(0L),
...) {
rlang::check_dots_empty()
time_type <- arg_match(time_type)
center_method <- rlang::arg_match(center_method)
arg_is_scalar(window_size, symmetrize, nonneg)
arg_is_chr(quantile_by_key, allow_empty = TRUE)
arg_is_scalar(forecast_date, allow_null = TRUE)
arg_is_date(forecast_date, allow_null = TRUE)
arg_is_nonneg_int(window_size)
arg_is_int(forecast_horizon)
arg_is_lgl(symmetrize, nonneg)
arg_is_probabilities(quantile_levels)
quantile_levels <- sort(unique(c(0.5, quantile_levels)))

structure(
enlist(
forecast_date, forecast_horizon, time_type, center_method, window_size,
quantile_levels, symmetrize, nonneg, quantile_by_key
),
class = c("climate_fcast", "alist")
)
}
4 changes: 2 additions & 2 deletions R/flatline_forecaster.R
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
#'
#' @param epi_data An [epiprocess::epi_df][epiprocess::as_epi_df]
#' @param outcome A scalar character for the column name we wish to predict.
#' @param args_list A list of dditional arguments as created by the
#' @param args_list A list of additional arguments as created by the
#' [flatline_args_list()] constructor function.
#'
#' @return A data frame of point (and optionally interval) forecasts at a single
Expand All @@ -34,7 +34,7 @@ flatline_forecaster <- function(
args_list = flatline_args_list()) {
validate_forecaster_inputs(epi_data, outcome, "time_value")
if (!inherits(args_list, c("flat_fcast", "alist"))) {
cli_abort("`args_list` was not created using `flatline_args_list().")
cli_abort("`args_list` was not created using `flatline_args_list()`.")
}
keys <- key_colnames(epi_data)
ek <- kill_time_value(keys)
Expand Down
Loading
Loading