Further review on `cdc_baseline_forecaster` #250

`cdc_baseline_forecaster` can successfully reproduce historical Flusight-baseline forecasts from last season; see [this script](https://github.com/cmu-delphi/forecasting-team-notebooks/blob/main/flusight-baseline/flusight-baseline-2023-24.R). However, some features of `cdc_baseline_forecaster` that aren't needed to produce Flusight-baseline forecasts may need additional review.

[Also review claims regarding COVID-19, if they still exist.]

<details> <summary> Rough points to revisit; some probably don't actually apply </summary>

- check whether it runs when doing geo pooling
- `.$` in `epi_slide()`…: prefer `.x$`, or is it fine?
- locale-independent Saturday check?
- run styler
- `.data$` / `.env$` to quiet checks? or the function Nat used in epiprocess?
- why `!!outcome`?
- why the predictor role on keys? could this follow the "id variable" example in `?add_role`?
- `group_by(across(…))` vs. `group_by(pick(…))`? deprecation of the `across()` form is planned (<https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/>), but is it better to keep it around for compatibility?
- missing a `step_epi_naomit()` for the training window? (but what about test data selection?)
- side
  - `get_test_data()` arg validation & fixup looks a little weird (might be able to combine some checks; `allow_null = TRUE` inside the non-`NULL` branch; the `-Inf` thing is weird; `class != class`)
  - `~ min(.x$lag %||% Inf)`: why `min`? also, this maps across all steps…; need to remember this if doing archive-based recipes, because we may not want it there
  - check…: max lags & max horizon also might not be compatible with archive backcasting unless we're already transforming to `epi_df`
  - what??

    ```r
    if (is.null(n_recent)) n_recent <- min_required + 1 # one extra for filling
    if (n_recent <= min_required) n_recent <- min_required + n_recent
    ```
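
To see what the two quoted lines do in practice, they can be wrapped in a throwaway function and tabulated for `min_required = 3`. The name `resolve_n_recent` is hypothetical, not an epipredict function. Under the code as quoted, `NULL` gives `min_required + 1`, any value `<= min_required` is treated as extra rows on top of `min_required`, and larger values are used as-is. This makes the result non-monotone (3 maps to 6, while 4 stays 4), and the `nafill_buffer` documentation quoted below says `< min(lags)` where this code uses `<=`.

```r
# Hypothetical wrapper around the two quoted lines (not an epipredict function).
resolve_n_recent <- function(n_recent, min_required) {
  if (is.null(n_recent)) n_recent <- min_required + 1 # one extra for filling
  if (n_recent <= min_required) n_recent <- min_required + n_recent
  n_recent
}

# Tabulate the effective window size for several inputs, with min_required = 3.
inputs <- list(NULL, 1, 3, 4, 10)
sapply(inputs, resolve_n_recent, min_required = 3)
#> [1]  4  4  6  4 10
```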

- appears to be a flatline forecast plus iterated (as if independent) symmetrized 1-week differences, computed separately for each geo (with no time window and no transformation)
- no need for `if (args_list$nonneg) f <- layer_threshold(f, ".pred")`? or does it need to come before `cdc_flatline_quantiles`? or not?
- what type of warning are we trying to suppress with `suppressWarnings()`? could we be more selective?
- incomplete propagate test
- major
  - `data_frequency` not considered in the layer?
  - check on HHS…: we don't want filling through `forecast_date`
  - `nsims` much smaller?
  - do we really want a warning + something different, rather than an error, when
    `by_key` cols aren't available? also, the warnings don't trigger?? probably
    just a casualty of `suppressWarnings()`
  - no clue about the reasoning here:

    > `nafill_buffer`: At predict time, recent values of the training data are
    > used to create a forecast. However, these can be `NA` due to,
    > e.g., data latency issues. By default, any missing values
    > will get filled with less recent data. Setting this value to
    > `NULL` will result in 1 extra recent row (beyond those
    > required for lag creation) to be used. Note that we require
    > at least `min(lags)` rows of recent data per `geo_value` to
    > create a prediction. For this reason, setting `nafill_buffer < min(lags)`
    > will be treated as additional allowed recent
    > data rather than the total amount of recent data to examine.
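
On the `suppressWarnings()` question: a more selective option is to muffle only warnings whose message contains a known substring, using `withCallingHandlers()` and the `"muffleWarning"` restart, so that unexpected warnings (such as the `by_key` ones noted above) still reach the user. This is a minimal sketch; the helper name and the example messages are hypothetical, and matching on message text is more fragile than matching on a condition class when the warning has one.

```r
# Hypothetical helper: muffle only warnings whose message contains `pattern`
# (fixed substring match); every other warning propagates normally,
# unlike suppressWarnings(), which silences all of them.
suppress_matching_warnings <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# The expected warning is muffled and evaluation continues to the value ...
x <- suppress_matching_warnings({
  warning("expected noise")
  42
}, pattern = "expected noise")

# ... while an unrelated warning still reaches the caller.
msg <- tryCatch(
  suppress_matching_warnings({
    warning("something unexpected")
    1
  }, pattern = "expected noise"),
  warning = function(w) conditionMessage(w)
)
msg
#> [1] "something unexpected"
```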

- semimajor
  - assume this doesn't actually work with `time_type = "week"`?
  - data frequency was not 1 week for COVID-19 forecasts
  - need to do a `step_epi_lag()` of 1? of `data_frequency`? to get the right training window selection?
  - if there are gaps, are the deltas appropriately `NA`?
  - with HHS latency, the `forecast_date` & `target_date` settings are awkward
  - for non-flatline, had an issue with contrasts on 1 geo and with residuals not matching in size on multiple geos. maybe missing `step_epi_naomit()`? but adding `step_epi_naomit()` gives ``Warning: Values from `q` are not uniquely identified; output will contain list-cols.``
- awkward…:

  ```r
  if (max_ahead > 1L) {
    for (iter in 2:max_ahead) {
  ```
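
The procedure summarized earlier in these notes (a flatline forecast plus iterated, as-if-independent, symmetrized 1-week differences per geo) can be written for a single geo without the explicit `for (iter in 2:max_ahead)` loop: draw all steps at once and accumulate them with a matrix product. This is a hypothetical sketch of the idea as described here, not epipredict's implementation. It assumes a regularly spaced weekly series for one geo, drops differences that touch an `NA`, and clamps at zero after computing quantiles (one possible answer to the `layer_threshold` ordering question above, not necessarily the right one). Function and argument names are illustrative.

```r
# Hypothetical sketch, not epipredict code: flatline + iterated symmetrized
# 1-step differences for ONE geo. `y` is assumed regularly spaced, oldest first.
baseline_quantiles <- function(y, aheads = 1:4,
                               quantile_levels = c(0.1, 0.5, 0.9),
                               nsims = 1000L, nonneg = TRUE) {
  d <- diff(y)
  d <- d[!is.na(d)]                  # drop differences that touch an NA
  stopifnot(length(d) > 0)           # need at least one usable difference
  d_sym <- c(d, -d)                  # symmetrize around zero
  last <- y[max(which(!is.na(y)))]   # flatline: last observed value
  max_ahead <- max(aheads)

  # nsims x max_ahead matrix of independent draws, one per simulated step.
  idx <- sample.int(length(d_sym), nsims * max_ahead, replace = TRUE)
  draws <- matrix(d_sym[idx], nrow = nsims, ncol = max_ahead)

  # Cumulative sum across steps via an upper-triangular matrix of ones:
  # column h of `paths` is last + (sum of the first h draws).
  upper <- 1 * upper.tri(matrix(0, max_ahead, max_ahead), diag = TRUE)
  paths <- last + draws %*% upper

  # One column of quantiles per requested ahead.
  q <- vapply(aheads, function(h) {
    quantile(paths[, h], probs = quantile_levels, names = FALSE)
  }, numeric(length(quantile_levels)))
  q <- matrix(q, nrow = length(quantile_levels))
  if (nonneg) q <- pmax(q, 0)

  data.frame(
    ahead = rep(aheads, each = length(quantile_levels)),
    quantile_level = rep(quantile_levels, times = length(aheads)),
    value = as.vector(q)
  )
}

set.seed(1)
baseline_quantiles(c(10, 12, 9, 15, 11, 14), aheads = 1:2)
```

Because each step is drawn independently, interval widths grow with `ahead` like a random walk, which is what "iterated (as if independent)" amounts to.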

- filter to Saturdays & no `time_type` update…
- `epiprocess::guess_period()` useful?
- `state_census$fips` should be character
- maybe avoid `sample()` due to the length-1 numeric case? unlikely to encounter, but bad…
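
For the Saturday check (also raised earlier in these notes) and the `sample()` pitfall, base R has locale-independent and length-safe idioms. `weekdays()` returns locale-dependent names, whereas `as.POSIXlt(x)$wday` is always 0 (Sunday) through 6 (Saturday). `sample(x)` on a length-1 numeric `x >= 1` samples from `1:x`; indexing through `sample.int()` avoids this, and the `resample()` helper below is the one suggested in `?sample`. Both helper names are illustrative, not existing epipredict functions.

```r
# Locale-independent: POSIXlt$wday is 0 (Sunday) ... 6 (Saturday) in every
# locale, unlike weekdays(), which returns e.g. "Saturday" or "samedi".
is_saturday <- function(dates) as.POSIXlt(dates)$wday == 6L

# Length-safe sampling: always draws from the elements of `x`, even when `x`
# is a single number (sample(5, 1) would draw from 1:5 instead).
resample <- function(x, ...) x[sample.int(length(x), ...)]

is_saturday(as.Date(c("2024-01-06", "2024-01-07")))
#> [1]  TRUE FALSE
resample(5, 3, replace = TRUE)
#> [1] 5 5 5
```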


</details>
