Description
cdc_baseline_forcecaster
can successfully reproduce historical Flusight-baseline forecasts from last season; see this script. However, some extra features in cdc_baseline_forecaster
that aren't needed to produce Flusight-baseline forecasts may need some extra review.
[also review claims re. covid if they still exist]
Rough points to revisit; some probably don't actually apply
-
check if runs when doing geo pooling
-
.$ in epislide… prefer .x$? or fine?
-
locale-independent Saturday check?
-
styler
-
.data$ / .env$ to calm checks? or fn Nat used in epiprocess?
-
why !!outcome?
-
why predictor on keys? could this be follow "id variable" example in `?addrole`?
-
(1groupby(across(……))1 vs. 1groupby(pick(…….))1? deprecation planned in future https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/. but for compatibility better to keep around??)
-
missing a stepepinaomit for the training window? (but what about test data selection?)
-
side
- `gettestdata `arg validation & fixup a little weird looking (might be able to combine some, `allownull=TRUE` inside non-NULL branch, -Inf thing weird, class != class)
- ~ min(.x$lag %||% Inf) — why min? also, mapping across all steps…. need to remember this if doing archive-based recipes b/c we may not want this
- check… max lags & max horizon also might not be archive-backcast-compatible unless already doing transform to epi`df
- what??`?
if (is.null(n_recent)) n_recent <- min_required + 1 # one extra for filling
if (n_recent <= min_required) n_recent <- min_required + n_recent -
appears to be flatline + iterated (as if independent) symmetrized 1-week
differences, separately for each geo (w/ no time window, no transformation) -
no need for `if (argslist$nonneg) f <- layerthreshold(f, ".pred")`? or does it need to be before `cdcflatlinequantiles`? or not?
-
what type of warning are we trying to suppress with suppressWarnings? be more selective?
-
incomplete propagate test
-
major
- `datafrequenc`y not considered in layer?
- check on hhs… we don't want filling through forecastdate
- nsims much smaller?
- do we really want warning + something different rather than error when
`bykey` cols aren't available? also, the warnings don't trigger?? probably
just casualty of suppressWarnings - no clue about the reasoning here
`nafillbuffer`: At predict time, recent values of the training data are
used to create a forecast. However, these can be 'NA' due to,
e.g., data latency issues. By default, any missing values
will get filled with less recent data. Setting this value to
'NULL' will result in 1 extra recent row (beyond those
required for lag creation) to be used. Note that we require
at least 'min(lags)' rows of recent data per '`geovalue`' to
create a prediction. For this reason, setting 'nafillbuffer
< min(lags)' will be treated as additional allowed recent
data rather than the total amount of recent data to examine.
-
semimajor
- assume this doesn't actually work with `timetype` = week?
- data frequency was not 1 week for covid-19 forecasts
- need to do a `stepepilag` 1? datafrequency? to get the right training window selection?
- if there are gaps, are deltas appropriately NA?
- with hhs latency, `forecastdate` & `targetdate` setting is awkward
- for non-flatline, had issue with contrasts on 1 geo and with residuals not matching size on mult geos. maybe missing `stepepinaomit`? but adding `stepepinaomit` gives `Warning: Values from `q` are not uniquely identified; output will contain list-cols.`
-
awkward…:
if (max_ahead > 1L) {
for (iter in 2:max_ahead) { -
filter to Saturdays & no timetype update…
-
`epiprocess::guessperiod` useful?
-
`statecensus$fips` should be chr
-
maybe avoid `sample` due to length-1-numeric case? unlikely to encounter but bad…