Docs overhaul #431

Merged
merged 65 commits into from
Jun 23, 2025
Commits
5c5c87c
second readthrough of README.Rmd
dsweber2 Jan 23, 2025
007e2b4
make for manually clearing the caches
dsweber2 Jan 23, 2025
2b5908c
styler
dsweber2 Jan 23, 2025
e47b5d7
missed an image in the man version
dsweber2 Jan 23, 2025
f8b43c4
getting started page
dsweber2 Jan 24, 2025
77cbeb9
fixing rebase problem
dsweber2 Jan 24, 2025
1539552
linewrapping
dsweber2 Jan 24, 2025
91a980b
pushing only the dev docs
dsweber2 Jan 24, 2025
a52934a
isn't building the readme
dsweber2 Jan 24, 2025
06d5581
readme.rmd red/yellow -> blue/black
dsweber2 Jan 24, 2025
eed91fa
training on only the shown subset
dsweber2 Jan 27, 2025
39e203c
autoplot new data
dsweber2 Jan 27, 2025
0be48a0
using new autoplot
dsweber2 Jan 27, 2025
6e3d2ff
getting started first draft
dsweber2 Jan 31, 2025
c7f5d3b
much more complete guts example, branching flatline fixes
dsweber2 Feb 7, 2025
7d1273e
fix for flatline discovered, rename guts
dsweber2 Feb 7, 2025
2054f9c
docs, styler
dsweber2 Feb 10, 2025
763885e
passing check & news
dsweber2 Feb 10, 2025
ad8bf93
revising custom_epiworkflows
dsweber2 Feb 10, 2025
5177e68
some more editing
dsweber2 Feb 11, 2025
a5270cc
finished custom_workflows, reviewing backtesting
dsweber2 Feb 11, 2025
563f185
backtesting rmd rewrite
dsweber2 Feb 25, 2025
9ac8c82
dropping CAN backtesting example b/c ~no revisions
dsweber2 Feb 25, 2025
4f0c7e6
formatting
dsweber2 Feb 25, 2025
a552e14
|> in backtesting, dropped a section in get started
dsweber2 Feb 25, 2025
decc963
landing page wording and get code running
nmdefries Feb 28, 2025
ad236b6
landing page again but in Rmd
nmdefries Mar 1, 2025
9a8fc7c
consistent naming, 7dav pull instead of manually
dsweber2 Mar 3, 2025
97ded60
going back to just using the API call
dsweber2 Mar 4, 2025
2a29a6b
recipes version, include epiprocess in the rmds
dsweber2 Mar 4, 2025
bd72da5
rebuild landing page
nmdefries Mar 4, 2025
e638288
first half epipredict.Rmd
nmdefries Mar 4, 2025
93a9e8c
custom header, dropping arx_classifier smooth-qr
dsweber2 Mar 4, 2025
2f4fa89
follow up on first half of epipredict.Rmd
dsweber2 Mar 5, 2025
5003736
avoid [link] parsing
dsweber2 Mar 5, 2025
28063be
reorganize reference page
dsweber2 Mar 5, 2025
c2bc4ba
postprocessing -> post-processing
dsweber2 Mar 5, 2025
34d7a05
lots of reference updates
dsweber2 Mar 6, 2025
b61a30b
forecast needs `...` as a generic
dsweber2 Mar 7, 2025
ea276f3
include climate, only calculate necessary days
dsweber2 Mar 14, 2025
c3287d2
Adding short blurb on cdc_flatline
dsweber2 Mar 18, 2025
01f1d22
extra details for symmetrize
dsweber2 Mar 18, 2025
84ed412
epipredict.Rmd
nmdefries Mar 28, 2025
9bbf7bf
backtesting.rmd
nmdefries Apr 3, 2025
bb4025c
first half custom_epiworkflows.Rmd
nmdefries Apr 4, 2025
05f1507
second half custom_epiworkflows.Rmd
nmdefries Apr 7, 2025
c9361ae
various requested changes
dsweber2 Apr 9, 2025
cdf3730
backtesting version un/faithful clarification
nmdefries Apr 9, 2025
7304110
why comparing to final data
nmdefries Apr 9, 2025
4df2535
fixing backtest truth data plot
dsweber2 Apr 9, 2025
3aad6b8
backtesting.rmd comment fixes
nmdefries Apr 9, 2025
5469dda
add alternate step names and say if optional/not
nmdefries Apr 9, 2025
b481a28
get_test_data help
nmdefries Apr 10, 2025
764b7f9
get_test_data forecasts identical
nmdefries Apr 10, 2025
c716235
clarify changing frosting with model
nmdefries Apr 10, 2025
42b146d
classifier chunk comments
nmdefries Apr 10, 2025
ef99e42
model-specific layers
nmdefries Apr 10, 2025
a958343
removing resolved todos
dsweber2 Apr 15, 2025
2056e0a
dan's simple suggestions
dsweber2 May 2, 2025
ef1fd58
move pkgdown-watch, better climate ex, some wording
dsweber2 May 2, 2025
4a9f43e
moving library, geo-pooling phrasing
dsweber2 May 2, 2025
9f0af0a
fit -> estimate
dsweber2 May 13, 2025
c342680
recommended edits
dsweber2 Jun 16, 2025
f6a311e
lightswitch customization is hard
dsweber2 Jun 20, 2025
6047d4f
final pass
dsweber2 Jun 23, 2025
training on only the shown subset
dsweber2 committed Apr 10, 2025
commit eed91fad7096cbec8e9f65b52bdcf0154f09968e
72 changes: 41 additions & 31 deletions README.Rmd
@@ -126,9 +126,9 @@ data.
 <details>
 <summary> Creating the dataset using `{epidatr}` and `{epiprocess}` </summary>

-This dataset can be found in the package as <TODO DOESN'T EXIST>; we demonstrate
-some of the typically ubiquitous cleaning operations needed to be able to
-forecast.
+This dataset can be found in the package as `covid_case_death_rates`; we
+demonstrate some of the typically ubiquitous cleaning operations needed to be
+able to forecast.
 First we pull both jhu-csse cases and deaths from
 [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:

@@ -152,26 +152,34 @@ deaths <- pub_covidcast(
 geo_values = "*"
 ) |>
 select(geo_value, time_value, death_rate = value)
+```
+
+Since visualizing the results on every geography is somewhat overwhelming,
+we'll only train on a subset of 5.
+```{r date, warning = FALSE}
+used_locations <- c("ca", "ma", "ny", "tx")
 cases_deaths <-
 full_join(cases, deaths, by = c("time_value", "geo_value")) |>
+filter(geo_value %in% used_locations) |>
 as_epi_df(as_of = as.Date("2022-01-01"))
-plot_locations <- c("ca", "ma", "ny", "tx")
 # plotting the data as it was downloaded
 cases_deaths |>
-filter(geo_value %in% plot_locations) |>
-pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
-ggplot(aes(x = time_value, y = value)) +
-geom_line() +
-facet_grid(source ~ geo_value, scale = "free") +
+autoplot(
+case_rate,
+death_rate,
+.color_by = "none"
+) +
+facet_grid(.response_name ~ geo_value, scale = "free") +
 scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
 theme(axis.text.x = element_text(angle = 90, hjust = 1))
 ```

 As with basically any dataset, there is some cleaning that we will need to do to
 make it actually usable; we'll use some utilities from
-[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this. First, to
-eliminate some of the noise coming from daily reporting, we do 7 day averaging
-over a trailing window[^1]:
+[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
+
+First, to eliminate some of the noise coming from daily reporting, we do 7 day
+averaging over a trailing window[^1]:

 [^1]: This makes it so that any given day of the processed timeseries only
 depends on the previous week, which means that we avoid leaking future
@@ -199,10 +207,12 @@ cases_deaths <-
 group_by(geo_value) |>
 mutate(
 outlr_death_rate = detect_outlr_rm(
-time_value, death_rate, detect_negatives = TRUE
+time_value, death_rate,
+detect_negatives = TRUE
 ),
 outlr_case_rate = detect_outlr_rm(
-time_value, case_rate, detect_negatives = TRUE
+time_value, case_rate,
+detect_negatives = TRUE
 )
 ) |>
 unnest(cols = starts_with("outlr"), names_sep = "_") |>
@@ -212,7 +222,6 @@ cases_deaths <-
 case_rate = outlr_case_rate_replacement
 ) |>
 select(geo_value, time_value, case_rate, death_rate)
-cases_deaths
 ```
 </details>

@@ -224,14 +233,13 @@ of the states, noting the actual forecast date:
 ```{r plot_locs}
 forecast_date_label <-
 tibble(
-geo_value = rep(plot_locations, 2),
-source = c(rep("case_rate", 4), rep("death_rate", 4)),
-dates = rep(forecast_date - 7 * 2, 2 * length(plot_locations)),
+geo_value = rep(used_locations, 2),
+.response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
+dates = rep(forecast_date - 7 * 2, 2 * length(used_locations)),
 heights = c(rep(150, 4), rep(1.0, 4))
 )
 processed_data_plot <-
 cases_deaths |>
-filter(geo_value %in% plot_locations) |>
 pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
 ggplot(aes(x = time_value, y = value)) +
 geom_line() +
@@ -292,36 +300,37 @@ data narrowed somewhat
 narrow_data_plot <-
 cases_deaths |>
 filter(time_value > "2021-04-01") |>
-filter(geo_value %in% plot_locations) |>
-pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
-ggplot(aes(x = time_value, y = value)) +
-geom_line() +
-facet_grid(source ~ geo_value, scale = "free") +
+autoplot(
+case_rate,
+death_rate,
+.color_by = "none"
+) +
+facet_grid(.response_name ~ geo_value, scale = "free") +
 geom_vline(aes(xintercept = forecast_date)) +
 geom_text(
 data = forecast_date_label,
 aes(x = dates, label = "forecast\ndate", y = heights),
 size = 3, hjust = "right"
 ) +
 scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
-theme(axis.text.x = element_text(angle = 90, hjust = 1))
+theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
+ylim(0, NA)
 ```

 Putting that together with a plot of the bands, and a plot of the median
 prediction.

 ```{r plotting_forecast, warning=FALSE}
 epiworkflow <- four_week_ahead$epi_workflow
+
 restricted_predictions <-
 four_week_ahead$predictions |>
-filter(geo_value %in% plot_locations) |>
 rename(time_value = target_date, value = .pred) |>
-mutate(source = "death_rate")
+mutate(.response_name = "death_rate")
 forecast_plot <-
 narrow_data_plot |>
 epipredict:::plot_bands(
-restricted_predictions,
-levels = 0.9
+restricted_predictions
 ) +
 geom_point(
 data = restricted_predictions,
@@ -351,5 +360,6 @@ A couple of things to note:
 If you encounter a bug or have a feature request, feel free to file an [issue on
 our github page](https://github.com/cmu-delphi/epipredict/issues).
 For other
-questions, feel free to contact [Daniel](daniel@stat.ubc.ca), [David](davidweb@andrew.cmu.edu), [Dmitry](dshemetov@cmu.edu), or
-[Logan](lcbrooks@andrew.cmu.edu), either via email or on the Insightnet slack.
+questions, feel free to reach out to the authors, either via this [contact
+form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
+email or the Insightnet slack.