Commit 77a684f

feat: update for epiprocess R6 refactor
* remove references to R6 and mutation
* use epiprocess correctly
* fix the authors section of DESCRIPTION
* upgrade renv
* update all packages in renv
* integrate Rprofile with user Rprofile
1 parent 9367e63 commit 77a684f
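
Note on the change: the prose removed from archive.qmd warned that R6 objects have reference semantics, while the new text describes `epi_archive` as an S3 class. The following is a minimal base-R sketch of that difference, not code from `epiprocess`. The class name `toy_archive` and the field `value` are hypothetical, and a plain environment stands in for an R6 object (R6 objects are built on environments).

```r
# Reference semantics: assigning an environment does not copy it.
e1 <- new.env()
e1$value <- 1
e2 <- e1            # e2 and e1 name the same environment
e2$value <- 0
ref_result <- e1$value   # the change made through e2 is visible through e1

# Copy-on-modify semantics: an S3 object built on a list behaves like a value.
s1 <- structure(list(value = 1), class = "toy_archive")  # hypothetical class
s2 <- s1
s2$value <- 0       # modifies a copy; s1 is left untouched
copy_result <- s1$value

print(ref_result)
print(copy_result)
```

This sketch only covers ordinary `<-` assignment. A `data.table` stored inside a list can still be modified by reference with `:=` or `set()`, so it should not be read as a statement about every operation on an archive's `DT` field.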

6 files changed: +769 -767 lines changed
.Rprofile

Lines changed: 6 additions & 0 deletions
@@ -1 +1,7 @@
 source("renv/activate.R")
+
+# Check if user .Rprofile exists
+if (file.exists("~/.Rprofile")) {
+# Source user .Rprofile
+source("~/.Rprofile")
+}

DESCRIPTION

Lines changed: 6 additions & 5 deletions
@@ -2,11 +2,12 @@ Package: delphitoolingbook
 Title: Delphi Tooling
 Version: 0.0.0.9999
 Authors@R: c(
-person("Daniel", "McDonald", "J.", "daniel@stat.ubc.ca", role = c("cre", "aut"),
-person("Logan", "Brooks", role = c("cre","aut"),
-person("Rachel", "Lobay", role = "aut"))
-person("Ryan", "Tibshirani", "J.", "ryantibs@berkeley.edu", role = "aut"),
-Description:
+person("Daniel", "McDonald", "J.", "daniel@stat.ubc.ca", role = c("cre", "aut")),
+person("Logan", "Brooks", role = c("cre","aut")),
+person("Rachel", "Lobay", role = "aut"),
+person("Ryan", "Tibshirani", "J.", "ryantibs@berkeley.edu", role = "aut")
+)
+Description:
 | This book is a longform introduction to analysing and forecasting epidemiological data.
 License: MIT + file LICENSE
 Imports:

archive.qmd

Lines changed: 30 additions & 61 deletions
@@ -25,9 +25,8 @@ source("_common.R")
 
 ## Getting data into `epi_archive` format
 
-An `epi_archive` object
-can be constructed from a data frame, data table, or tibble, provided that it
-has (at least) the following columns:
+An `epi_archive` object can be constructed from a data frame, data table, or
+tibble, provided that it has (at least) the following columns:
 
 * `geo_value`: the geographic value associated with each row of measurements.
 * `time_value`: the time value associated with each row of measurements.
@@ -55,10 +54,10 @@ class(x)
 print(x)
 ```
 
-An `epi_archive` is special kind of class called an R6 class. Its primary field
-is a data table `DT`, which is of class `data.table` (from the `data.table`
-package), and has columns `geo_value`, `time_value`, `version`, as well as any
-number of additional columns.
+An `epi_archive` is an S3 class. Its primary field is a data table `DT`, which
+is of class `data.table` (from the `data.table` package), and has columns
+`geo_value`, `time_value`, `version`, as well as any number of additional
+columns.
 
 ```{r}
 class(x$DT)
@@ -70,33 +69,18 @@ for the data table, as well as any other specified in the metadata (described
 below). There can only be a single row per unique combination of key variables,
 and therefore the key variables are critical for figuring out how to generate a
 snapshot of data from the archive, as of a given version (also described below).
-
+
 ```{r, error=TRUE}
 key(x$DT)
 ```
-
-In general, the last version of each observation is carried forward (LOCF) to
-fill in data between recorded versions. **A word of caution:** R6 objects,
-unlike most other objects in R, have reference semantics. An important
-consequence of this is that objects are not copied when modified.
-
-```{r}
-original_value <- x$DT$percent_cli[1]
-y <- x # This DOES NOT make a copy of x
-y$DT$percent_cli[1] = 0
-head(y$DT)
-head(x$DT)
-x$DT$percent_cli[1] <- original_value
-```
 
-To make a copy, we can use the `clone()` method for an R6 class, as in `y <-
-x$clone()`. You can read more about reference semantics in Hadley Wickham's
-[Advanced R](https://adv-r.hadley.nz/r6.html#r6-semantics) book.
+In general, the last version of each observation is carried forward (LOCF) to
+fill in data between recorded versions.
 
 ## Some details on metadata
 
 The following pieces of metadata are included as fields in an `epi_archive`
-object:
+object:
 
 * `geo_type`: the type for the geo values.
 * `time_type`: the type for the time values.
@@ -112,10 +96,8 @@ call (as it did in the case above).
 
 A key method of an `epi_archive` class is `as_of()`, which generates a snapshot
 of the archive in `epi_df` format. This represents the most up-to-date values of
-the signal variables as of a given version. This can be accessed via `x$as_of()`
-for an `epi_archive` object `x`, but the package also provides a simple wrapper
-function `epix_as_of()` since this is likely a more familiar interface for users
-not familiar with R6 (or object-oriented programming).
+the signal variables as of a given version. This can be accessed via
+`epix_as_of()`.
 
 ```{r}
 x_snapshot <- epix_as_of(x, max_version = as.Date("2021-06-01"))
@@ -125,7 +107,7 @@ max(x_snapshot$time_value)
 attributes(x_snapshot)$metadata$as_of
 ```
 
-We can see that the max time value in the `epi_df` object `x_snapshot` that was
+We can see that the max time value in the `epi_df` object `x_snapshot` that was
 generated from the archive is May 29, 2021, even though the specified version
 date was June 1, 2021. From this we can infer that the doctor's visits signal
 was 2 days latent on June 1. Also, we can see that the metadata in the `epi_df`
@@ -134,7 +116,7 @@ object has the version date recorded in the `as_of` field.
 By default, using the maximum of the `version` column in the underlying data table in an
 `epi_archive` object itself generates a snapshot of the latest values of signal
 variables in the entire archive. The `epix_as_of()` function issues a warning in
-this case, since updates to the current version may still come in at a later
+this case, since updates to the current version may still come in at a later
 point in time, due to various reasons, such as synchronization issues.
 
 ```{r}
@@ -143,15 +125,15 @@ x_latest <- epix_as_of(x, max_version = max(x$DT$version))
 
 Below, we pull several snapshots from the archive, spaced one month apart. We
 overlay the corresponding signal curves as colored lines, with the version dates
-marked by dotted vertical lines, and draw the latest curve in black (from the
+marked by dotted vertical lines, and draw the latest curve in black (from the
 latest snapshot `x_latest` that the archive can provide).
 
 ```{r, fig.width = 8, fig.height = 7}
 self_max <- max(x$DT$version)
 versions <- seq(as.Date("2020-06-01"), self_max - 1, by = "1 month")
 snapshots <- map(
-versions,
-function(v) {
+versions,
+function(v) {
 epix_as_of(x, max_version = v) %>% mutate(version = v)
 }) %>%
 list_rbind() %>%
@@ -162,37 +144,35 @@ snapshots <- map(
 ```{r, fig.height=7}
 #| code-fold: true
 ggplot(snapshots %>% filter(!latest),
-aes(x = time_value, y = percent_cli)) +
-geom_line(aes(color = factor(version)), na.rm = TRUE) +
+aes(x = time_value, y = percent_cli)) +
+geom_line(aes(color = factor(version)), na.rm = TRUE) +
 geom_vline(aes(color = factor(version), xintercept = version), lty = 2) +
 facet_wrap(~ geo_value, scales = "free_y", ncol = 1) +
 scale_x_date(minor_breaks = "month", date_labels = "%b %Y") +
 scale_color_viridis_d(option = "A", end = .9) +
-labs(x = "Date", y = "% of doctor's visits with CLI") +
+labs(x = "Date", y = "% of doctor's visits with CLI") +
 theme(legend.position = "none") +
 geom_line(data = snapshots %>% filter(latest),
-aes(x = time_value, y = percent_cli),
+aes(x = time_value, y = percent_cli),
 inherit.aes = FALSE, color = "black", na.rm = TRUE)
 ```
 
 We can see some interesting and highly nontrivial revision behavior: at some
 points in time the provisional data snapshots grossly underestimate the latest
 curve (look in particular at Florida close to the end of 2021), and at others
-they overestimate it (both states towards the beginning of 2021), though not
+they overestimate it (both states towards the beginning of 2021), though not
 quite as dramatically. Modeling the revision process, which is often called
 *backfill modeling*, is an important statistical problem in it of itself.
 
 
-## Merging `epi_archive` objects
+## Merging `epi_archive` objects
 
 Now we demonstrate how to merge two `epi_archive` objects together, e.g., so
 that grabbing data from multiple sources as of a particular version can be
-performed with a single `as_of` call. The `epi_archive` class provides a method
-`merge()` precisely for this purpose. The wrapper function is called
-`epix_merge()`; this wrapper avoids mutating its inputs, while `x$merge` will
-mutate `x`. Below we merge the working `epi_archive` of versioned percentage CLI
-from outpatient visits to another one of versioned COVID-19 case reporting data,
-which we fetch the from the [COVIDcast
+performed with a single `as_of` call. The `epiprocess` packages provides
+`epix_merge()` for this purpose. Below we merge the working `epi_archive` of
+versioned percentage CLI from outpatient visits to another one of versioned
+COVID-19 case reporting data, which we fetch the from the [COVIDcast
 API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html/), on the
 rate scale (counts per 100,000 people in the population).
 
@@ -209,7 +189,7 @@ When merging archives, unless the archives have identical data release patterns,
 the other).
 
 ```{r, message = FALSE, warning = FALSE,eval=FALSE}
-# This code is for illustration and doesn't run.
+# This code is for illustration and doesn't run.
 # The result is saved/loaded in the (hidden) next chunk from `{epidatasets}`
 y <- pub_covidcast(
 source = "jhu-csse",
@@ -223,24 +203,13 @@ y <- pub_covidcast(
 select(geo_value, time_value, version = issue, case_rate_7d_av = value) %>%
 as_epi_archive(compactify = TRUE)
 
-x$merge(y, sync = "locf", compactify = FALSE)
+x <- epix_merge(x, y, sync = "locf", compactify = FALSE)
 print(x)
 head(x$DT)
 ```
 
-```{r, echo=FALSE}
-x <- archive_cases_dv_subset
-print(x)
-head(x$DT)
-```
-
-Importantly, see that `x$merge` mutated `x` to hold the result of the merge. We
-could also have used `xy = epix_merge(x, y)` to avoid mutating `x`. See the
-documentation for either for more detailed descriptions of what mutation,
-pointer aliasing, and pointer reseating is possible.
-
 ## Sliding version-aware computations
-
+
 ::: {.callout-note}
 TODO: need a simple example here.
 :::

epiprocess.qmd

Lines changed: 8 additions & 10 deletions
@@ -15,17 +15,17 @@ contains the most up-to-date values of the signals variables, as of a given
 time.
 
 By convention, functions in the `epiprocess` package that operate on `epi_df`
-objects begin with `epi`. For example:
+objects begin with `epi`. For example:
 
 - `epi_slide()`, for iteratively applying a custom computation to a variable in
 an `epi_df` object over sliding windows in time;
-
+
 - `epi_cor()`, for computing lagged correlations between variables in an
 `epi_df` object, (allowing for grouping by geo value, time value, or any other
 variables).
 
 Functions in the package that operate directly on given variables do not begin
-with `epi`. For example:
+with `epi`. For example:
 
 - `growth_rate()`, for estimating the growth rate of a given signal at given
 time values, using various methodologies;
@@ -35,20 +35,18 @@ Functions in the package that operate directly on given variables do not begin
 
 ## `epi_archive`: full version history of a data set
 
-The second main data structure in the package is called
-[`epi_archive`]. This is a special class (R6 format)
-wrapped around a data table that stores the archive (version history) of some
-signal variables of interest.
+The second main data structure in the package is called [`epi_archive`]. This is
+an S3 class containing a data table that stores the archive (version history) of
+some signal variables of interest.
 
 By convention, functions in the `epiprocess` package that operate on
 `epi_archive` objects begin with `epix` (the "x" is meant to remind you of
-"archive"). These are just wrapper functions around the public methods for the
-`epi_archive` R6 class. For example:
+"archive"). For example:
 
 - `epix_as_of()`, for generating a snapshot in `epi_df` format from the data
 archive, which represents the most up-to-date values of the signal variables,
 as of the specified version;
-
+
 - `epix_fill_through_version()`, for filling in some fake version data following
 simple rules, for use when downstream methods expect an archive that is more
 up-to-date (e.g., if it is a forecasting deadline date and one of our data
