---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```
# chchpd
The goal of the `chchpd` package is to give NZBRI researchers access to data
from the New Zealand Parkinson's Progression (NZP<sup>3</sup>) study. Rather
than dealing directly with spreadsheets or databases, users call functions like
`import_participants()` and `import_motor_scores()`, which abstract away the raw
data source. Currently, these functions read from shared online Google Sheets.
Some of those sheets are periodically exported from NZBRI's Alice and REDCap
Parkinson's databases, while (for the time being) some contain manually-entered
data. While the data source might change, the function used to obtain the data
will appear unchanged to the user, which is the benefit of this layer of
abstraction.

Each data set is returned as a separate data frame. Users are responsible for
linking them together as required.
The package was originally developed by Michael MacAskill, with contributions
from Daniel Myall, Toni Pitcher, and Reza Shoorangiz. Please direct queries to
Michael in the first instance.
## Installation
This package is only of interest and use internally at NZBRI, so it is not
distributed via CRAN. Instead, it is hosted in a GitHub repository at
https://github.com/nzbri/chchpd.

The usual installation route using `install.packages('chchpd')` is therefore
not possible, and will yield an unhelpful error message saying that the package
is not available for your version of R. Instead, install `chchpd` from its
development repository on GitHub as follows:
```{r gh-installation, eval = FALSE}
# install.packages('remotes')
remotes::install_github('nzbri/chchpd')
```
If `remotes::install_github('nzbri/chchpd')` is run again later, the package
will be downloaded and installed only if the version on GitHub is newer than the
one installed locally.
If problems arise in a new release, you can downgrade to a previous version by
specifying the name of a particular release to revert to, e.g.
```{r gh-revert, eval = FALSE}
remotes::install_github('nzbri/chchpd@v0.1.5')
```
The list of releases is here: https://github.com/nzbri/chchpd/releases
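
Before upgrading or reverting, it can help to check which version is currently
installed. The following is a minimal sketch using only base R functions
(`requireNamespace()` and `utils::packageVersion()`); the variable name
`chchpd_installed` is just illustrative.

```r
# Check whether chchpd is installed, without attaching it to the search path.
chchpd_installed <- requireNamespace('chchpd', quietly = TRUE)

if (chchpd_installed) {
  # Report the locally installed version, for comparison with the releases list.
  message('Installed chchpd version: ',
          as.character(utils::packageVersion('chchpd')))
} else {
  message('chchpd is not installed yet.')
}
```
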
Please use the issue tracker at https://github.com/nzbri/chchpd/issues to report
any problems. Bear in mind that this repository is public, so if reporting
data-related issues, be careful not to post anything that could identify
participants.
## Example usage
Records across the various tables must be joined using one of two keys. Use
`subject_id` to link to tables such as the participant table, which holds
information (such as sex) that is constant for a subject. Use `session_id` to
join the various measures gathered at approximately the same assessment session
for a given participant, as these values can change from one time point to the
next.
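
As a minimal sketch of the difference between the two keys, the toy data frames
below are joined with base R's `merge()`. All values are invented, and the
`sex` and `updrs_total` column names are assumptions made for illustration
rather than the actual columns returned by the package.

```r
# Toy data only: every value here is invented for illustration.
participants <- data.frame(
  subject_id = c('001ABC', '002DEF'),
  sex        = c('Female', 'Male')
)

sessions <- data.frame(
  subject_id = c('001ABC', '001ABC', '002DEF'),
  session_id = c('001ABC_2015-03-02', '001ABC_2017-03-10', '002DEF_2016-05-20')
)

motor <- data.frame(
  session_id  = c('001ABC_2015-03-02', '001ABC_2017-03-10', '002DEF_2016-05-20'),
  updrs_total = c(18, 24, 12)
)

# Subject-level constants (such as sex) are linked by subject_id, so they are
# repeated on every session row belonging to that subject:
dat <- merge(participants, sessions, by = 'subject_id')

# Measures that vary over time are linked by session_id:
dat <- merge(dat, motor, by = 'session_id')

print(dat)
```

The result has one row per session, with each subject's sex repeated across
their sessions and a session-specific motor score alongside it.
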
The 'glue' between participants and the data gathered about them is generally
the 'sessions' table. For each participant, this lists the various assessment
sessions they have had. In the raw data, these sessions were often described by
idiosyncratic labels, such as `999BIO_F2` (indicating the follow-up session two
years after the baseline recruitment session in the Progression study). But the
same session might also have served as the baseline in the more selective PET
study, and been labelled, say, `999BIO_PET0`. This inconsistent labelling is
resolved by the subject session mapping table, which would have a record for
both the `999BIO_F2` and `999BIO_PET0` sessions, linking them to the same
standardised session code, which has a form like `999BIO_2016-01-28`. When
various data sources (like HADS or UPDRS) are imported, their idiosyncratic
session labels are replaced by this standardised form.

This makes it easy to join multiple tables together systematically, as below.
Often a key step is to restrict the set of sessions by specifying a particular
study to filter them by. In the example code below, we select just those
sessions in the 'PET' study.
```{r example, eval=FALSE, include=TRUE}
# load necessary packages:
library(chchpd)
library(dplyr) # to join dataframes
# first establish your rights to view the data:
google_authenticate()
# then import datasets of interest:
participants = import_participants()
sessions = import_sessions(from_study = 'PET')
np = import_neuropsyc()
updrs = import_motor_scores()
# bind the records together, linked by subject or session IDs:
dat = right_join(participants, sessions, by = 'subject_id') %>%
  left_join(np, by = 'session_id') %>%
  left_join(updrs, by = 'session_id')
```
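
The session-label standardisation described above happens inside the import
functions, so users do not normally see it. The following sketch only
illustrates the idea with toy data: the `session_map` table, the `raw_label`
column, the score columns and the `standardise()` helper are all hypothetical
and are not part of the `chchpd` API.

```r
# Hypothetical mapping table: two study-specific labels for the same visit
# both point to one standardised session code.
session_map <- data.frame(
  raw_label  = c('999BIO_F2', '999BIO_PET0'),
  session_id = c('999BIO_2016-01-28', '999BIO_2016-01-28')
)

# Toy records from two data sources, each using its own label for that visit:
hads  <- data.frame(raw_label = '999BIO_F2',   hads_anxiety = 6)
updrs <- data.frame(raw_label = '999BIO_PET0', updrs_total  = 31)

# Replace each idiosyncratic label with the standardised session code.
standardise <- function(df, map) {
  df$session_id <- map$session_id[match(df$raw_label, map$raw_label)]
  df$raw_label  <- NULL
  df
}

hads  <- standardise(hads, session_map)
updrs <- standardise(updrs, session_map)

# The two sources can now be joined on the shared session_id:
combined <- merge(hads, updrs, by = 'session_id')
print(combined)
```

Because both labels resolve to the same code, the two records end up on a
single row even though the raw sources never used the same session name.
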
## Data caching
Importing data from a Google spreadsheet can be quite slow, so to speed up
development of your analysis scripts, data caching is enabled by default. A copy
of each imported data set is cached, and if you re-import it within a certain
interval (60 minutes by default), the data won't be downloaded afresh: the
cached version is returned immediately.
If you want to force a fresh download of the data, or change other options, you
can do the following:
```{r options, eval=FALSE, include=TRUE}
options('chchpd_use_cached' = FALSE) # force a fresh download rather than using the cache
options('chchpd_cache_duration' = 30) # duration in minutes
options('chchpd_suppress_warnings' = FALSE) # whether to suppress warnings from googlesheets
```
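
For readers curious about how a time-limited cache of this kind can work, here
is a minimal sketch in base R. It is not the `chchpd` implementation: the
`make_cached_importer()` and `slow_fetch()` functions and their arguments are
hypothetical, and the "download" is simulated with a toy data frame.

```r
# A hypothetical time-based cache; not the chchpd implementation.
make_cached_importer <- function(fetch, max_age_mins = 60) {
  cached_data <- NULL
  fetched_at  <- NULL

  function(use_cached = TRUE) {
    # Age of the cached copy in minutes (infinite if nothing is cached yet).
    age_mins <- if (is.null(fetched_at)) {
      Inf
    } else {
      as.numeric(difftime(Sys.time(), fetched_at, units = 'mins'))
    }

    # Download only if caching is disabled or the cached copy is too old.
    if (!use_cached || age_mins > max_age_mins) {
      cached_data <<- fetch()
      fetched_at  <<- Sys.time()
    }
    cached_data
  }
}

# Stand-in for a slow Google Sheets download; counts how often it is called.
n_downloads <- 0
slow_fetch <- function() {
  n_downloads <<- n_downloads + 1
  data.frame(subject_id = c('001ABC', '002DEF'))
}

import_toy <- make_cached_importer(slow_fetch, max_age_mins = 60)

first  <- import_toy()                    # nothing cached yet, so this downloads
second <- import_toy()                    # within 60 minutes: served from the cache
third  <- import_toy(use_cached = FALSE)  # forces a fresh download
```

In this sketch the `use_cached` argument and `max_age_mins` play roles analogous
to the `chchpd_use_cached` and `chchpd_cache_duration` options.
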