Add a table to summarize the expected data format #140

Merged (1 commit) on Sep 28, 2023
9 changes: 8 additions & 1 deletion vignettes/data_preprocessing.Rmd
@@ -69,7 +69,14 @@ utils::head(covid19_data)
The data frame contains 23 columns. `MSOA11CD` is the spatial identifier for each observation. The response variable `cases` is the weekly reported number of COVID-19 cases in each of the 6789 MSOAs in mainland England over the period from 2022-01-01 to 2022-03-26. Variable `date` gives the start date of the week in which the COVID-19 infection data for each MSOA were reported, and `week` gives the index of that week. Columns `LONG` and `LAT` give the longitude and latitude of each MSOA, and `Population` gives its population size. The remaining columns store the covariate values for each MSOA and week.


-Therefore, the expected observation and measurement data format for a spatio-temporal Bayesian hierarchical model as in the COVID-19 tutorial should be a data frame that includes one column for the response data (e.g., `cases`), two columns for the spatial location of each observation (e.g., `LONG` and `LAT`), and one column containing time point indices indicating when each observation was collected (e.g., `week` = 1, 2, ...). If the model incorporates covariates, then the covariate data should also be included in the same data frame, and each covariate is stored in one column. Users can use any variable names for the columns, as long as they ensure consistency with those used when defining the model formula and fitting the model.
+Therefore, the expected observation and measurement data format for a spatio-temporal Bayesian hierarchical model as in the COVID-19 tutorial should be a data frame that includes one column for the response variable (e.g., `cases`), two columns for the spatial location of each observation (e.g., `LONG` and `LAT`), and one column containing time point indices indicating when each observation was collected (e.g., `week` = 1, 2, ...). If the model incorporates covariates, then the covariate data should also be included in the same data frame, and each covariate is stored in one column. Users can use any variable names for the columns, as long as they ensure consistency with those used when defining the model formula and fitting the model. The following table provides a summary of the expected data format for running the BHM in the `fdmr` package:
+
+
+| ID | LONG | LAT | Time | Response Variable | Covariate 1 | Covariate 2 | Covariate... |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| 1 | ... | ... | ... | ... | ... | ... | ... |
+| 2 | ... | ... | ... | ... | ... | ... | ... |
+| ... | ... | ... | ... | ... | ... | ... | ... |


With `sp_data` and `covid19_data` in the expected data object and format, we now possess all the essential information required for fitting the BHM and visualising the results. More details regarding the model fitting process can be found in the [COVID-19 tutorial](https://4dmodeller.github.io/fdmr/articles/covid.html).
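
As a rough illustration of the layout summarised in the table above, the following base R sketch builds a small data frame with one row per location and week. All values, the column name `covariate_1`, the object names `toy_data` and `model_formula`, and the two sanity checks are assumptions made for this example; they are not part of the `fdmr` package or the COVID-19 data.

```r
# Toy data in the expected layout: one row per location-week observation.
# All values below are made up for illustration only.
n_locations <- 3
n_weeks <- 2

toy_data <- data.frame(
  ID = rep(c("A", "B", "C"), times = n_weeks),          # spatial identifier
  LONG = rep(c(-1.50, -0.12, -2.24), times = n_weeks),  # longitude
  LAT = rep(c(53.80, 51.51, 53.48), times = n_weeks),   # latitude
  week = rep(seq_len(n_weeks), each = n_locations),     # time index 1, 2, ...
  cases = c(12, 30, 18, 9, 25, 21),                     # response variable
  covariate_1 = c(0.4, 0.7, 0.5, 0.4, 0.7, 0.5)         # one column per covariate
)

# Hypothetical model formula; its variable names must match the column names.
model_formula <- cases ~ covariate_1

# Check that every variable named in the formula exists as a column.
missing_columns <- setdiff(all.vars(model_formula), names(toy_data))
stopifnot(length(missing_columns) == 0)

# Check that the time index runs over the integers 1, 2, ..., n_weeks.
stopifnot(identical(sort(unique(toy_data$week)), seq_len(n_weeks)))
```

Checking the formula against the column names before fitting is one simple way to catch the naming inconsistencies mentioned above; the column names themselves can be chosen freely.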