This repository contains code, data, and figures that support:
Eskew, E.A., E. Clancey, D. Singh, S. Situma, L. Nyakarahuka, M. K. Njenga, and S. L. Nuismer. In press. Interepidemic Rift Valley fever in East Africa: The recent risk landscape and projected impacts of global change. Proceedings of the Royal Society B.
Models of interepidemic Rift Valley fever (RVF) relied on a suite of spatially explicit predictor variables. All predictors were processed to a resolution of 2.5 arcminutes; the source and native resolution of each predictor are as follows:
- Hydrology
  - Lake data from HydroLAKES (shapefile of lakes globally)
  - River data from HydroRIVERS (shapefile of rivers globally)
- Soils
  - Multiple variables from SoilGrids (250 m resolution [~8 arcsecond resolution])
- Topography
  - Elevation data from the Shuttle Radar Topography Mission (SRTM) (1 arcsecond resolution)
  - Slope was calculated using the elevation data described above
- Disease detection
  - Travel time to healthcare data from Weiss et al. 2020, Nature Medicine (30 arcsecond resolution)
- Livestock density
  - Cattle, goat, and sheep density data from Gridded Livestock of the World (version 4) (5 arcminute resolution)
- Human population density
  - Historical human population data from WorldPop (30 arcsecond resolution)
  - Projected human population data from Wang et al. 2022, Scientific Data (30 arcsecond resolution)
- Precipitation and temperature
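
As a rough guide to what processing to 2.5 arcminutes implies for each source, the following base R sketch converts the native resolutions listed above to arcseconds and computes their ratio to the 150 arcsecond (2.5 arcminute) target. The vector names are illustrative labels, not identifiers from the project scripts, and the 8 arcsecond value for SoilGrids is the approximation given above. The sketch only does the arithmetic; it does not reproduce the repository's actual resampling steps.

```r
# Target resolution: 2.5 arcminutes expressed in arcseconds
target_arcsec <- 2.5 * 60

# Native resolutions from the list above, in arcseconds
# (names are illustrative labels, not identifiers from the project scripts)
native_arcsec <- c(
  SoilGrids = 8,               # approximate (250 m)
  SRTM_elevation = 1,
  travel_time = 30,
  livestock_GLW4 = 5 * 60,
  WorldPop = 30,
  projected_population = 30
)

# Ratio of target cell size to native cell size:
# > 1 means several native cells fall within one target cell (aggregation),
# < 1 means the native grid is coarser than the target (disaggregation)
scale_factor <- target_arcsec / native_arcsec

print(round(scale_factor, 2))
```

A ratio of 5 (the 30 arcsecond layers) corresponds to a 5 x 5 block of native cells per target cell, whereas a ratio of 0.5 (the 5 arcminute livestock layers) means the native grid is coarser than the target. A non-integer ratio such as the one for SoilGrids would typically call for resampling rather than simple block aggregation.
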
To help explain the project scripts, the overall workflow is as follows:
- `get_SoilGrids_data.R` programmatically downloads the soil predictor data. All other predictor data were manually downloaded from the online resources described above.
- `process_all_predictors.R` processes all predictor data into rasters of 2.5 arcminute resolution. This script calls the various `process_*_data.R` scripts that each handle a certain type of predictor data. Note that these scripts do need to be called in the order prescribed by `process_all_predictors.R` so that intermediate files are available, as needed.
- `generate_predictor_flat_files.R` takes the 2.5 arcminute raster predictor files and generates flat CSV files describing the predictor data for each grid cell across the study region. Predictor data in this format are necessary for downstream modeling. Note that these flat predictor files are generated for both historical and future climate conditions.
- `prep_outbreak_data.R` prepares the raw outbreak data for use in interepidemic RVF modeling. Generates two versions of the outbreak data, one with missing location coordinates filled with administrative unit centroids (outbreak_data_centroid_filled.csv) and one with replicated outbreak data that are randomly filled from the known administrative unit areas (outbreak_data_randomly_filled.csv).
- `generate_absence_data.R` generates the background (i.e., pseudo-absence) data for use in interepidemic RVF modeling. Produces the `outbreak_data_*_pseudoabsences.csv` files in the data/outbreak_data subdirectory.
- `extract_outbreak_absence_predictors.R` uses the predictor flat files to generate a data frame with predictor data for all observed interepidemic RVF outbreak events as well as the background points. Produces the `outbreak_data_*_pseudoabsences_predictors.csv` files in the data/outbreak_data subdirectory.
- `fit_model.R` fits and saves an XGBoost model of the disease outbreak and background data. These objects are saved in the data/saved_objects subdirectory.
- `model_postprocessing.R` uses the saved XGBoost model objects to generate ROC curve, variable importance, and partial dependence plots. Also calculates the cutoff value that maximizes the true skill statistic (TSS) for use in downstream analyses.
- `generate_prediction_rasters.R` uses the saved XGBoost model objects to generate prediction rasters showing the relative likelihood of RVF across the study region. These prediction rasters are generated for all months of the calendar year using predictor data describing historical climate (1970-2000), historical weather (2008-2022), and future climate conditions. Summary data written to prediction_raster_summary.csv.
- `model_validation.R` estimates grid cell-level RVFV force of infection (FOI) and combines these estimates with RVF relative likelihood values from the prediction rasters to validate our model's predictive ability. Also generates the accompanying figure. Data written to serology_data_for_validation.csv.
- `calculate_pop_at_risk.R` combines predicted RVF relative likelihood values from the prediction rasters with estimates of future human population density to calculate the future population at risk. Estimates written to human_pop_at_risk.csv.
- `plot_outbreak_data.R` generates figures showing the distribution of observed interepidemic RVF outbreak events.
- `plot_background_points.R` generates a single figure showing the background points used in XGBoost modeling.
- `plot_main.R` is the project's primary plotting script.
- `plot_deltas.R` generates monthly-level figures showing change over time in precipitation and temperature variables as well as model-based predictions.
- `plot_gif.R` generates a GIF of monthly predictions from 2008-2022.
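
The TSS-maximizing cutoff mentioned in the workflow above can be illustrated with a small base R sketch. TSS is sensitivity plus specificity minus one, evaluated at each candidate cutoff. The function name, the toy data, and the choice of candidate cutoffs (the unique predicted values) are assumptions made for illustration; the project's postprocessing script may compute the cutoff differently.

```r
# Hypothetical helper (not from the project scripts): find the cutoff that
# maximizes the true skill statistic, TSS = sensitivity + specificity - 1.
# `pred` is a vector of predicted values; `obs` is 1 for outbreak points
# and 0 for background points.
max_tss_cutoff <- function(pred, obs) {
  cutoffs <- sort(unique(pred))
  tss <- vapply(cutoffs, function(k) {
    called_pos <- pred >= k
    sensitivity <- sum(called_pos & obs == 1) / sum(obs == 1)
    specificity <- sum(!called_pos & obs == 0) / sum(obs == 0)
    sensitivity + specificity - 1
  }, numeric(1))
  best <- which.max(tss)  # returns the first maximum if there are ties
  list(cutoff = cutoffs[best], tss = tss[best])
}

# Toy data, made up for illustration only
pred <- c(0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90)
obs  <- c(0,    0,    1,    0,    0,    1,    1,    1)

result <- max_tss_cutoff(pred, obs)
print(result)
```

With these toy data, the maximum TSS of 0.75 occurs at a cutoff of 0.70: three of the four outbreak points are at or above the cutoff, and all four background points fall below it.
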