These are projections of the annual number of days over temperature thresholds. Projections are based on NARCliM1.5. The pipeline currently analyses bias-corrected daily maximum temperatures (`tasmax-bc`) from the 50 km CORDEX grid, which covers all of Australia, but it can easily be configured to analyse other variables (e.g. minimum temperatures) or to use the 10 km NARCliM1.5 grid, which covers south-eastern Australia.
- Climate Data Operators (CDO)
  - macOS: `brew install cdo`
- R
- GEOS
  - macOS: `brew install geos`
- (Optional, to download from NCI) libssh
  - macOS: `brew install libssh`
  - Debian/Ubuntu: `libssh-dev`
  - Fedora: `libssh-devel`
You can configure a few prerequisite options by checking the `.Rprofile` (see the sketch after this list). Options include:

- Telling R where `cdo` is. Use this if you get errors like `cdo: command not found` even though you've already installed `cdo`.
- Setting a timeout for downloads. The projections are big files, and R's default 1 minute timeout often isn't long enough to download them. In this project it's set to 1 hour by default, but you can increase it if you have particularly slow internet.
- Telling `{targets}` to use a different directory to store results and progress. By default it will create a `_targets/` folder in the project. If you need a different location (for example, an external drive with more space), you can set that location here.
  - You can also move an existing `_targets/` folder and update this option if you run out of space part way through the analysis.
  - Setting this option creates a `_targets.yaml` in the project folder.
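Here's a minimal sketch of what that `.Rprofile` might contain. The download timeout and `targets::tar_config_set()` are standard R and `{targets}` mechanisms; the environment variable used to locate `cdo` is an assumption and may differ from what this project actually reads.

```r
# .Rprofile (sketch): adjust these to suit your machine.

# Allow up to an hour for each projection file to download.
options(timeout = 60 * 60)

# Assumed variable name: point the pipeline at your cdo binary if it
# isn't found on the PATH.
Sys.setenv(CDO_PATH = "/opt/homebrew/bin/cdo")

# Keep the targets store on a roomier drive; this writes _targets.yaml.
targets::tar_config_set(store = "/Volumes/spare-drive/_targets")
```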
You can further configure how the pipeline runs by altering the top half of `_targets.r`, marked by the comment `pipeline inputs` (see the sketch after this list). These options include:

- `data_sources`: where to get data from. Options include one or more of:
  - `dpie` to download collections from the DPIE climate data portal by their collection ID. Provide collection IDs using the `collections` option.
  - `nci` to download folders of files from NCI. In this case, `nci_host` is the name of either a remote host (e.g. `user@gadi.nci.org.au`) or the name of a host block that matches your SSH configuration (see below), and `nci_folders` lets you generate folders on NCI to download from based on the parameters you're interested in.
  - `manual` lets you manually provide files (if, for example, you've already downloaded some). In this case, `manual_folders` is a vector of paths to search. The files should still be named using the NARCliM DRS naming scheme: `[var]_[grid]_[gcm]_[scenario]_[run]_[rcm]_v1_day_[startdate]-[enddate].nc`
- `selected_thresholds`: a data frame of exceedance thresholds (in °C). They can be set separately for different temperature variables (`tasmax-bc` and `tasmin-bc`, for example). The pipeline will count the number of days annually at or above each threshold.
- `year_breaks`: a vector of dates used to split the days up. Values here are the dates themselves (given as YYYY-MM-DD strings); names are the labels to give to each period. Periods run forward from the date provided to the day before the next break: for example, if `1995` is `"1986-01-01"` and `-` is `"2006-01-01"`, files between 1986 and 2005 are given the period name `1995`. Use the name `-` to drop files in this period.
- `yearblock_stats`: a vector of statistics to calculate over the period (for example, `"mean"` is the number of days ≥ X °C in an average year in the block, while `"max"` is the largest number of such days).
- `model_ensemble_stats`: a vector of statistics to calculate across the different climate models (GCMs, RCMs and runs).
- `boundaries`: TODO - redescribe when switching to `{absmapsdata}`
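As a rough illustration, the `pipeline inputs` block might look something like the following. The values below are made up, and the exact structure each option expects (for example, the column names of `selected_thresholds`) may differ from what's shown here.

```r
# Sketch of the "pipeline inputs" section of _targets.r, with made-up values.

data_sources   <- c("manual")
manual_folders <- c("data-raw/narclim")   # hypothetical local folder of .nc files

# days at or above each threshold (in degrees C), per variable
selected_thresholds <- data.frame(
  variable  = c("tasmax-bc", "tasmax-bc", "tasmin-bc"),
  threshold = c(35, 40, 20)
)

# named breaks defining the year blocks; the name "-" drops files in that period
year_breaks <- c(
  "1995" = "1986-01-01",
  "-"    = "2006-01-01",
  "2030" = "2020-01-01",
  "2070" = "2060-01-01"
)

yearblock_stats      <- c("mean", "max")
model_ensemble_stats <- c("mean", "min", "max")
```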
If you choose to have the pipeline automatically download data from NCI, you may wish to configure SSH externally using `~/.ssh/config`. This will allow you to use an SSH key, so that you aren't required to provide a password for each folder you download.

The host block in your configuration should be for a remote host that has access to `/g/data`, and the user should have access to the `at43` project, where NARCliM results are stored.
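For example, a host block along these lines (with your own NCI username and key) would let you set `nci_host` to `gadi`. This is only a sketch; adjust the hostname, username and key path to your own setup.

```
# ~/.ssh/config (sketch)
# "gadi" here matches the nci_host option; replace abc123 with your
# NCI username and point IdentityFile at a key added to your account.
Host gadi
    HostName gadi.nci.org.au
    User abc123
    IdentityFile ~/.ssh/id_ed25519
```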
Once you're set up and have changed any options, running the pipeline is as simple as running `./run.sh` or `./run.R`, which will call `targets::tar_make()`.
The pipeline will store intermediate results in the targets store folder, which by default is `_targets/` in the project folder. (You can change this if you'd like.) Final results are saved to the `data` folder.
You can run the pipeline again - say, after adding new collections or temperature thresholds to `_targets.r` - and `{targets}` will only re-run the steps that need updating.
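If you want to preview what a re-run will touch, `{targets}` can report which steps it considers out of date:

```r
# list the targets that would be re-run by the next tar_make()
targets::tar_outdated()
```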
If you haven't used the `{targets}` package before, there are two main learning resources:
The files you'll want to focus on are:

- `_targets.r` defines the pipeline: what steps will run, how they'll split and combine data between steps, and what the essential inputs are (currently dataset collections and temperature thresholds).
- `R/*.r` files contain the code for each step. They are sourced when the pipeline first starts (most notably, creating the folders for intermediate results), and the functions are referred to in the pipeline in `_targets.r`.
- `run.R` and `run.sh` are just shortcuts to run the pipeline, calling `targets::tar_make()`.
- `.Rprofile` sets other options: most notably the timeout for downloading files and, optionally, the locations of CDO and the targets data store.
The `{targets}` package has some other useful functions for understanding the pipeline, which you can run if you have an R session open in the project folder:

```r
# produce an interactive flowchart of the pipeline, its current status,
# and the number of branches at each step
targets::tar_visnetwork(targets_only = TRUE, label = "branches")
```
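A couple of other `{targets}` inspection helpers can also be handy (the target name passed to `tar_read()` below is a placeholder; use a name from your own pipeline):

```r
# summarise how far each target has progressed in the current or last run
targets::tar_progress()

# read a completed target's result into your session
# ("some_target" is a placeholder name)
targets::tar_read(some_target)
```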
If you find any problems with the data or our analysis, please feel free to create an issue!