Harmonization
In this worked example we use the same local dataset as in section 2.1, "Dataset definition and loading local grid data". The dataset comes from the NCEP/NCAR Reanalysis 1, covers the period 1961-2010 over the Iberian Peninsula domain, and is available as a tar.gz file that can be downloaded and stored in a local directory as follows:
download.file("http://meteo.unican.es/work/loadeR/data/Iberia_NCEP.tar.gz",
              destfile = "mydirectory/Iberia_NCEP.tar.gz")
# Extract files from the tar.gz file
untar("mydirectory/Iberia_NCEP.tar.gz", exdir = "mydirectory")
# First, the path to the ncml file is defined:
ncep.local <- "mydirectory/Iberia_NCEP/Iberia_NCEP.ncml"
Before loading the data, we take an inventory of the NcML file to identify the desired variable.
di <- dataInventory(ncep.local)
## [2016-02-17 20:02:39] Doing inventory ...
## [2016-02-17 20:02:39] Retrieving info for 'Z' (5 vars remaining)
## [2016-02-17 20:02:39] Retrieving info for 'T' (4 vars remaining)
## [2016-02-17 20:02:40] Retrieving info for 'Q' (3 vars remaining)
## [2016-02-17 20:02:40] Retrieving info for '2T' (2 vars remaining)
## [2016-02-17 20:02:40] Retrieving info for 'SLP' (1 vars remaining)
## [2016-02-17 20:02:40] Retrieving info for 'pr' (0 vars remaining)
## [2016-02-17 20:02:40] Done.
# e.g. temperature
str(di$`2T`)
## List of 4
## $ Description: chr "2m Temperature"
## $ DataType : chr "float"
## $ Units : chr "K"
## $ Dimensions :List of 4
## ..$ time :List of 4
## .. ..$ Type : chr "Time"
## .. ..$ TimeStep : chr "1.0 days"
## .. ..$ Units : chr "days since 1950-01-01 00:00:00"
## .. ..$ Date_range: chr "1961-01-01T00:00:00Z - 2010-12-31T00:00:00Z"
## ..$ level:List of 3
## .. ..$ Type : chr "Height"
## .. ..$ Units : chr "m"
## .. ..$ Values: num 2
## ..$ lat :List of 3
## .. ..$ Type : chr "Lat"
## .. ..$ Units : chr "degrees north"
## .. ..$ Values: num [1:6] 35 37.5 40 42.5 45 47.5
## ..$ lon :List of 3
## .. ..$ Type : chr "Lon"
## .. ..$ Units : chr "degrees east"
## .. ..$ Values: num [1:9] -15 -12.5 -10 -7.5 -5 -2.5 0 2.5 5
The variable name for 2m air temperature in this dataset is "2T" and its units are Kelvin. We can load these data with `loadGridData` as follows:
```r
tas <- loadGridData(ncep.local,
                    var = "2T",
                    lonLim = c(-12, 5),
                    latLim = c(35, 45),
                    season = 6:8,
                    years = 1981:2000)
## [2019-05-22 09:37:23] Defining geo-location parameters
## [2019-05-22 09:37:23] Defining time selection parameters
## [2019-05-22 09:37:23] Retrieving data subset ...
## [2019-05-22 09:37:23] Done
```
The dictionary file is the tool used in loadeR to harmonize the variables according to the climate4R vocabulary:
C4R.vocabulary()
## identifier standard_name units
## 1 hurs 2-meter relative humidity %
## 2 hursmax maximum 2-meter relative humidity %
## 3 hursmin minimum 2-meter relative humidity %
## 4 hus specific humidity kg.kg-1
## 5 huss 2-meter specific humidity kg.kg-1
## 6 hussmax maximum 2-meter specific humidity kg.kg-1
## 7 hussmin minimum 2-meter specific humidity kg.kg-1
## 8 lm land binary mask 1
## 9 orog surface altitude m
## 10 ps air pressure at surface level Pa
## 11 psl air pressure at sea level Pa
## 12 rlds surface downwelling longwave radiation W.m-2
## 13 rlut toa outgoing longwave flux W.m-2
## 14 rlus surface upwelling longwave flux in air W.m-2
## 15 rsus surface upwelling shortwave flux in air W.m-2
## 16 rsds surface downwelling shortwave radiation W.m-2
## 17 sftlf land area fraction 1
## 18 ta air temperature degC
## 19 tas 2-meter air temperature degC
## 20 tasmax maximum 2-m air temperature degC
## 21 tasmin minimum 2-m air temperature degC
## 22 tdps 2-meter dewpoint temperature degC
## 23 ts surface_temperature degC
## 24 pr total precipitation amount mm
## 25 prr total rainfall amount mm
## 26 prsn total snowfall amount mm
## 27 ua eastward wind m.s-1
## 28 uas eastward near-surface wind m.s-1
## 29 va northward wind m.s-1
## 30 vas northward near-surface wind m.s-1
## 31 wss near-surface wind speed m.s-1
## 32 wssmax maximum near-surface wind speed m.s-1
## 33 wsg wind speed of gust m.s-1
## 34 wsgmax maximum wind speed of gust m.s-1
## 35 z geopotential m2.s-2
## 36 zg geopotential height m
## 37 zs surface geopotential m2.s-2
## 38 zgs surface geopotential height m
The dictionary maps each standard name given by the climate4R vocabulary to the native name of the variable in the dataset. In this example, the dictionary file (Iberia_NCEP.dic) is included in the tar.gz. The dictionary file is typically created by the user at their convenience.
dictionary <- "mydirectory/Iberia_NCEP/Iberia_NCEP.dic"
read.table(dictionary, header = TRUE, sep = ",")
## identifier short_name time_step lower_time_bound upper_time_bound aggr_fun offset scale deaccum
## 1 hus Q 24h 0 24 mean 0.00 1.0000000 0
## 2 psl SLP 24h 0 24 mean 0.00 1.0000000 0
## 3 ta T 24h 0 24 mean -273.15 1.0000000 0
## 4 z Z 24h 0 24 mean 0.00 0.1020408 0
## 5 tas 2T 24h 0 24 mean -273.15 1.0000000 0
## 6 pr pr 24h 0 24 sum 0.00 1000.0000000 0
When loading data with the function loadGridData, the particular variables of each dataset are translated (and transformed if necessary) into the common vocabulary by means of a dictionary if the argument dictionary = TRUE is specified. The function performs all the transformations needed to return the standard variables, as defined in the vocabulary. Thus, with a dictionary users do not need to worry about the specific variable names and units used in the different datasets, as long as the identifier is compliant with the climate4R vocabulary.
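The translation step can be pictured as a simple table lookup. The following is only a minimal sketch in base R, not loadeR's internal code: the data frame reproduces some columns of the example dictionary shown above, and `lookup_identifier` is a hypothetical helper introduced for illustration.

```r
# Illustrative sketch only: this is NOT how loadeR is implemented internally.
# The rows reproduce part of the example dictionary (Iberia_NCEP.dic).
dic <- data.frame(
  identifier = c("hus", "psl", "ta", "z", "tas", "pr"),
  short_name = c("Q", "SLP", "T", "Z", "2T", "pr"),
  offset     = c(0, 0, -273.15, 0, -273.15, 0),
  scale      = c(1, 1, 1, 0.1020408, 1, 1000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the dictionary row of a standard identifier
lookup_identifier <- function(dic, identifier) {
  row <- dic[dic$identifier == identifier, , drop = FALSE]
  if (nrow(row) != 1L) {
    stop("Identifier '", identifier, "' not found (or duplicated) in the dictionary")
  }
  row
}

entry <- lookup_identifier(dic, "tas")
entry$short_name  # native variable name to request from the dataset ("2T")
```

Requesting an identifier that is absent from the dictionary stops with an error in this sketch, which mirrors the idea that a standard name can only be resolved when a matching dictionary entry exists.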
Next, we illustrate a simple example of the use of the dictionary file.
The standard variable name for 2-meter air temperature is "tas".
tas2 <- loadGridData(ncep.local, var = "tas", dictionary = TRUE)
## [2019-05-22 09:57:34] Defining harmonization parameters for variable "tas"
## [2019-05-22 09:57:34] Defining geo-location parameters
## [2019-05-22 09:57:34] Defining time selection parameters
## [2019-05-22 09:57:34] Retrieving data subset ...
## [2019-05-22 09:57:35] Done
The NCEP dataset uses the variable name "2T" for 2-meter air temperature. As a result, if we use the standard name "tas" to load the data without a dictionary, the function will return an error:
tas3 <- try(loadGridData(ncep.local, var = "tas", dictionary = FALSE))
# Returns the error message:
## Error in loadGridData("mydirectory/Iberia_NCEP/Iberia_NCEP.ncml", :
## Variable requested not found
## Check 'dataInventory' output and/or dictionary 'identifier'.
Another useful feature of the dictionary is on-the-fly unit transformation. Since the standard units for "tas" are degrees Celsius, in this example, the dictionary also transforms Kelvin into degC. This is done by the offset parameter that is set in the dictionary file (in this case -273.15).
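As a rough illustration of this transformation, the sketch below applies a scale factor and an offset to a few made-up values in base R. The helper `apply_dictionary` is hypothetical, and the order of operations (scale first, then offset) is an assumption; for every entry of the example dictionary either the scale is 1 or the offset is 0, so the order does not change the result there.

```r
# Hypothetical helper, not part of loadeR: linear unit conversion.
# Assumption: the scale factor is applied before the offset.
apply_dictionary <- function(x, scale = 1, offset = 0) {
  x * scale + offset
}

# Made-up temperatures in Kelvin, converted with the "tas" entry
# of the example dictionary (scale = 1, offset = -273.15)
tas_kelvin  <- c(273.15, 288.15, 300.65)
tas_celsius <- apply_dictionary(tas_kelvin, scale = 1, offset = -273.15)
tas_celsius  # approximately 0, 15 and 27.5 degC

# Made-up value converted with the "pr" entry (scale = 1000, offset = 0)
apply_dictionary(0.0025, scale = 1000, offset = 0)
```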
Note the differences in the attributes of objects tas and tas2 regarding variable names and units:
str(tas$Variable)
## List of 2
## $ varName: chr "2T"
## $ level : num 2
## - attr(*, "use_dictionary")= logi FALSE
## - attr(*, "description")= chr "2m Temperature"
## - attr(*, "units")= chr "K"
## - attr(*, "longname")= chr "2T"
## - attr(*, "daily_agg_cellfun")= chr "none"
## - attr(*, "monthly_agg_cellfun")= chr "none"
## - attr(*, "verification_time")= chr "none"
str(tas2$Variable)
## List of 2
## $ varName: chr "tas"
## $ level : num 2
## - attr(*, "use_dictionary")= logi TRUE
## - attr(*, "description")= chr "2m Temperature"
## - attr(*, "units")= chr "degC"
## - attr(*, "longname")= chr "2-meter air temperature"
## - attr(*, "daily_agg_cellfun")= chr "none"
## - attr(*, "monthly_agg_cellfun")= chr "none"
## - attr(*, "verification_time")= chr "none"
NOTE: more advanced features for unit handling and conversion after data loading are available through the climate4R package convertR.
As shown in the example, there are other parameters which define the temporal characteristics of the data and other conversion operations needed to obtain the final data according to the user's needs. The following parameters need to be included in the .dic file:
- identifier: this is the name of the standard variable, as defined in the vocabulary
- short_name: this is the name with which the original variable has been coded in the dataset
- time_step: time scale of the data. For instance, 24h (for daily data), 3h ...
- lower_time_bound and upper_time_bound: temporal range of the data. These parameters indicate the lower and upper bounds of the time interval for which the data are representative. For instance, instantaneous variables have identical lower and upper bounds, while a value that is representative of a daily amount (e.g. total accumulated precipitation in 24 h, or mean daily temperature) has the lower and upper bounds of the interval to which the value applies (e.g. from 00:00 of day 1 to 00:00 of day 2), the interval being closed on the left and open on the right.
- cell_method: function used for time aggregation between the lower and upper time bounds. For instance, its value is "none" for instantaneous variables, "mean" for mean daily temperatures or "sum" for daily precipitation values. In the example dictionary above, this field appears to correspond to the aggr_fun column.
- offset: constant added to the original variable for unit conversion (e.g. offset = -273.15 for conversion from Kelvin to Celsius)
- scale: scale factor applied to the original variable for unit conversion (e.g. scale = 1000 for conversion from m to mm)
- deaccum: logical flag (0 = FALSE, 1 = TRUE) indicating whether the variable should be de-accumulated at each time step. For cumulative variables (e.g. precipitation), obtaining the value for a particular period sometimes requires subtracting two consecutive values (de-accumulation); this is done when the flag is activated (deaccum = 1). It is typically applied to precipitation in some forecast datasets.
- derived: this value is internally used by the loading functions to know if the variable is derived from any other variable(s) or can be directly read from the dataset.
- interface: this is an internal value used by the loading functions.
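To make the idea behind deaccum concrete, the following base R sketch subtracts consecutive values of a made-up cumulative series. The helper `deaccumulate` is hypothetical and only illustrates the concept; how loadeR treats the first time step or forecast initializations may differ.

```r
# Hypothetical helper, not part of loadeR: turn running totals into
# per-step amounts by subtracting each value from the next one.
# Assumption: the first value is already the amount of the first step.
deaccumulate <- function(x) {
  c(x[1], diff(x))
}

# Made-up accumulated precipitation (running total since the start)
accumulated <- c(1.0, 3.5, 3.5, 6.0)
deaccumulate(accumulated)  # 1.0 2.5 0.0 2.5
```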
Note that all the fields above need to be included in the dictionary file. Their ordering is not important, as long as their names are preserved.
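A dictionary can be written and checked with base R alone. The sketch below is only an illustration: it uses the column names visible in the example file (Iberia_NCEP.dic), writes a one-row comma-separated dictionary to a temporary file, and verifies that the expected columns are present. The names `required_cols`, `my_dic` and `dic_file` are hypothetical, and whether loadeR needs further columns (such as derived or interface) is not checked here.

```r
# Column names taken from the example dictionary shown earlier
required_cols <- c("identifier", "short_name", "time_step",
                   "lower_time_bound", "upper_time_bound",
                   "aggr_fun", "offset", "scale", "deaccum")

# One entry mapping the native variable "2T" (Kelvin) to "tas" (degC)
my_dic <- data.frame(
  identifier = "tas", short_name = "2T", time_step = "24h",
  lower_time_bound = 0, upper_time_bound = 24,
  aggr_fun = "mean", offset = -273.15, scale = 1, deaccum = 0,
  stringsAsFactors = FALSE
)

# Write a comma-separated file; a temporary path is used for illustration
dic_file <- tempfile(fileext = ".dic")
write.table(my_dic, file = dic_file, sep = ",",
            row.names = FALSE, quote = FALSE)

# Read it back the same way as in the example and check the columns
check <- read.table(dic_file, header = TRUE, sep = ",",
                    stringsAsFactors = FALSE)
missing_cols <- setdiff(required_cols, names(check))
if (length(missing_cols) > 0) {
  stop("Missing dictionary columns: ", paste(missing_cols, collapse = ", "))
}
```

Because the check compares column names rather than positions, it is consistent with the statement that the ordering of the fields does not matter.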
print(sessionInfo())
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## Random number generation:
## RNG: Mersenne-Twister
## Normal: Inversion
## Sample: Rounding
##
## locale:
## [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8
## [4] LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=es_ES.UTF-8 LC_ADDRESS=es_ES.UTF-8
## [10] LC_TELEPHONE=es_ES.UTF-8 LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=es_ES.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] visualizeR_1.3.2 transformeR_1.4.8 loadeR_1.4.12 loadeR.java_1.1.1 rJava_0.9-11
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.1 compiler_3.6.0 RColorBrewer_1.1-2 bitops_1.0-6
## [5] tools_3.6.0 boot_1.3-20 dotCall64_1.0-0 vioplot_0.3.0
## [9] lattice_0.20-38 Matrix_1.2-17 parallel_3.6.0 spam_2.2-2
## [13] akima_0.6-2 padr_0.4.2 raster_2.9-5 mapplots_1.5.1
## [17] fields_9.8-1 maps_3.3.0 grid_3.6.0 data.table_1.12.2
## [21] dtw_1.20-1 pbapply_1.4-0 tcltk_3.6.0 sm_2.2-5.6
## [25] SpecsVerification_0.5-2 sp_1.3-1 latticeExtra_0.6-28 magrittr_1.5
## [29] scales_1.0.0 codetools_0.2-16 CircStats_0.2-6 MASS_7.3-51.1
## [33] abind_1.4-5 colorspace_1.4-1 proxy_0.4-23 munsell_0.5.0
## [37] RCurl_1.95-4.12 verification_1.42 easyVerification_0.4.4 RcppEigen_0.3.3.5.0
## [41] zoo_1.8-5