-
Notifications
You must be signed in to change notification settings - Fork 13
Dataset definition and loading local grid data
In many occasions, climate data (reanalysis, GCM, gridded observations) are stored as collections of NetCDF files living in a directory with or without subdirectories. This may contain just one file per variable (as in this example), or more complex configurations such as several files per variable, partitioned by time periods, or one subdirectory per variable.
In the following, an example dataset will be illustrated. This dataset comes from the NCEP/NCAR Reanalysis 1 encompassing the period 1961-2010 for the Iberian Peninsula domain and is available in a tar.gz file that can be downloaded and stored in a local directory as follows:
download.file("http://meteo.unican.es/work/loadeR/data/Iberia_NCEP.tar.gz",
destfile = "mydirectory/Iberia_NCEP.tar.gz")
# Extract files from the tar.gz file
untar("mydirectory/Iberia_NCEP.tar.gz", exdir = "mydirectory")
dir <- "mydirectory/Iberia_NCEP"
This new directory contains 6 NetCDF files, each corresponding to a different variable:
list.files(dir, pattern = "\\.nc$")
## [1] "NCEP_2T.nc" "NCEP_pr.nc" "NCEP_Q.nc" "NCEP_SLP.nc" "NCEP_T.nc"
## [6] "NCEP_Z.nc"
Before opening and loading the data, it is advisable to get an overview of the variables stored, time and spatial extent, units etc. This is done by the dataInventory
function, which receives as argument the path to the file containing the data (this can be either a NetCDF or a NcML file) and returns a list defining the name, temporal and spatial ranges, and the main attributes of each variable. Each field of the list corresponds to a variable.
di <- dataInventory("mydirectory/Iberia_NCEP/NCEP_pr.nc")
## [2016-02-18 10:59:50] Doing inventory ...
## [2016-02-18 10:59:50] Retrieving info for 'pr' (0 vars remaining)
## [2016-02-18 10:59:51] Done.
str(di)
## List of 1
## $ pr:List of 4
## ..$ Description: chr "Total Precipitation (DM)"
## ..$ DataType : chr "float"
## ..$ Units : chr "m"
## ..$ Dimensions :List of 3
## .. ..$ time:List of 4
## .. .. ..$ Type : chr "Time"
## .. .. ..$ TimeStep : chr "1.0 days"
## .. .. ..$ Units : chr "days since 1950-01-01 00:00:00"
## .. .. ..$ Date_range: chr "1961-01-01T00:00:00Z - 2010-12-31T00:00:00Z"
## .. ..$ lat :List of 3
## .. .. ..$ Type : chr "Lat"
## .. .. ..$ Units : chr "degrees north"
## .. .. ..$ Values: num [1:6] 35 37.5 40 42.5 45 47.5
## .. ..$ lon :List of 3
## .. .. ..$ Type : chr "Lon"
## .. .. ..$ Units : chr "degrees east"
## .. .. ..$ Values: num [1:9] -15 -12.5 -10 -7.5 -5 -2.5 0 2.5 5
In this case, the file contains only a variable (Total Precipitation) which name in the file is pr
that corresponds to the value used for the var
argument to load the data. In the next example we load the precipitation for the full period and domain:
pr <- loadGridData(dataset = "mydirectory/Iberia_NCEP/NCEP_pr.nc", var = "pr")
## [2016-02-18 10:59:51] Defining geo-location parameters
## [2016-02-18 10:59:51] Defining time selection parameters
## [2016-02-18 10:59:51] Retrieving data subset ...
## [2016-02-18 10:59:52] Done
The element Data
of the list is a N-dimensional array, where N is the number of dimensions depending on the type of request (3 in this case) with the dimension names stored as an attribute named "dimensions". The dimensions are always arranged in their canonical order, this is: time > lat > lon. The function returns also all the necessary information for the identification of the variable and its temporal and spatial definition:
str(pr)
## List of 4
## $ Variable:List of 2
## ..$ varName: chr "pr"
## ..$ level : NULL
## ..- attr(*, "use_dictionary")= logi FALSE
## ..- attr(*, "description")= chr "Total Precipitation (DM)"
## ..- attr(*, "units")= chr "m"
## ..- attr(*, "longname")= chr "pr"
## ..- attr(*, "daily_agg_cellfun")= chr "none"
## ..- attr(*, "monthly_agg_cellfun")= chr "none"
## ..- attr(*, "verification_time")= chr "none"
## $ Data : num [1:18262, 1:6, 1:9] 2.28e-04 0.00 0.00 0.00 3.08e-05 ...
## ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
## $ xyCoords:List of 2
## ..$ x: num [1:9] -15 -12.5 -10 -7.5 -5 -2.5 0 2.5 5
## ..$ y: num [1:6] 35 37.5 40 42.5 45 47.5
## ..- attr(*, "projection")= chr "LatLonProjection"
## $ Dates :List of 2
## ..$ start: chr [1:18262] "1961-01-01 00:00:00 GMT" "1961-01-02 00:00:00 GMT" "1961-01-03 00:00:00 GMT" "1961-01-04 00:00:00 GMT" ...
## ..$ end : chr [1:18262] "1961-01-01 00:00:00 GMT" "1961-01-02 00:00:00 GMT" "1961-01-03 00:00:00 GMT" "1961-01-04 00:00:00 GMT" ...
## - attr(*, "dataset")= chr "mydirectory/Iberia_NCEP/NCEP_pr.nc"
We can apply an aggregation function to obtain daily (aggr.d
) or monthly (aggr.m
) data (a "NOTE" is returned with the aggregation details) from the original data:
monthlyPr <- loadGridData(dataset = "mydirectory/Iberia_NCEP/NCEP_pr.nc",
var = "pr",
aggr.m = "sum")
## [2016-02-18 10:59:52] Defining geo-location parameters
## [2016-02-18 10:59:52] Defining time selection parameters
## NOTE: Daily data will be monthly aggregated
## [2016-02-18 10:59:53] Retrieving data subset ...
## [2016-02-18 10:59:54] Done
Some drawing functionalities have been included in loadeR
. For example, a first analysis of the data could be to explore the mean field for a defined period and region. This typical plotting operation has been included as a function called plotMeanField
in loadeR
and can be directly obtained using:
plotMeanField(pr)
Imagine that we want to load several variables, or that the NetCDF files in the directory are of the same variable but for different time periods and we want to load a large period. In those cases, we would need to repeat the procedure above several times pointing to different files and, in addition, a posterior aggregation of the data would be also necessary. This could be avoided creating virtual datasets, which aggregates files and/or variables, by means of a NcML file. We can create the NcML file with function makeAggregatedDataset
, that can handle automatically all these situations.
In order to aggregate the NetCDF files to form a unique dataset (NcML), we can just type:
makeAggregatedDataset(source.dir = dir, ncml.file = "mydirectory/Iberia_NCEP/Iberia_NCEP.ncml", verbose = TRUE)
## [2015-10-22 13:08:07] Creating dataset from 6 files
## [2015-10-22 13:08:07] Scanning file 1 out of 6
## [2015-10-22 13:08:07] Scanning file 2 out of 6
## [2015-10-22 13:08:07] Scanning file 3 out of 6
## [2015-10-22 13:08:07] Scanning file 4 out of 6
## [2015-10-22 13:08:07] Scanning file 5 out of 6
## [2015-10-22 13:08:07] Scanning file 6 out of 6
## [2015-10-22 13:08:07] NcML file "mydirectory/Iberia_NCEP/Iberia_NCEP.ncml" created from 6 files corresponding to 6 variables
## Use 'dataInventory' to obtain a description of the dataset
This is how this NcML looks:
system("cat mydirectory/Iberia_NCEP/Iberia_NCEP.ncml")
## <?xml version="1.0" encoding="UTF-8"?>
## <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
## <aggregation type="union">
## <netcdf location="mydirectory/Iberia_NCEP/NCEP_2T.nc" ncoords="18262"/>
## <netcdf location="mydirectory/Iberia_NCEP/NCEP_pr.nc" ncoords="18262"/>
## <netcdf location="mydirectory/Iberia_NCEP/NCEP_Q.nc" ncoords="18262"/>
## <netcdf location="mydirectory/Iberia_NCEP/NCEP_SLP.nc" ncoords="18262"/>
## <netcdf location="mydirectory/Iberia_NCEP/NCEP_T.nc" ncoords="18262"/>
## <netcdf location="mydirectory/Iberia_NCEP/NCEP_Z.nc" ncoords="18262"/>
## </aggregation>
## </netcdf>
The new dataset has been created in the destination given by the ncml.file
argument. Now, it is possible to work with this file as in the previous case (make an inventory, load a variable, etc) when we consider a unique NetCDF file.
We can obtain the inventory of all the data in our local directoy if we specify the path to the NcML file in function dataInventory
.
# First, the path to the ncml file is defined:
ncep.local <- "mydirectory/Iberia_NCEP/Iberia_NCEP.ncml"
di <- dataInventory(ncep.local)
## [2016-02-18 10:59:55] Doing inventory ...
## [2016-02-18 10:59:55] Retrieving info for 'Z' (5 vars remaining)
## [2016-02-18 10:59:55] Retrieving info for 'T' (4 vars remaining)
## [2016-02-18 10:59:55] Retrieving info for 'Q' (3 vars remaining)
## [2016-02-18 10:59:55] Retrieving info for '2T' (2 vars remaining)
## [2016-02-18 10:59:55] Retrieving info for 'SLP' (1 vars remaining)
## [2016-02-18 10:59:55] Retrieving info for 'pr' (0 vars remaining)
## [2016-02-18 10:59:55] Done.
The inventory returns a list named of length n, being n the number of variables stored in the dataset, being the names the label in the file of each variable:
names(di)
## [1] "Z" "T" "Q" "2T" "SLP" "pr"
For each variable in the dataset (e.g., T), the following information is given:
str(di$T)
## List of 4
## $ Description: chr "Temperature"
## $ DataType : chr "float"
## $ Units : chr "K"
## $ Dimensions :List of 4
## ..$ time :List of 4
## .. ..$ Type : chr "Time"
## .. ..$ TimeStep : chr "1.0 days"
## .. ..$ Units : chr "days since 1950-01-01 00:00:00"
## .. ..$ Date_range: chr "1961-01-01T00:00:00Z - 2010-12-31T00:00:00Z"
## ..$ level:List of 3
## .. ..$ Type : chr "Pressure"
## .. ..$ Units : chr "millibar"
## .. ..$ Values: num [1:4] 500 700 850 1000
## ..$ lat :List of 3
## .. ..$ Type : chr "Lat"
## .. ..$ Units : chr "degrees north"
## .. ..$ Values: num [1:6] 35 37.5 40 42.5 45 47.5
## ..$ lon :List of 3
## .. ..$ Type : chr "Lon"
## .. ..$ Units : chr "degrees east"
## .. ..$ Values: num [1:9] -15 -12.5 -10 -7.5 -5 -2.5 0 2.5 5
So T is air temperature, and has four vertical levels (500, 700, 850 and 1000 hPa).
Within loadeR
, a grid is defined as a single variable within a defined spatio-temporal domain. In the following example we will load air temperature (T
) at 1000 hPa level (var = "T@1000"
) for a selected rectangular domain centered on the Iberian Peninsula (lonLim = c(-12, 5)
and LatLim= c(35,45)
), considering summer (season= 6:8
) for the period 1981-2000 (years = 1981:2000
). Note that the "@" symbol is used in order to specify a particular vertical level, when levels exist. A field can only contain one vertical level. Several levels would be handled using a multifield, as we shall explain in the next section.
T1000 <- loadGridData(dataset = ncep.local,
var = "T@1000",
lonLim = c(-12, 5),
latLim= c(35,45),
season= 6:8,
years = 1981:2000)
## [2016-02-18 10:59:56] Defining geo-location parameters
## [2016-02-18 10:59:56] Defining time selection parameters
## [2016-02-18 10:59:56] Retrieving data subset ...
## [2016-02-18 10:59:56] Done
Note that, in this case, the vertical dimension is dropped as it is of length one (1000 hPa), so the returned array is 3-dimensional:
str(T1000)
## List of 4
## $ Variable:List of 2
## ..$ varName: chr "T"
## ..$ level : num 1000
## ..- attr(*, "use_dictionary")= logi FALSE
## ..- attr(*, "description")= chr "Temperature"
## ..- attr(*, "units")= chr "K"
## ..- attr(*, "longname")= chr "T"
## ..- attr(*, "daily_agg_cellfun")= chr "none"
## ..- attr(*, "monthly_agg_cellfun")= chr "none"
## ..- attr(*, "verification_time")= chr "none"
## $ Data : num [1:1840, 1:5, 1:8] 289 288 289 289 289 ...
## ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
## $ xyCoords:List of 2
## ..$ x: num [1:8] -12.5 -10 -7.5 -5 -2.5 0 2.5 5
## ..$ y: num [1:5] 35 37.5 40 42.5 45
## ..- attr(*, "projection")= chr "LatLonProjection"
## $ Dates :List of 2
## ..$ start: chr [1:1840] "1981-06-01 00:00:00 GMT" "1981-06-02 00:00:00 GMT" "1981-06-03 00:00:00 GMT" "1981-06-04 00:00:00 GMT" ...
## ..$ end : chr [1:1840] "1981-06-01 00:00:00 GMT" "1981-06-02 00:00:00 GMT" "1981-06-03 00:00:00 GMT" "1981-06-04 00:00:00 GMT" ...
## - attr(*, "dataset")= chr "mydirectory/Iberia_NCEP/Iberia_NCEP.ncml"
print(sessionInfo())
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=es_ES.UTF-8
## [9] LC_ADDRESS=es_ES.UTF-8 LC_TELEPHONE=es_ES.UTF-8
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=es_ES.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] loadeR_0.2-0 loadeR.java_1.1-0 rJava_0.9-8
##
## loaded via a namespace (and not attached):
## [1] lattice_0.20-31 digest_0.6.8 bitops_1.0-6 grid_3.2.3
## [5] formatR_1.2 magrittr_1.5 spam_1.3-0 evaluate_0.7
## [9] stringi_0.4-1 sp_1.1-0 akima_0.5-12 rmarkdown_0.6.1
## [13] tools_3.2.3 stringr_1.0.0 RCurl_1.95-4.7 maps_3.0.2
## [17] fields_8.3-6 yaml_2.1.13 abind_1.4-3 htmltools_0.2.6
## [21] knitr_1.10.5
- Package Installation (and known problems)
- Model Data (reanalysis and climate projections)
- Observations (station and gridded data)
- Standard data manipulation