Observation Data format

gutierjm edited this page Jun 18, 2014 · 8 revisions

Weather station data are most often stored as text/csv files. This page describes the standard format for observational datasets, which are stored as a collection of csv files that strictly follow this structure:

stations.txt

This file contains the information about the weather stations. The first three columns (station_id, longitude and latitude) are the minimum information required to define a station dataset, so they are compulsory, and their names must match exactly the ones shown in this example. The remaining columns (altitude, location, WMO_Id and Koppen.class) are examples of optional metadata that can additionally be included; a dataset can carry as many metadata columns as needed.

station_id,longitude,latitude,altitude,location,WMO_Id,Koppen.class
SP000008027,-2.0392,43.3075,251,SAN SEBASTIAN - IGUELDO,8027,Cfb
SP000008181,2.0697,41.2928,4,BARCELONA/AEROPUERTO,8181,Csa
SP000008202,-5.4981,40.9592,790,SALAMANCA AEROPUERTO,8202,BSk
SP000008215,-4.0103,40.7806,1894,NAVACERRADA,8215,Csb
SP000008280,-1.8631,38.9519,704,ALBACETE LOS LLANOS,8280,BSk
SP000008410,-4.8458,37.8442,90,CORDOBA AEROPUERTO,8410,Csa
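
As an illustration of the rule above, the following is a minimal Python sketch that reads a stations.txt file and checks the three compulsory columns. The function name read_stations and the constant REQUIRED are hypothetical and not part of any existing package; the sketch only uses the standard csv module and assumes the compulsory columns come first, as in the example.

```python
import csv
import io

# Compulsory column names, as stated in the format description.
REQUIRED = ["station_id", "longitude", "latitude"]


def read_stations(handle):
    """Parse a stations.txt file-like object into a list of dicts."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    # Assumption: the compulsory columns are the first three, in this order.
    if header[:3] != REQUIRED:
        raise ValueError(
            f"first three columns must be {REQUIRED}, got {header[:3]}"
        )
    stations = []
    for row in reader:
        # Coordinates are converted to numbers; optional metadata stay as text.
        row["longitude"] = float(row["longitude"])
        row["latitude"] = float(row["latitude"])
        stations.append(row)
    return stations


# Two rows copied from the example above.
sample = io.StringIO(
    "station_id,longitude,latitude,altitude,location,WMO_Id,Koppen.class\n"
    "SP000008027,-2.0392,43.3075,251,SAN SEBASTIAN - IGUELDO,8027,Cfb\n"
    "SP000008181,2.0697,41.2928,4,BARCELONA/AEROPUERTO,8181,Csa\n"
)
stations = read_stations(sample)
```

With a real dataset, the io.StringIO object would be replaced by an open file handle, for example open("stations.txt", newline="").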

variables.txt

This file contains the information about the variables included in the dataset: their identifier (variable), description (longname), units of measure (unit), the code used to identify missing data (missing_code), and any other information that is optionally included (type, source and url in this example).

variable, longname, unit, missing_code, type, source, url
precip, total precip accumulated in 24 hours, 0.1 mm, NaN, observation, Global Station Network, ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/gsn/
tmin, minimum daily temperature, 0.1 degC, NaN, observation, Global Station Network, ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/gsn/
tmax, maximum daily temperature, 0.1 degC, NaN, observation, Global Station Network, ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/gsn/
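
The example above puts a space after each comma, so a reader has to strip it from both the header and the values. A minimal Python sketch, assuming the standard csv module and a hypothetical helper named read_variables, could look like this:

```python
import csv
import io


def read_variables(handle):
    """Parse a variables.txt file-like object into a dict keyed by variable."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.DictReader(handle, skipinitialspace=True)
    return {row["variable"]: row for row in reader}


# Header and one row copied from the example above.
sample = io.StringIO(
    "variable, longname, unit, missing_code, type, source, url\n"
    "tmin, minimum daily temperature, 0.1 degC, NaN, observation, "
    "Global Station Network, ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/gsn/\n"
)
variables = read_variables(sample)
```

The missing_code value found here (NaN for tmin) is what a reader of the data files would then treat as a missing observation.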

Data files

Each variable is stored in a separate text file, named as indicated by the variable field in the variables.txt file. The first column of the file contains the observation dates, in the format YYYYMMDD. Sub-daily data, which are less common in downscaling applications, can use the format YYYYMMDDHH instead. The remaining columns (2 to n) contain the observed series at each station, in the same order as in the stations.txt file. This is a (truncated) example file for the minimum daily temperature data of this dataset:

"YYYYMMDD","SP000008027","SP000008181","SP000008202","SP000008215","SP000008280","SP000008410"
19790225,NaN,0.6,NaN,NaN,NaN,0.6
19790226,NaN,5,NaN,NaN,NaN,2
19790227,NaN,2.2,NaN,NaN,NaN,-1
19790228,NaN,2,NaN,NaN,NaN,1
19790301,2.8,5.8,-2,-8.6,-1,2.8
19790302,4,4.8,-3,-7.4,-4,-2
19790303,6.6,3.6,-1.8,-4,-5.4,0
19790304,6.6,6.4,0.3,0.6,1,2
19790305,6,7.8,6.2,0.8,7,4
19790306,6,6.8,6.2,0.8,6.2,10
19790307,5.6,5.4,4.8,-0.6,6,12.4
19790308,4,7.5,4.5,0,5,10.4
19790309,6,6.8,1,-1,3.6,9.4
19790310,9,6.8,1.8,-1,0.6,5
19790311,9,5.6,3,1.6,3.4,5
19790312,9,7.8,1,4.6,2.6,6.6
19790313,8.6,8,2.6,3.8,3.4,7.4
19790314,4.4,7.2,0.6,-5.8,4.6,10
19790315,2.6,5.8,-0.8,-7.2,-0.4,5
19790316,2.4,3,0.2,-7.2,-1,0.6
19790317,5.6,6.6,0.9,-3.2,3,8
19790318,5,6.2,0,-5.6,2.6,6
19790319,5.2,7.4,0.4,-5,3.4,8
19790320,5.6,6.2,1,-6.2,1.6,5
19790321,5.6,5,0.6,-6.2,2.4,6.4
[... continues]
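
A data file laid out as above can be read with a few lines of Python. The sketch below is illustrative only: parse_time and read_series are hypothetical names, the missing code is assumed to be the NaN declared in variables.txt, and the optional expected_ids argument is an assumed way of checking that the columns follow the station order of stations.txt.

```python
import csv
import io
from datetime import datetime


def parse_time(stamp):
    """Convert YYYYMMDD (daily) or YYYYMMDDHH (sub-daily) to a datetime."""
    fmt = {8: "%Y%m%d", 10: "%Y%m%d%H"}.get(len(stamp))
    if fmt is None:
        raise ValueError(f"unexpected time stamp: {stamp!r}")
    return datetime.strptime(stamp, fmt)


def read_series(handle, missing_code="NaN", expected_ids=None):
    """Return (times, series), where series maps station_id to a value list."""
    reader = csv.reader(handle)
    header = next(reader)
    station_ids = header[1:]  # column 1 is the date, columns 2..n are stations
    if expected_ids is not None and station_ids != list(expected_ids):
        raise ValueError("station columns do not follow the stations.txt order")
    times = []
    series = {sid: [] for sid in station_ids}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        times.append(parse_time(row[0]))
        for sid, value in zip(station_ids, row[1:]):
            # Missing observations are stored as None.
            series[sid].append(None if value == missing_code else float(value))
    return times, series


# First two stations and two dates taken from the example above.
sample = io.StringIO(
    '"YYYYMMDD","SP000008027","SP000008181"\n'
    "19790228,NaN,2\n"
    "19790301,2.8,5.8\n"
)
times, series = read_series(sample, expected_ids=["SP000008027", "SP000008181"])
```

Passing the station_id column of stations.txt as expected_ids makes a mismatch between the two files fail early rather than silently misassigning series to stations.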

Note that a reference observational dataset ("GSN Iberia") is included in this repository, corresponding to a subset of the GSN station dataset for the Iberian Peninsula.