-
Notifications
You must be signed in to change notification settings - Fork 4
/
R08_Eurostat.Rmd
223 lines (161 loc) · 6.9 KB
/
R08_Eurostat.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
---
title: "Eurostat database and package"
output: html_document
---
### `{eurostat}`
`{eurostat}` R package provides tools to access open data from [Eurostat](http://ec.europa.eu/eurostat):
* data search,
* download,
* manipulation,
* visualization.
### Installation
```{r install, eval=FALSE, echo=T}
# Note the "eval=FALSE" argument: {r} chunk is not run.
# .. to install the package each time Markdown file is compiled (not recommended),
# .. just remove the "eval=FALSE" argument.
install.packages("eurostat")
```
---
## Using the package
* [Cheat sheet: eurostat R package](http://ropengov.github.io/eurostat/articles/cheatsheet.html)
* [Tutorial (vignette) for the eurostat R package](http://ropengov.github.io/eurostat/articles/eurostat_tutorial.html)
* [Detailed documentation for eurostat functions](http://ropengov.github.io/eurostat/reference/index.html)
---
| Command | Description |
|----------------------|------------------------------------------------------|
| `get_eurostat_toc()` | Download table of contents of Eurostat datasets |
| `search_eurostat()` | Retrieve (grep) datasets titles from Eurostat |
| `get_eurostat()` | Read Eurostat data |
| `label_eurostat()` | Get Eurostat code descriptions |
| `get_eurostat_geospatial()` | Download geospatial data from GISGO |
---
## Search Eurostat for data
```{r, echo=T, message=F}
options(readr.default_locale=readr::locale(tz="Europe/Berlin"))
require(eurostat)
require(knitr)
require(tidyr)
require(dplyr)
require(ggplot2)
require(RColorBrewer)
```
```{r, eval=FALSE}
# To actually run this {r} chunk, change the eval argument
toc <- get_eurostat_toc() # Downloads Table of Contents of Eurostat Data Sets
class(toc)
dim(toc)
str(toc,list.len = 10) # only few items listed
```
With `search_eurostat()`, you can search the table of contents for particular text (text patterns).
* *regex*: R regular expression syntax is used: see `?regex` for details.
* `.*` is particularly useful basic "tool" in text pattern search:
+ The period `.` matches any single character.
+ `*`: The preceding item (`.`) will be matched zero or more times.
* **regex is case sensitive** -- see next example, where we search Eurostat for unemployment data:
```{r}
# kable() generates tabular (formatted) output in Rmd files
kable(search_eurostat(".*unemployment.*rates.*NUTS", fixed=F))
kable(search_eurostat(".*Unemployment.*rates.*NUTS", fixed=F))
```
```{r, eval=FALSE, echo=T}
# Alternatively, you can use grep() to search a downloaded TOC
# .. this way, you can ignore the case-sensitive "issue"
toc <- get_eurostat_toc() # Downloads Table of Contents of Eurostat Data Sets
toc[grep(".*unemployment.*rates.*NUTS", toc$title, ignore.case = T),]$title
# ... this R code is not executed, provided for your information only
# ... you can switch the `eval` argument to produce the output table
```
---
## Download data
As an example, let's choose the **Unemployment rates by sex, age and NUTS 2 regions (%)** dataset `lfst_r_lfu3rt`
* All datasets are available through a web browser (see the last string in the web address)
* http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=lfst_r_lfu3rt
Download the data:
```{r, message=F}
# http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=lfst_r_lfu3rt
Un.DF <- get_eurostat("lfst_r_lfu3rt", time_format = "num") # note the simplified time format
dim(Un.DF)
# (5 age groups) * (3 gender categories) * (19 year 1999-2017) * (504 geo units) = 143.640
#
#
#
str(Un.DF)
# note the "value" variable - it contains Unemployment rates for a given "row"
#
#
#
summary(Un.DF) # note that observations are annual
head(Un.DF,10)
```
By default, variable identification is provided through Eurostat **codes**.
* To get human-readable labels instead, we can use `label_eurostat()` function
* Good for orientation in the dataset, NOT for `gather()` , `spread()` data handling
```{r, echo=T, message=F}
Un.DF.l <- label_eurostat(Un.DF, fix_duplicated = T)
```
```{r}
head(Un.DF.l,6)
```
Also, codes and their "descriptions" can be shown side-by-side:
```{r}
head(kable(cbind(as.character(unique(Un.DF$geo)),as.character(unique(Un.DF.l$geo)))),17)
# note the NUTS-code format:
# NUTS0 (states) have 2-digit IDs ... "AT"
# NUTS1 regions have 3-digit IDs
# NUTS2 regions have 4-digit IDS
#
head(kable(cbind(as.character(unique(Un.DF$age)),as.character(unique(Un.DF.l$age)))),5)
```
---
## Data handling
We can simply save the data for subsequent use:
```{r}
write.csv(Un.DF, "datasets/Unemployment.csv", row.names = F)
```
----
We can use `{tidyverse}` and `{ggplot2}` functions to filter and plot data.
**Example 1: Unemployment plot for selected countries (time series)**
* `Y15-74` i.e. age group *from 15 to 74 years*
* Select data 2010 and newer
* Total unemployment only (no M/F/T) structure
* Select only: Austria, Czech Republic, Germany, Hungary, Poland, Slovakia
* NUTS 0 (State-level)
```{r}
Un.DF %>%
filter(age == "Y15-74", time >= 2010, sex == "T") %>% # filter variables
filter(geo %in% c("AT","CZ","DE","HU","PL","SK")) %>% # subset of countries
ggplot(aes(x = time, y = values, colour = geo))+ # plot filtered data
geom_line()+ # choose plot type
ggtitle("Unemployment rates")+ # Define main title
ylab("Unemployment (%)")+ # define label on the y-axis
theme_minimal() # choose plot "design"
```
---
**Example 2: Chorpleth (infomap) of unemployment**
* `Y15-74` i.e. age group *from 15 to 74 years*
* Select data for 2017
* Total unemployment only (no M/F/T)
* NUTS 2
* Austria, Czech Republic, Germany, Hungary, Poland, Slovakia
* Draw choropleth (cartogram, infomap)
```{r, echo=T, message=F}
# Download geospatial data from GISCO # NUTS revisions: e.g.: 2010, 2013, 2016.
geodata <- get_eurostat_geospatial(resolution = "60", nuts_level = "2", year = 2016)
# Filter Unemployment dataset
Un.2016 <- Un.DF %>%
filter(age == "Y15-74", time == 2017, sex == "T") %>% # filter variables, year, sex
filter(nchar(as.character(geo)) == 4) %>% # NUTS2 regions have a 4-digit id
mutate(NUTS0 = substr(as.character(geo), start=1, stop=2)) %>% # retrieve NUTS0 id from NUTS2
filter(NUTS0 %in% c("AT","CZ","DE","HU","PL","SK"))
# Join Unemployment data with "map data"
map_data <- inner_join(geodata, Un.2016)
# plot the data
ggplot()+
geom_sf(data = map_data, aes(fill = values))+
# note that "values" is name of column that stores unemployment data...
scale_fill_gradientn('Unem. \n(%) ', colours=brewer.pal(8, "Oranges"))+
ggtitle("Unemployment rates")
```
---
**Quick assignment:** Add Netherlands (`NL`) to both plots (Examples 1 & 2).
---