-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path01-data-download-and-plot.Rmd
137 lines (97 loc) · 3.21 KB
/
01-data-download-and-plot.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
---
title: "Preparing R code for reproducible research"
subtitle: "Example code for plotting data"
author:
- "José R. Ferrer-Paris"
date: 'March 2023'
---
This is an example Rmarkdown file with lots of comments and five steps, please see the presentation in the presentation folder of this repo and read the README file.
With Rmarkdown, the comments, code and the output are knitted together.
# Outline
- Preliminaries
- Step 1: Input - Read the data
- Step 2: Modify data
- Step 3: Save modified data
- Step 4: Read the RDS file
- Step 5: Plot the data
- R session info and done !
# Preliminaries
This script read data prepared with the `01-data-download.R` script and outputs an image in pdf format.
# Step 1: Data download
We want to download the FRED-MD dataset:
> FRED-MD ("Federal Reserve Economic Data Monthly Database") is a publicly available dataset. Originally designed for the analysis of ***big economic data***, the dataset contains more than one hundred features.
```{r}
fred.url <- 'https://files.stlouisfed.org/files/htdocs/fred-md/monthly/current.csv'
```
We will work in a temporal directory:
```{r}
my.temp.dir <- tempdir()
```
<div class="alert alert-warning">
<strong>Warning!</strong> I am using `tempdir()` here, but this might not always be a good idea !
</div>
We use `sprintf` to add the path of the temporal directory and the name of the output file: `raw-data.csv`,
```{r}
target.file <- sprintf("%s/raw-data.csv", my.temp.dir)
```
if file does not exists we download it:
```{r}
if (!file.exists(target.file)) {
# function download.file uses the best available method for each OS, test if this works for you:
download.file(url=fred.url, destfile=target.file)
}
```
Now read the file:
```{r}
original.data <- read.csv(target.file)
```
# Step 2: Modify data
We want to skip the first record and use the next 100 records from the table, if library `dplyr` is available we can use the `slice` function, otherwise we can use simply `head`
```{r}
if (require(dplyr)) {
ss.data <- original.data %>%
slice(2:101)
} else {
ss.data <- head(original.data[-1,],n=100)
}
```
We can check the dimensions of the data with `dim`
```{r}
# Before:
dim(original.data)
# After:
dim(ss.data)
```
# Step 3: Save modified data
If we want to reuse the data we save it here:
```{r}
target.file <- sprintf("%s/modified-data.rds", my.temp.dir)
saveRDS(file=target.file,ss.data)
```
# Step 4: Read the data from the RDS file
And then we load it with `readRDS`:
```{r}
fred.data.subset <- readRDS(file=target.file)
```
# Step 5: Plot the data
We want to use ggplot2 to make cool graphics, the code will give an error if ggplot2 is not installed
```{r}
if (!require(ggplot2)) {
stop("Sorry, this code only works with the ggplot2 library")
}
```
This is our code to make a very fancy plot of three important economic variables (?)
```{r}
ggplot(fred.data.subset) +
geom_point(aes(x=INDPRO, y=S.P.500, colour=RPI)) +
labs(title='Three very important economic variables',
subtitle='Data from the FRED-MD dataset') +
xlab("Industrial production") +
ylab("Stock Market") +
scale_colour_continuous("Personal\nIncome",type = "viridis") +
theme_linedraw()
```
# R session Info
```{r}
sessionInfo()
```