This repository was archived by the owner on Feb 27, 2020. It is now read-only.
---
title: "Introduction to visualising and plotting your data using ggplot2"
params:
datetime: "2018-10-19"
level: "Beginner" # or intermediate or advanced
videolink: "" # Keep empty if no link yet.
instructor: "Luke W. Johnston"
---
```{r setup, echo=FALSE}
source("R/utils.R")
```
## Session details
- **Date of session**: `r format_date(params$datetime)`
- **Instructor**: `r params$instructor`
- **Session level**: `r params$level`
- **Part of the ["Beginner Series"](index.html)**
- Package to install:
```{r, eval=FALSE}
install.packages("ggplot2")
```
### Objectives
1. To become aware of the powerful features of ggplot2.
2. To learn about some of the fundamentals of easily creating amazing graphics.
3. To know about some resources to continue learning.
At the end of this session, you will achieve these objectives by creating a
fairly simple, visually appealing graph that has:
- At least three data values, i.e. "aesthetics" or `aes()`, such as what to put
on the x-axis and the y-axis, and/or what to show using `colour` or `size`.
- At least two layers, i.e. "geometries" or `geom_`, such as points, lines, or
boxplots.
- Clearly labelled x and y axes, i.e. `labs()`.
- Some changes to the look and feel of the plot, i.e. `theme()`, so that
it is publication ready.
## Resources for learning and help
<!-- -->
<!-- Because I can't teach everything... -->
**For learning**:
- [R for Data Science](http://r4ds.had.co.nz/)
- [Data visualisation](http://r4ds.had.co.nz/data-visualisation.html) and
[Graphics for communication](http://r4ds.had.co.nz/graphics-for-communication.html)
chapters
- [ggplot2 package documentation](https://ggplot2.tidyverse.org/)
- [ggplot2 cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf)
- [All RStudio cheatsheets](https://www.rstudio.com/resources/cheatsheets/)
- [DataCamp (paid) online course on data visualisation](https://www.datacamp.com/courses/data-visualization-with-ggplot2-1)
**For help**:
- [StackOverflow for ggplot2](https://stackoverflow.com/questions/tagged/ggplot2?sort=frequent&pageSize=50)
- Using the built-in help within RStudio with `?` (such as `?geom_point` or `?theme`)
- [ggplot2 package documentation](https://ggplot2.tidyverse.org/)
<!-- Quickly go over RStudio IDE in relation to plotting -->
<!-- Get them to open two R scripts, name and save as "code-along" and "exercise" -->
## Quickly get familiar with data to plot
For this session we will be using the `CO2` dataset. Here is some code to get a
sense of the data.
```{r}
# Variables
names(CO2)
# General contents
str(CO2)
# Quick summary
summary(CO2)
```
### Exercise: Choose a dataset and check it out
There are several exercises in this session. Choose one of the below datasets
and use that dataset for all later exercises.
For complete R beginners, use:
- `mpg`
For more confident R users, use one of these:
- `economics`
- `diamonds`
- `msleep`
- `txhousing`
Check out the contents of the dataset you choose using:
```{r, eval=FALSE}
# variable names of dataset
names(___)
# contents of dataset
str(___)
# summary of dataset
summary(___)
```
## Basic structure of using ggplot2
ggplot2 uses the "Grammar of Graphics" (gg). This is a powerful approach to
creating plots because it provides a consistent way of telling ggplot2 what to
do. There are at least three aspects to using ggplot2 that relate to the grammar:
- Aesthetics, `aes()`: How data should be mapped to the plot. Includes what to
put on x axis, on the y axis, colours, size, etc.
- Geometries, `geom_`: The visual representation of the data, as a layer. This
tells ggplot2 how to show the aesthetics. Includes points, lines, boxes, etc.
- Themes, `theme_` or `theme()`: How the plot should look. Includes the
text, axis lines, etc.
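
As a minimal sketch of how these three pieces combine, the chunk below builds one plot from the built-in `CO2` dataset (the same dataset used in the rest of this session). The object name `basic_plot` and the choice of `theme_minimal()` are only illustrative.

```{r}
library(ggplot2)

# Aesthetics: map conc to the x axis and uptake to the y axis
basic_plot <- ggplot(CO2, aes(x = conc, y = uptake)) +
  # Geometry: show each observation as a point
  geom_point() +
  # Theme: switch to one of the pre-defined looks
  theme_minimal()

basic_plot
```

Each piece is joined to the previous one with `+`, so a layer or a theme can be swapped without touching the rest of the plot.
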
To maximise the power of ggplot2, make heavy use of autocompletion. You can do
this by typing, for instance, `geom_` and then hitting the TAB key to see a list
of all the geoms. Or after typing `theme(`, hit TAB to see all the options inside
theme.
## Visualise 1-dimensional (x axis) data
There are many ways of plotting continuous variables (e.g. weight, height)
in ggplot2. For discrete variables (e.g. terrain type: mountain, plains; or sex:
woman, man), there is really only one way.
```{r}
library(ggplot2)
# Continuous
ggplot(CO2, aes(x = conc)) +
geom_density()
# Discrete
ggplot(CO2, aes(x = Treatment)) +
geom_bar()
```
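
Two other common geoms for a single continuous variable are the histogram and the frequency polygon. The sketch below uses the `uptake` column of `CO2`; the value `bins = 15` is an arbitrary choice for illustration, and the object names are hypothetical.

```{r}
library(ggplot2)

# Histogram: counts of observations within bins of uptake
co2_histogram <- ggplot(CO2, aes(x = uptake)) +
  geom_histogram(bins = 15)
co2_histogram

# Frequency polygon: the same binned counts, drawn as a line
co2_freqpoly <- ggplot(CO2, aes(x = uptake)) +
  geom_freqpoly(bins = 15)
co2_freqpoly
```

Changing `bins` can noticeably change the shape of both plots, so it is worth trying a few values.
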
### Exercise: One variable plots
Time: 10 min
```{r, eval=FALSE}
# put name of dataset below
names(___)
# use dataset with one continuous variable
ggplot(___, aes(x = ___)) +
# finish the geom to create either a histogram, freqpoly, or density layer
___
# use dataset with one discrete variable
ggplot(___, aes(x = ___)) +
# finish the geom to create a bar layer
___
```
## Visualise 2-dimensional (x and y axis) data
You can of course include data on the y axis too! This is usually what you use
graphs for! There are many more types of "geoms" to use for having data on both
axes, and which one you choose depends on what you are trying to show and what the
data is like. Usually you put the variable that you can influence (the
independent variable) on the x axis and the variable that responds (the
dependent variable) on the y axis.
```{r}
# Using continuous data
co2_plot_nums <- ggplot(CO2, aes(x = conc, y = uptake))
# Standard scatter plot
co2_plot_nums + geom_point()
# Connect all the data with a line
co2_plot_nums + geom_line()
# Put overlapping data into "hexes", useful for massive datasets (needs the hexbin package)
co2_plot_nums + geom_hex()
# Connects data as they appear in the dataset
co2_plot_nums + geom_path()
# Runs a smoothing line with confidence interval
co2_plot_nums + geom_smooth()
# Using mixed data
co2_plot_mixed <- ggplot(CO2, aes(x = Type, y = uptake))
# Standard boxplot
co2_plot_mixed + geom_boxplot()
# Bar plot, showing total sum of uptake
co2_plot_mixed + geom_col()
# Better than boxplot, show the actual data!
co2_plot_mixed + geom_jitter()
# Give more distance between groups
co2_plot_mixed + geom_jitter(width = 0.2)
```
### Exercise: Two variable plots
Time: 8 min
```{r, eval=FALSE}
# use dataset with two continuous variables
ggplot(___, aes(x = ___, y = ___)) +
# finish the geom to create either a point, line, hex, smooth, or abline layer
___
# use dataset with one continuous and one discrete variable
ggplot(___, aes(x = ___, y = ___)) +
# finish the geom to create either a boxplot, jitter, or col layer
___
```
## Using a third (or more) variable
You can also add an additional dimension to the data by using other elements
(colours, size, transparency, etc) of the graph to represent another variable.
This is *NOT* the same thing as using 3-dimensional (i.e. x, y, z axis) plots,
which should be avoided unless absolutely necessary! Colours are useful for
representing discrete groups, and shading is useful for representing a range of
continuous values.
```{r}
co2_plot_colour <- ggplot(CO2, aes(x = conc, y = uptake, colour = Treatment))
# Scatter plot
co2_plot_colour + geom_point()
# Line plot
co2_plot_colour + geom_line()
# Smoothing
co2_plot_colour + geom_smooth()
```
Or add a fourth variable.
```{r}
# Scatter plot
co2_plot_colour + geom_point(aes(shape = Type))
# Line plot
co2_plot_colour + geom_line(aes(linetype = Type))
# Another line plot
co2_plot_colour + geom_path(aes(linetype = Type))
# Smoothing plot
co2_plot_colour + geom_smooth(aes(linetype = Type))
```
And it's easy to add more geoms!
```{r}
# Three layers
co2_plot_colour +
geom_point(aes(shape = Type)) +
geom_line(aes(linetype = Plant)) +
geom_smooth(aes(size = Type))
```
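
The examples above map discrete variables to colour, shape, and linetype. As a rough sketch of the continuous case, the chunk below maps the continuous `conc` column first to `colour` (drawn as a gradient) and then to `size`. The object names and the `alpha = 0.5` setting are illustrative choices.

```{r}
library(ggplot2)

# Colour mapped to a continuous variable is shown as a gradient
co2_plot_gradient <- ggplot(CO2, aes(x = Type, y = uptake, colour = conc)) +
  geom_jitter(width = 0.2)
co2_plot_gradient

# Size mapped to a continuous variable; alpha is set (not mapped) to
# make overlapping points easier to see
co2_plot_size <- ggplot(CO2, aes(x = Type, y = uptake, size = conc)) +
  geom_jitter(width = 0.2, alpha = 0.5)
co2_plot_size
```

Note the difference between mapping inside `aes()`, which ties a visual property to a variable, and setting a fixed value outside `aes()`, as done with `alpha` here.
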
### Exercise: Three variable plots
Time: 8 min
```{r, eval=FALSE}
# use dataset with either:
# - two continuous variables and one discrete
# - three continuous variables
# for last argument, choose either size, colour, alpha
ggplot(___, aes(x = ___, y = ___, ___ = ___)) +
# finish the geom to create either a point or line layer
___
```
## Axis titles and the theme
Let's get to making the plot prettier. There are many, many options to customise
the plot using `theme()`.
```{r}
co2_plot_prettying <-
ggplot(CO2, aes(
x = conc,
y = uptake,
colour = paste(Treatment, Type)
)
) +
geom_point() +
geom_smooth()
# Some pre-defined themes
co2_plot_prettying + theme_bw()
co2_plot_prettying + theme_minimal()
```
```{r, fig.width=7}
pretty_plot <- co2_plot_prettying +
theme_classic() +
scale_color_brewer(name = "Treatment and origin", palette = "Dark2") +
# Find this information in ?CO2
labs(x = "CO2 concentration (mL/L)",
y = "CO2 uptake rate (umol/m2 sec)") +
theme(
# all axis lines, must use element_line
axis.line = element_line(colour = "grey50", size = 0.5),
# all axis text, must use element_text
axis.text = element_text(family = "sans"),
# all axis tick marks, use element_blank to remove
axis.ticks = element_blank()
)
pretty_plot
```
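
Most `theme()` arguments expect a matching `element_` function, while a few take a plain value instead. The chunk below is a small sketch of that pattern with some other theme items; the specific items and values are arbitrary choices for illustration, and `themed_plot` is a hypothetical name.

```{r}
library(ggplot2)

themed_plot <- ggplot(CO2, aes(x = conc, y = uptake, colour = Treatment)) +
  geom_point() +
  theme_classic() +
  theme(
    # rectangles such as backgrounds use element_rect
    panel.background = element_rect(fill = "grey95"),
    # text items such as axis titles use element_text
    axis.title = element_text(face = "bold"),
    # legend.position takes a keyword rather than an element_ function
    legend.position = "bottom"
  )

themed_plot
```

The help page `?theme` lists which type each theme item expects.
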
### Exercise: Change theme items of the plot
Time: 10 min
```{r, eval=FALSE}
# use dataset with two continuous variables
ggplot(___, aes(x = ___, y = ___)) +
# finish the geom to create either a point, smooth, or line layer
___ +
# choose either a minimal, dark, light, or classic defined theme
___ +
theme(
# choose colours such as red, blue, black, grey, yellow, green
# choose size from 2 to 8
panel.grid.major = element_line(colour = ___, size = ___),
# choose family such as sans, serif, Arial, Times New Roman
axis.text = element_text(colour = ___, size = ___, family = ___)
)
```
## Saving the plot
Now, if you want to save the plot, you can do that pretty easily!
```{r, eval=FALSE}
ggsave("plant_co2_uptake.pdf", pretty_plot, width = 7, height = 5)
```
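
`ggsave()` chooses the file format from the file extension, and `width` and `height` are read in the units given by `units`. The chunk below is a minimal, self-contained sketch that saves to a temporary folder so that nothing in your project is overwritten; the file name, plot, and dimensions are hypothetical.

```{r}
library(ggplot2)

plot_to_save <- ggplot(CO2, aes(x = conc, y = uptake)) +
  geom_point()

# Hypothetical output location inside R's temporary folder
output_file <- file.path(tempdir(), "co2_uptake_example.pdf")

# Save an 18 cm by 12 cm PDF; the ".pdf" extension selects the format
ggsave(output_file, plot_to_save, width = 18, height = 12, units = "cm")

# Check that the file was written
file.exists(output_file)
```

Passing the plot object explicitly, as done here and in the example above, avoids relying on whichever plot happened to be displayed last.
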
## Exercise: Putting it all together
Time: Until end of session
1. Create a ggplot, choosing three variables for the `aes()`, one for:
- the `x`-axis
- the `y`-axis
- either `size`, `colour`, `alpha`, `stroke`, or `fill`
2. Create two `geom_` layers. The geom you use will depend on the variables and
the specific `aes()` you choose above.
3. Properly label the x and y axes with `labs()`.
4. Choose a pre-defined theme (`theme_`) and make two changes to it using `theme()`.
5. Save the plot with `ggsave()`.