---
title: "Plotting Bee Colony Observations and Distributions using {ggbeeswarm} and {geomtextpath}"
description: |
  Graphs and analysis using the #TidyTuesday data set for week 2 of 2022
  (2022-01-11): "Bee Colony losses"
author:
  - name: Ronan Harrington
    url: https://github.com/rnnh/
date: 2022-01-23
repository_url: https://github.com/rnnh/TidyTuesday/
preview: bee-colony-losses_files/figure-html5/fig3-1.png
output:
  distill::distill_article:
    self_contained: false
    toc: true
---
```{r knitr, include=FALSE}
knitr::opts_chunk$set(include = TRUE)
knitr::opts_chunk$set(fig.height = 6)
knitr::opts_chunk$set(fig.width = 9)
```

## Setup

Loading the `R` libraries and the TidyTuesday
[data set](https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-01-11/readme.md).

```{r setup}
# Loading libraries
library(geomtextpath) # For adding text to ggplot2 curves
library(tidytuesdayR) # For loading data set
library(ggbeeswarm)   # For creating a beeswarm plot
library(tidyverse)    # For the ggplot2, dplyr, tidyr and forcats libraries
library(gganimate)    # For plot animation
library(ggthemes)     # For more ggplot2 themes
library(viridis)      # For colour-blind-friendly colour scales

# Loading data set
tt <- tt_load("2022-01-11")
```
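
The setup chunk assumes every package is already installed. As a hedged convenience sketch (the helper `find_missing_pkgs()` is hypothetical, not part of the original analysis), any missing packages can be listed before loading. {magick} is added to the list because `magick_renderer()`, used later to render the animation, relies on it.

```r
# Packages loaded in the setup chunk, plus {magick}, which gganimate's
# magick_renderer() needs in order to build the GIF
required_pkgs <- c("geomtextpath", "tidytuesdayR", "ggbeeswarm", "tidyverse",
                   "gganimate", "ggthemes", "viridis", "magick")

# Hypothetical helper: returns the subset of `pkgs` that cannot be loaded
find_missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing_pkgs(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them from CRAN
}
```
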

## Data wrangling

In this section, the Bee Colony data is wrangled into two tidy sets:

- `tidied_colony_counts_overall` contains quarterly colony counts for the USA as a whole
- `tidied_colony_counts_per_state` contains quarterly colony counts for individual states within the USA

To create these sets, the original data is filtered to keep either the national totals (`state == "United States"`) or the individual states (excluding the `"United States"` and `"Other states"` rows), and the `tidy_colony_data()` function is applied.
These sets are tidy because [each column is a variable, each row is an observation, and every cell has a single value](https://tidyr.tidyverse.org/articles/tidy-data.html#tidy-data).
The types of observations in these data sets are:

- `Total colonies`: Bee colonies counted
- `Lost`: Bee colonies lost
- `Added`: Bee colonies added
- `Renovated`: Bee colonies renovated
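
To make the idea of tidy data concrete, here is a minimal sketch of the same wide-to-long reshaping applied to a hypothetical two-row tibble (the values are made up for illustration, not taken from the TidyTuesday data). Each of the four count columns becomes its own row, so two wide rows turn into eight long rows with one observation each.

```r
library(tibble)
library(tidyr)
library(dplyr)
library(forcats)

# Hypothetical toy values, NOT from the TidyTuesday data: two quarters
# from one state, in the same wide layout as the count columns of tt$colony
toy_wide <- tibble(
  year = c(2015, 2015),
  colony_n = c(1000, 1100),
  colony_lost = c(100, 120),
  colony_added = c(50, 60),
  colony_reno = c(10, 20)
)

# One row per (year, observation type) pair after pivoting
toy_long <- toy_wide %>%
  pivot_longer(!year, names_to = "type", values_to = "count") %>%
  mutate(type = fct_recode(factor(type),
                           "Total colonies" = "colony_n",
                           "Lost" = "colony_lost",
                           "Added" = "colony_added",
                           "Renovated" = "colony_reno"))
toy_long
```
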

```{r wrangling}
# Creating subsets of the original bee colony data
colony_counts_overall <- tt$colony %>%
  filter(state == "United States")
colony_counts_per_state <- tt$colony %>%
  filter(state != "United States" & state != "Other states")

# Defining a function to tidy bee colony count data, which takes
# "messy_colony_data" as an argument
tidy_colony_data <- function(messy_colony_data){
  # Writing the result of the following piped steps to "tidied_colony_data"
  tidied_colony_data <- messy_colony_data %>%
    # Selecting variables
    select(year, colony_n, colony_lost, colony_added, colony_reno) %>%
    # Dropping rows with missing values
    drop_na() %>%
    # Changing columns to rows
    pivot_longer(!year, names_to = "type", values_to = "count") %>%
    # Setting "type" as a factor variable
    mutate(type = factor(type)) %>%
    # Recoding the levels of the "type" factor
    mutate(type = fct_recode(type,
                             "Total colonies" = "colony_n",
                             "Lost" = "colony_lost",
                             "Added" = "colony_added",
                             "Renovated" = "colony_reno")) %>%
    # Reordering "type" factor levels
    mutate(type = fct_relevel(type,
                              "Total colonies", "Lost", "Added", "Renovated"))
  # Returning "tidied_colony_data"
  return(tidied_colony_data)
}

# Using this function to tidy the subsets
tidied_colony_counts_overall <- tidy_colony_data(colony_counts_overall)
tidied_colony_counts_per_state <- tidy_colony_data(colony_counts_per_state)

# Printing a summary of the subsets before tidying...
colony_counts_overall
colony_counts_per_state
# ...and after tidying
tidied_colony_counts_overall
tidied_colony_counts_per_state
```
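
Because `tidy_colony_data()` calls `drop_na()` before `pivot_longer()`, a quarter that is missing any one of the selected counts is dropped entirely, including the counts that were recorded. The sketch below uses hypothetical toy data to compare that order of operations with pivoting first and then dropping only the missing cells; whether the real data has such partially missing quarters is not checked here, and which behaviour is preferable depends on the analysis.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical toy data: the second quarter is missing only colony_reno
toy <- tibble(
  year = c(2015, 2015),
  colony_n = c(1000, 1100),
  colony_lost = c(100, 120),
  colony_added = c(50, 60),
  colony_reno = c(10, NA)
)

# As in tidy_colony_data(): dropping first removes the whole second quarter,
# so its total, lost and added counts are discarded as well
drop_then_pivot <- toy %>%
  drop_na() %>%
  pivot_longer(!year, names_to = "type", values_to = "count")

# Alternative: pivot first, then drop only the cells that are missing
pivot_then_drop <- toy %>%
  pivot_longer(!year, names_to = "type", values_to = "count") %>%
  drop_na(count)

nrow(drop_then_pivot)  # 4 rows
nrow(pivot_then_drop)  # 7 rows
```

`pivot_longer(..., values_drop_na = TRUE)` is a shorthand for the second approach.
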

## Plotting Bee Colony observations using {ggbeeswarm}

The first graph uses [geom_beeswarm()](https://github.com/eclarke/ggbeeswarm) to plot a point for each per-state observation, with a separate panel for each type of observation.

```{r fig1, fig.cap = "Beeswarm plots of bee colony observations. This plot has a point for each observation. Points that would overlap are placed next to each other instead, to reduce overplotting."}
# Plotting Bee Colony observations using geom_beeswarm() from {ggbeeswarm}
tidied_colony_counts_per_state %>%
  ggplot(aes(x = type, y = count)) +
  geom_beeswarm(cex = 4, colour = "yellow") +
  scale_y_log10() +
  theme_solarized_2(light = FALSE) +
  facet_wrap(~type, scales = "free") +
  theme(legend.position = "none", axis.text.x = element_blank()) +
  labs(title = "Bee Colonies Counted, Lost, Added, Renovated",
       subtitle = "Created using {ggbeeswarm}",
       x = NULL, y = "Number of bee colonies (log10)",
       fill = NULL)
```
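
Because this plot (and the next) uses a log10 axis, any zero counts have no finite position (`log10(0)` is `-Inf`), and ggplot2 warns about infinite values when transforming them. A quick check like the following, using a hypothetical helper `count_unloggable()` and made-up data in the same long layout, shows how many such values each observation type has before plotting.

```r
library(tibble)
library(dplyr)

# Hypothetical helper (not part of the original analysis): counts values
# per observation type that a log10 scale cannot place at a finite position
count_unloggable <- function(tidy_counts) {
  tidy_counts %>%
    group_by(type) %>%
    summarise(n_non_positive = sum(count <= 0), .groups = "drop")
}

# Hypothetical toy data, NOT from the TidyTuesday data set
toy <- tibble(
  type = c("Lost", "Lost", "Added", "Added"),
  count = c(0, 120, 50, 60)
)
count_unloggable(toy)
```
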

## Animating Bee Colony observations over time

While a beeswarm plot suits the subject matter, the previous plot pools every year of data together.
This graph plots the same points over time in an animation, with the subtitle giving the year shown in each frame.
This graph uses standard {ggplot2} [jittered points](https://ggplot2.tidyverse.org/reference/geom_jitter.html), as well as a box plot to illustrate the distribution of the points.
These box plots have notches, which show roughly a 95% confidence interval for the median.
If the notches of two boxes do not overlap, this suggests that their medians differ significantly.
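
ggplot2's `geom_boxplot()` documentation describes the notches as extending `1.58 * IQR / sqrt(n)` either side of the median. As a minimal sketch, assuming that formula and base R's default quantile definition, the notch interval and an overlap check can be computed directly; `notch_interval()` and `notches_overlap()` are hypothetical helper names, not part of ggplot2.

```r
# Notch interval as drawn by geom_boxplot(notch = TRUE):
# median +/- 1.58 * IQR / sqrt(n), ignoring non-finite values
notch_interval <- function(x) {
  x <- x[is.finite(x)]
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width, upper = median(x) + half_width)
}

# TRUE if the two notches overlap, i.e. no strong evidence the medians differ
notches_overlap <- function(x, y) {
  a <- notch_interval(x)
  b <- notch_interval(y)
  a[["lower"]] <= b[["upper"]] && b[["lower"]] <= a[["upper"]]
}

notch_interval(1:9)
notches_overlap(1:9, 101:109)  # FALSE: the notches are far apart
```

Note that the test is on the medians, not on the whole distributions, and the 95% figure is only approximate.
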
```{r fig2, fig.cap = "Animation showing bee colony counts from 2015 to 2021."}
# Defining an animation showing bee colony counts over time
p <- tidied_colony_counts_per_state %>%
  ggplot(aes(x = count, y = fct_reorder(type, count))) +
  geom_jitter(color = "yellow", alpha = 0.8) +
  geom_boxplot(width = 0.2, alpha = 0.8, notch = TRUE, colour = "cyan") +
  scale_x_log10() +
  theme_solarized_2(light = FALSE) +
  theme(legend.position = "none", axis.ticks.y = element_blank(),
        axis.line.y = element_blank()) +
  transition_time(as.integer(year)) +
  labs(title = "Bee Colonies Counted, Lost, Added, Renovated, per year",
       subtitle = "Year: {frame_time}",
       x = "Number of bee colonies (log10)", y = NULL)

# Rendering the animation as a .gif
animate(p, nframes = 180, start_pause = 20, end_pause = 20,
        renderer = magick_renderer())
```
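
`transition_time()` shows the data one year at a time, so the frame for a given year summarises that year's counts only, while frames in between are tweened by {gganimate}. As a sketch with hypothetical toy data, the per-year, per-type medians that the box plot centre lines correspond to on each year's frame can be tabulated with {dplyr}.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data in the long layout of tidied_colony_counts_per_state
toy <- tibble(
  year  = c(2015, 2015, 2015, 2016, 2016, 2016),
  type  = rep("Lost", 6),
  count = c(10, 20, 30, 40, 50, 60)
)

# One median per (year, type) pair, matching one box per type per year
yearly_medians <- toy %>%
  group_by(year, type) %>%
  summarise(median_count = median(count), .groups = "drop")
yearly_medians
```
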

## Plotting the distribution of different Bee Colony observation types

Based on their box plots in the previous graph, the `Added` and `Renovated` observations appear to have similar distributions.
Distributions can also be visualised using density plots.
This graph plots the distribution of each type of observation except `Total colonies`, using the counts for the United States as a whole (`tidied_colony_counts_overall`).
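
A density plot shows a kernel density estimate: a smoothed histogram scaled so that the area under each curve is 1. The sketch below assumes `geom_textdensity()` uses the same defaults as ggplot2's density stat (`stats::density()` with a Gaussian kernel and the `"nrd0"` bandwidth). It computes one such curve for made-up, log-normally distributed counts and checks its area numerically.

```r
set.seed(42)
# Hypothetical counts, NOT from the TidyTuesday data set
toy_counts <- rlnorm(200, meanlog = 8, sdlog = 0.5)

# Kernel density estimate evaluated on a regular grid (kde$x, kde$y)
kde <- density(toy_counts, bw = "nrd0", kernel = "gaussian")

# Trapezoidal-rule area under the curve, which should be close to 1
dx <- diff(kde$x)
area <- sum(dx * (head(kde$y, -1) + tail(kde$y, -1)) / 2)
area
```
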
```{r fig3, fig.cap = "A density plot, giving the distribution of various observations. Of the three types of observation plotted, Added and Renovated are the most similar."}
# Creating a density plot for different observation types
tidied_colony_counts_overall %>%
  filter(type != "Total colonies") %>%
  ggplot(aes(x = count, colour = type, label = type)) +
  geom_textdensity(size = 7, fontface = 2, hjust = 0.89, vjust = 0.3,
                   linewidth = 1.2) +
  theme_solarized_2(light = FALSE) +
  theme(legend.position = "none") +
  labs(title = "Distribution of Bee Colony Counts",
       subtitle = "Distributions of Bee Colonies Added, Renovated, Lost",
       x = "Number of bee colonies")
```
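
The judgement that `Added` and `Renovated` are the most similar distributions is based on visual inspection. One way to put a number on it, swapped in here and not part of the original analysis, is the two-sample Kolmogorov-Smirnov statistic from `stats::ks.test()`, which is the largest vertical gap between two empirical cumulative distribution functions. The samples below are made up; with the real data, one would compare the `count` values of each pair of observation types.

```r
set.seed(1)
# Hypothetical samples, NOT from the TidyTuesday data set
added     <- rlnorm(30, meanlog = 9, sdlog = 0.3)
renovated <- rlnorm(30, meanlog = 9, sdlog = 0.3)
lost      <- rlnorm(30, meanlog = 11, sdlog = 0.3)

# D is the largest gap between two empirical CDFs: smaller D means
# more similar distributions (D = 1 means the samples do not overlap)
ks_added_reno <- ks.test(added, renovated)
ks_added_lost <- ks.test(added, lost)
c(added_vs_renovated = unname(ks_added_reno$statistic),
  added_vs_lost = unname(ks_added_lost$statistic))
```
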