Update fig_path
jessesadler committed Oct 20, 2022
1 parent b1ed90e commit 95f720f
Showing 2 changed files with 89 additions and 89 deletions.
158 changes: 79 additions & 79 deletions episodes/05-ggplot2.Rmd
@@ -25,7 +25,7 @@ source: Rmd

```{r, include=FALSE}
source("../bin/chunk-options.R")
knitr_fig_path("04-")
knitr_fig_path("05-")
source("../bin/download_data.R")
```

@@ -54,14 +54,14 @@ interviews_plotting <- interviews %>%
## if there were no items listed, changing NA to no_listed_items
replace_na(list(items_owned = "no_listed_items")) %>%
mutate(items_owned_logical = TRUE) %>%
pivot_wider(names_from = items_owned,
values_from = items_owned_logical,
pivot_wider(names_from = items_owned,
values_from = items_owned_logical,
values_fill = list(items_owned_logical = FALSE)) %>%
## pivot wider by months_lack_food
separate_rows(months_lack_food, sep = ";") %>%
mutate(months_lack_food_logical = TRUE) %>%
pivot_wider(names_from = months_lack_food,
values_from = months_lack_food_logical,
pivot_wider(names_from = months_lack_food,
values_from = months_lack_food_logical,
values_fill = list(months_lack_food_logical = FALSE)) %>%
## add some summary columns
mutate(number_months_lack_food = rowSums(select(., Jan:May))) %>%
@@ -87,14 +87,14 @@ this fashion allows for extensive flexibility and customization of plots.

Each chart built with ggplot2 must include the following:

* Data
* Aesthetic mapping (aes)
* Data
* Aesthetic mapping (aes)

+ Describes how variables are mapped onto graphical attributes
+ Visual attribute of data including x-y axes, color, fill, shape, and alpha
* Geometric objects (geom)
+ Describes how variables are mapped onto graphical attributes
+ Visual attribute of data including x-y axes, color, fill, shape, and alpha
* Geometric objects (geom)

+ Determines how values are rendered graphically, as bars (`geom_bar`), scatterplot (`geom_point`), line (`geom_line`), etc.
+ Determines how values are rendered graphically, as bars (`geom_bar`), scatterplot (`geom_point`), line (`geom_line`), etc.

Thus, the template for a graphic in ggplot2 is:

@@ -138,7 +138,7 @@ interviews_plotting %>%
The `+` in the **`ggplot2`** package is particularly useful because it allows
you to modify existing `ggplot` objects. This means you can easily set up plot
templates and conveniently explore different types of plots, so the above plot
can also be generated with code like this, similar to the "intermediate steps"
can also be generated with code like this, similar to the "intermediate steps"
approach in the previous lesson:

```{r first-ggplot-with-plus, fig.alt = "Scatter plot of number of items owned versus number of household members.", eval=FALSE, purl=FALSE}
@@ -187,29 +187,29 @@ interviews_plotting %>%
```

Then, we start modifying this plot to extract more information from it. For
instance, when inspecting the plot we notice that points only appear at the
intersection of whole numbers of `no_membrs` and `number_items`. Also, from a
rough estimate, it looks like there are far fewer dots on the plot than there
instance, when inspecting the plot we notice that points only appear at the
intersection of whole numbers of `no_membrs` and `number_items`. Also, from a
rough estimate, it looks like there are far fewer dots on the plot than there
are rows in our dataframe. This should lead us to believe that there may be multiple
observations plotted on top of each other (e.g. three observations where
`no_membrs` is 3 and `number_items` is 1).
observations plotted on top of each other (e.g. three observations where
`no_membrs` is 3 and `number_items` is 1).

There are two main ways to alleviate overplotting issues:
1. changing the transparency of the points
There are two main ways to alleviate overplotting issues:
1. changing the transparency of the points
2. jittering the location of the points

Let's first explore option 1, changing the transparency of the points. By
"transparency" we mean the opacity of a point, or your ability to
see through the point. We can control the transparency of the points with the
Let's first explore option 1, changing the transparency of the points. By
"transparency" we mean the opacity of a point, or your ability to
see through the point. We can control the transparency of the points with the
`alpha` argument to `geom_point`. Values of `alpha` range from 0 to 1, with
lower values corresponding to more transparent colors (an `alpha` of 1 is the
default value). Specifically, an alpha of 0.1 would make a point one-tenth as
opaque as a normal point. Stated differently, ten points stacked on top of
lower values corresponding to more transparent colors (an `alpha` of 1 is the
default value). Specifically, an alpha of 0.1 would make a point one-tenth as
opaque as a normal point. Stated differently, ten points stacked on top of
each other would correspond to a normal point.
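
The "ten stacked points" description is a handy rule of thumb. As a rough sketch of how overlap darkens a location, the helper below (`stacked_opacity` is a hypothetical name, not part of ggplot2) assumes the graphics device blends points with the standard "over" compositing rule, in which each new layer covers a fraction `alpha` of whatever is still showing through:

```r
# Combined opacity of n identical points stacked at one location,
# assuming standard "over" alpha compositing (an assumption about the
# graphics device, not something ggplot2 itself guarantees).
stacked_opacity <- function(alpha, n) {
  1 - (1 - alpha)^n
}

# With alpha = 0.5, each extra overlapping point darkens the spot further
stacked_opacity(alpha = 0.5, n = 1:4)
# 0.5000 0.7500 0.8750 0.9375

# With alpha = 0.1, ten overlapping points
stacked_opacity(alpha = 0.1, n = 10)
# about 0.65
```

Under this assumption, darkness grows quickly for the first few overlapping points and then levels off, which is why transparency reveals overplotting but does not let you count observations exactly.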

Here, we change the `alpha` to 0.5, in an attempt to help fix the overplotting.
Here, we change the `alpha` to 0.5, in an attempt to help fix the overplotting.
While the overplotting isn't solved, adding transparency begins to address this
problem, as the points where there are overlapping observations are darker (as
problem, as the points where there are overlapping observations are darker (as
opposed to lighter gray):

```{r adding-transparency, fig.alt = "Scatter plot of number of items owned versus number of household members, with transparency added to points.", purl=FALSE}
@@ -221,10 +221,10 @@ interviews_plotting %>%
That only helped a little bit with the overplotting problem, so let's try option
two. We can jitter the points on the plot, so that we can see each point in the
locations where there are overlapping points. Jittering introduces a little bit
of randomness into the position of our points. You can think of this process as
taking the overplotted graph and giving it a tiny shake. The points will move a
little bit side-to-side and up-and-down, but their position from the original
plot won't dramatically change.
of randomness into the position of our points. You can think of this process as
taking the overplotted graph and giving it a tiny shake. The points will move a
little bit side-to-side and up-and-down, but their position from the original
plot won't dramatically change.

We can jitter our points using the `geom_jitter()` function instead of the
`geom_point()` function, as seen below:
@@ -235,10 +235,10 @@ interviews_plotting %>%
geom_jitter()
```
The `geom_jitter()` function allows us to specify the amount of random
motion in the jitter, using the `width` and `height` arguments. When we don't
motion in the jitter, using the `width` and `height` arguments. When we don't
specify values for `width` and `height`, `geom_jitter()` defaults to 40% of the
resolution of the data (the smallest change that can be measured). Hence, if we
would like *less* spread in our jitter than was default, we should pick values
resolution of the data (the smallest change that can be measured). Hence, if we
would like *less* spread in our jitter than was default, we should pick values
between 0.1 and 0.4. Experiment with the values to see how your plot changes.
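
To build some intuition for what `width` controls, here is a minimal base R sketch of the idea behind jittering. It is an illustration only, not ggplot2's actual implementation, and the vector `x` is made-up data. Each value is shifted by a random amount drawn uniformly from `-width` to `+width`, so the total spread is twice the value given:

```r
set.seed(42)  # make the random offsets reproducible

# Hypothetical whole-number data, like counts of household members
x <- c(3, 3, 3, 5, 5, 7)
width <- 0.2

# Shift each value by a uniform random amount in [-width, width]
x_jittered <- x + runif(length(x), min = -width, max = width)

# The offsets never exceed the chosen width in either direction
range(x_jittered - x)
```

Because the offsets stay within `width` of the original value, whole-number observations remain visually grouped around their true position as long as `width` is well below 0.5.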

```{r adding-width-height, fig.alt = "Scatter plot of number of items owned versus number of household members, with jitter and transparency.", purl=FALSE}
@@ -249,7 +249,7 @@ interviews_plotting %>%
height = 0.2)
```

For our final change, we can also add colours for all the points by specifying
For our final change, we can also add colours for all the points by specifying
a `color` argument inside the `geom_jitter()` function:

```{r adding-colors, fig.alt = "Scatter plot of number of items owned versus number of household members, showing points as blue.", purl=FALSE}
@@ -261,17 +261,17 @@ interviews_plotting %>%
height = 0.2)
```

To colour each village in the plot differently, you could use a vector as an input
To colour each village in the plot differently, you could use a vector as an input
to the argument **`color`**. However, because we are now mapping features of the
data to a colour, instead of setting one colour for all points, the colour of the
points now needs to be set inside a call to the **`aes`** function. When we map
data to a colour, instead of setting one colour for all points, the colour of the
points now needs to be set inside a call to the **`aes`** function. When we map
a variable in our data to the colour of the points, **`ggplot2`** will provide a
different colour corresponding to the different values of the variable. We will
different colour corresponding to the different values of the variable. We will
continue to specify the value of **`alpha`**, **`width`**, and **`height`**
outside of the **`aes`** function because we are using the same value for
every point. ggplot2 understands both the Commonwealth English and
American English spellings for colour, i.e., you can use either `color`
or `colour`. Here is an example where we color points by the **`village`**
outside of the **`aes`** function because we are using the same value for
every point. ggplot2 understands both the Commonwealth English and
American English spellings for colour, i.e., you can use either `color`
or `colour`. Here is an example where we color points by the **`village`**
of the observation:


@@ -282,31 +282,31 @@ interviews_plotting %>%
```

There appears to be a positive trend between number of household
members and number of items owned (from the list provided). Additionally,
members and number of items owned (from the list provided). Additionally,
this trend does not appear to be different by village.
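
To make the difference between *setting* and *mapping* a colour concrete, the sketch below uses a small made-up data frame (`toy` is hypothetical and only borrows the column names from `interviews_plotting`). The first plot sets one colour for every point outside `aes()`; the second maps colour to a variable inside `aes()`:

```r
library(ggplot2)

# Hypothetical toy data standing in for interviews_plotting
toy <- data.frame(
  no_membrs    = c(2, 3, 3, 5, 7, 7),
  number_items = c(1, 1, 2, 4, 3, 6),
  village      = c("A", "A", "B", "B", "C", "C")
)

# Setting: one fixed colour for every point, given outside aes()
p_set <- ggplot(toy, aes(x = no_membrs, y = number_items)) +
  geom_jitter(color = "blue", alpha = 0.5, width = 0.2, height = 0.2)

# Mapping: colour depends on a variable, so it goes inside aes()
p_mapped <- ggplot(toy, aes(x = no_membrs, y = number_items)) +
  geom_jitter(aes(color = village), alpha = 0.5, width = 0.2, height = 0.2)
```

Printing `p_mapped` should show a legend for `village`, while `p_set` has none, because a legend is only needed when a colour encodes data.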

> ## Notes
>
> ## Notes
>
> As you will learn, there are multiple ways to plot a relationship
> between variables. Another way to plot data with overlapping points is
> to use the `geom_count` plotting function. The `geom_count()` function
> makes the size of each point representative of the number of data items
> of that type and the legend gives point sizes associated to particular
> numbers of items.
> between variables. Another way to plot data with overlapping points is
> to use the `geom_count` plotting function. The `geom_count()` function
> makes the size of each point representative of the number of data items
> of that type and the legend gives point sizes associated to particular
> numbers of items.
>
> ```{r color-by-species-notes, fig.alt = "Previous plot with dots colored by village.", purl=FALSE}
> interviews_plotting %>%
> interviews_plotting %>%
> ggplot(aes(x = no_membrs, y = number_items, color = village)) +
> geom_count()
> ```
> ```
{: .callout}
> ## Exercise
>
> Use what you just learned to create a scatter plot of `rooms` by `village`
> with the `respondent_wall_type` showing in different colours. Does this
> seem like a good way to display the relationship between these variables?
> with the `respondent_wall_type` showing in different colours. Does this
> seem like a good way to display the relationship between these variables?
> What other kinds of plots might you use to show this type of data?
>
> > ## Solution
@@ -501,20 +501,20 @@ like specifying the axes labels, and adding a title to the plot with
relatively few lines of code. We will add more informative x- and y-axis
labels to our plot, a more explanatory label to the legend, and a plot title.
The `labs` function takes the following arguments:
The `labs` function takes the following arguments:
- `title` -- to produce a plot title
- `subtitle` -- to produce a plot subtitle (smaller text placed beneath the title)
- `subtitle` -- to produce a plot subtitle (smaller text placed beneath the title)
- `caption` -- a caption for the plot
- `...` -- any pair of name and value for aesthetics used in the plot (e.g.,
- `...` -- any pair of name and value for aesthetics used in the plot (e.g.,
`x`, `y`, `fill`, `color`, `size`)
```{r barplot-wall-types-labeled, fig.alt = "Previous plot with plot title and labels added."}
percent_wall_type %>%
ggplot(aes(x = village, y = percent, fill = respondent_wall_type)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Proportion of wall type by village",
fill = "Type of Wall in Home",
fill = "Type of Wall in Home",
x = "Village",
y = "Percent")
```
@@ -527,10 +527,10 @@ data for a single village. This would be especially useful if we had
a large number of villages that we had sampled, as a large number of
side-by-side bars will become more difficult to read.

**`ggplot2`** has a special technique called *faceting* that allows the
user to split one plot into multiple plots based on a factor included
in the dataset. We will use it to split our barplot of housing type
proportion by village so that each village has its own panel in a
**`ggplot2`** has a special technique called *faceting* that allows the
user to split one plot into multiple plots based on a factor included
in the dataset. We will use it to split our barplot of housing type
proportion by village so that each village has its own panel in a
multi-panel plot:

```{r barplot-faceting, fig.alt = "Bar plot showing percent of each wall type in each village."}
@@ -569,31 +569,31 @@ bar plots where each plot is a particular item. First we need to
calculate the percentage of people in each village who own each item:

```{r percent-items-data}
percent_items <- interviews_plotting %>%
percent_items <- interviews_plotting %>%
group_by(village) %>%
summarize(across(bicycle:no_listed_items, ~ sum(.x) / n() * 100)) %>%
summarize(across(bicycle:no_listed_items, ~ sum(.x) / n() * 100)) %>%
pivot_longer(bicycle:no_listed_items, names_to = "items", values_to = "percent")
```

To calculate this percentage data frame, we needed to use the `across()`
function within a `summarize()` operation. Unlike the previous example with a
single wall type variable, where each response was exactly one of the types
specified, people can (and do) own more than one item. So there are multiple
columns of data (one for each item), and the percentage calculation needs to be
To calculate this percentage data frame, we needed to use the `across()`
function within a `summarize()` operation. Unlike the previous example with a
single wall type variable, where each response was exactly one of the types
specified, people can (and do) own more than one item. So there are multiple
columns of data (one for each item), and the percentage calculation needs to be
repeated for each column.

Combining `summarize()` with `across()` allows us to specify first, the columns
to be summarized (`bicycle:no_listed_items`) and then the calculation. Because
our calculation is a bit more complex than is available in a built-in function,
Combining `summarize()` with `across()` allows us to specify first, the columns
to be summarized (`bicycle:no_listed_items`) and then the calculation. Because
our calculation is a bit more complex than is available in a built-in function,
we define a new formula:
* `~` indicates that we are defining a formula,
* `sum(.x)` gives the number of people owning that item by counting the number of `TRUE`
values (`.x` is shorthand for the column being operated on),
* `~` indicates that we are defining a formula,
* `sum(.x)` gives the number of people owning that item by counting the number of `TRUE`
values (`.x` is shorthand for the column being operated on),
* and `n()` gives the current group size.
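
The pieces of that formula can be checked by hand on a tiny example. The sketch below uses a made-up data frame (`toy` and its `radio` column are hypothetical) with two villages and two logical item columns, and applies the same `summarize()` plus `across()` pattern:

```r
library(dplyr)

# Hypothetical toy data: two villages, two logical item columns
toy <- tibble(
  village = c("A", "A", "A", "A", "B", "B"),
  bicycle = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  radio   = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# For each village, the share of TRUE values in every item column
toy_percent <- toy %>%
  group_by(village) %>%
  summarize(across(bicycle:radio, ~ sum(.x) / n() * 100))

toy_percent
# village A: bicycle 50, radio 25
# village B: bicycle 50, radio 100
```

In village A, two of four households own a bicycle, so `sum(.x)` is 2, `n()` is 4, and the result is 50 percent.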

After the `summarize()` operation, we have a table of percentages with each item
in its own column, so a `pivot_longer()` is required to transform the table into
an easier format for plotting. Using this data frame, we can now create a
After the `summarize()` operation, we have a table of percentages with each item
in its own column, so a `pivot_longer()` is required to transform the table into
an easier format for plotting. Using this data frame, we can now create a
multi-paneled bar plot.

```{r percent-items-barplot, fig.alt = "Multi-panel bar chart showing percent of respondents in each village and who owned each item, with no grids behind bars."}