Update fig_path

rbnrsalas · Oct 20, 2022 · 95f720f
1 parent b1ed90e
Showing 2 changed files with 89 additions and 89 deletions.
diff --git a/episodes/05-ggplot2.Rmd b/episodes/05-ggplot2.Rmd
@@ -25,7 +25,7 @@ source: Rmd
 
 ```{r, include=FALSE}
 source("../bin/chunk-options.R")
-knitr_fig_path("04-")
+knitr_fig_path("05-")
 source("../bin/download_data.R")
 ```
 
@@ -54,14 +54,14 @@ interviews_plotting <- interviews %>%
   ## if there were no items listed, changing NA to no_listed_items
   replace_na(list(items_owned = "no_listed_items")) %>%
   mutate(items_owned_logical = TRUE) %>%
-  pivot_wider(names_from = items_owned, 
-              values_from = items_owned_logical, 
+  pivot_wider(names_from = items_owned,
+              values_from = items_owned_logical,
               values_fill = list(items_owned_logical = FALSE)) %>%
   ## pivot wider by months_lack_food
   separate_rows(months_lack_food, sep = ";") %>%
   mutate(months_lack_food_logical = TRUE) %>%
-  pivot_wider(names_from = months_lack_food, 
-              values_from = months_lack_food_logical, 
+  pivot_wider(names_from = months_lack_food,
+              values_from = months_lack_food_logical,
               values_fill = list(months_lack_food_logical = FALSE)) %>%
   ## add some summary columns
   mutate(number_months_lack_food = rowSums(select(., Jan:May))) %>%
@@ -87,14 +87,14 @@ this fashion allows for extensive flexibility and customization of plots.
 
 Each chart built with ggplot2 must include the following
 
-* Data  
-* Aesthetic mapping (aes)  
+* Data
+* Aesthetic mapping (aes)
 
-  + Describes how variables are mapped onto graphical attributes  
-  + Visual attribute of data including x-y axes, color, fill, shape, and alpha  
-* Geometric objects (geom)  
+  + Describes how variables are mapped onto graphical attributes
+  + Visual attribute of data including x-y axes, color, fill, shape, and alpha
+* Geometric objects (geom)
 
-  + Determines how values are rendered graphically, as bars (`geom_bar`), scatterplot (`geom_point`), line (`geom_line`), etc. 
+  + Determines how values are rendered graphically, as bars (`geom_bar`), scatterplot (`geom_point`), line (`geom_line`), etc.
 
 Thus, the template for a graphic in ggplot2 is:
 
@@ -138,7 +138,7 @@ interviews_plotting %>%
 The `+` in the **`ggplot2`** package is particularly useful because it allows
 you to modify existing `ggplot` objects. This means you can easily set up plot
 templates and conveniently explore different types of plots, so the above plot
-can also be generated with code like this, similar to the "intermediate steps" 
+can also be generated with code like this, similar to the "intermediate steps"
 approach in the previous lesson:
 
 ```{r first-ggplot-with-plus, fig.alt = "Scatter plot of number of items owned versus number of household members.", eval=FALSE, purl=FALSE}
@@ -187,29 +187,29 @@ interviews_plotting %>%
 ```
 
 Then, we start modifying this plot to extract more information from it. For
-instance, when inspecting the plot we notice that points only appear at the 
-intersection of whole numbers of `no_membrs` and `number_items`. Also, from a 
-rough estimate, it looks like there are far fewer dots on the plot than there 
+instance, when inspecting the plot we notice that points only appear at the
+intersection of whole numbers of `no_membrs` and `number_items`. Also, from a
+rough estimate, it looks like there are far fewer dots on the plot than there
 are rows in our dataframe. This should lead us to believe that there may be multiple
-observations plotted on top of each other (e.g. three observations where 
-`no_membrs` is 3 and `number_items` is 1). 
+observations plotted on top of each other (e.g. three observations where
+`no_membrs` is 3 and `number_items` is 1).
 
-There are two main ways to alleviate overplotting issues: 
-1. changing the transparency of the points 
+There are two main ways to alleviate overplotting issues:
+1. changing the transparency of the points
 2. jittering the location of the points
 
-Let's first explore option 1, changing the transparency of the points. What we 
-mean when we say "transparency" we mean the opacity of point, or your ability to 
-see through the point. We can control the transparency of the points with the 
+Let's first explore option 1, changing the transparency of the points. What we
+mean when we say "transparency" we mean the opacity of point, or your ability to
+see through the point. We can control the transparency of the points with the
 `alpha` argument to `geom_point`. Values of `alpha` range from 0 to 1, with
-lower values corresponding to more transparent colors (an `alpha` of 1 is the 
-default value). Specifically, an alpha of 0.1, would make a point one-tenth as 
-opaque as a normal point. Stated differently ten points stacked on top of 
+lower values corresponding to more transparent colors (an `alpha` of 1 is the
+default value). Specifically, an alpha of 0.1, would make a point one-tenth as
+opaque as a normal point. Stated differently ten points stacked on top of
 each other would correspond to a normal point.
 
-Here, we change the `alpha` to 0.5, in an attempt to help fix the overplotting. 
+Here, we change the `alpha` to 0.5, in an attempt to help fix the overplotting.
 While the overplotting isn't solved, adding transparency begins to address this
-problem, as the points where there are overlapping observations are darker (as 
+problem, as the points where there are overlapping observations are darker (as
 opposed to lighter gray):
 
 ```{r adding-transparency, fig.alt = "Scatter plot of number of items owned versus number of household members, with transparency added to points.", purl=FALSE}
@@ -221,10 +221,10 @@ interviews_plotting %>%
 That only helped a little bit with the overplotting problem, so let's try option
 two. We can jitter the points on the plot, so that we can see each point in the
 locations where there are overlapping points. Jittering introduces a little bit
-of randomness into the position of our points. You can think of this process as 
-taking the overplotted graph and giving it a tiny shake. The points will move a 
-little bit side-to-side and up-and-down, but their position from the original 
-plot won't dramatically change. 
+of randomness into the position of our points. You can think of this process as
+taking the overplotted graph and giving it a tiny shake. The points will move a
+little bit side-to-side and up-and-down, but their position from the original
+plot won't dramatically change.
 
 We can jitter our points using the `geom_jitter()` function instead of the
 `geom_point()`  function, as seen below:
@@ -235,10 +235,10 @@ interviews_plotting %>%
     geom_jitter()
 ```
 The `geom_jitter()` function allows for us to specify the amount of random
-motion in the jitter, using the `width` and `height` arguments. When we don't 
+motion in the jitter, using the `width` and `height` arguments. When we don't
 specify values for `width` and `height`, `geom_jitter()` defaults to 40% of the
-resolution of the data (the smallest change that can be measured). Hence, if we 
-would like *less* spread in our jitter than was default, we should pick values 
+resolution of the data (the smallest change that can be measured). Hence, if we
+would like *less* spread in our jitter than was default, we should pick values
 between 0.1 and 0.4. Experiment with the values to see how your plot changes.
 
 ```{r adding-width-height, fig.alt = "Scatter plot of number of items owned versus number of household members, with jitter and transparency.", purl=FALSE}
@@ -249,7 +249,7 @@ interviews_plotting %>%
                 height = 0.2)
 ```
 
-For our final change, we can also add colours for all the points by specifying 
+For our final change, we can also add colours for all the points by specifying
 a `color` argument inside the `geom_jitter()` function:
 
 ```{r adding-colors, fig.alt = "Scatter plot of number of items owned versus number of household members, showing points as blue.", purl=FALSE}
@@ -261,17 +261,17 @@ interviews_plotting %>%
                 height = 0.2)
 ```
 
-To colour each village in the plot differently, you could use a vector as an input 
+To colour each village in the plot differently, you could use a vector as an input
 to the argument **`color`**.  However, because we are now mapping features of the
-data to a colour, instead of setting one colour for all points, the colour of the 
-points now needs to be set inside a call to the **`aes`** function. When we map 
+data to a colour, instead of setting one colour for all points, the colour of the
+points now needs to be set inside a call to the **`aes`** function. When we map
 a variable in our data to the colour of the points, **`ggplot2`** will provide a
-different colour corresponding to the different values of the variable. We will 
+different colour corresponding to the different values of the variable. We will
 continue to specify the value of **`alpha`**, **`width`**, and **`height`**
-outside of the **`aes`** function because we are using the same value for 
-every point. ggplot2 understands both the Commonwealth English and 
-American English spellings for colour, i.e., you can use either `color` 
-or `colour`. Here is an example where we color points by the **`village`** 
+outside of the **`aes`** function because we are using the same value for
+every point. ggplot2 understands both the Commonwealth English and
+American English spellings for colour, i.e., you can use either `color`
+or `colour`. Here is an example where we color points by the **`village`**
 of the observation:
 
 
@@ -282,31 +282,31 @@ interviews_plotting %>%
 ```
 
 There appears to be a positive trend between number of household
-members and number of items owned (from the list provided). Additionally, 
+members and number of items owned (from the list provided). Additionally,
 this trend does not appear to be different by village.
 
-> ## Notes 
-> 
+> ## Notes
+>
 > As you will learn, there are multiple ways to plot a relationship
-> between variables. Another way to plot data with overlapping points is 
-> to use the `geom_count` plotting function. The `geom_count()`  function 
-> makes the size of each point representative of the number of data items 
-> of that type and the legend gives point sizes associated to particular 
-> numbers of items. 
+> between variables. Another way to plot data with overlapping points is
+> to use the `geom_count` plotting function. The `geom_count()`  function
+> makes the size of each point representative of the number of data items
+> of that type and the legend gives point sizes associated to particular
+> numbers of items.
 >
 > ```{r color-by-species-notes, fig.alt = "Previous plot with dots colored by village.", purl=FALSE}
-> interviews_plotting %>% 
+> interviews_plotting %>%
 >    ggplot(aes(x = no_membrs, y = number_items, color = village)) +
 >    geom_count()
-> ```    
+> ```
 
 {: .callout}
 
 > ## Exercise
 >
 > Use what you just learned to create a scatter plot of `rooms` by `village`
-> with the `respondent_wall_type` showing in different colours. Does this 
-> seem like a good way to display the relationship between these variables? 
+> with the `respondent_wall_type` showing in different colours. Does this
+> seem like a good way to display the relationship between these variables?
 > What other kinds of plots might you use to show this type of data?
 >
 > > ## Solution
@@ -501,20 +501,20 @@ like specifying the axes labels, and adding a title to the plot with
 relatively few lines of code. We will add more informative x- and y-axis
 labels to our plot, a more explanatory label to the legend, and a plot title.
 
-The `labs` function takes the following arguments: 
+The `labs` function takes the following arguments:
 
 - `title` -- to produce a plot title
-- `subtitle` -- to produce a plot subtitle (smaller text placed beneath the title) 
+- `subtitle` -- to produce a plot subtitle (smaller text placed beneath the title)
 - `caption` -- a caption for the plot
-- `...` -- any pair of name and value for aesthetics used in the plot (e.g., 
+- `...` -- any pair of name and value for aesthetics used in the plot (e.g.,
 `x`, `y`, `fill`, `color`, `size`)
 
 ```{r barplot-wall-types-labeled, fig.alt = "Previous plot with plot title and labels added."}
 percent_wall_type %>%
     ggplot(aes(x = village, y = percent, fill = respondent_wall_type)) +
     geom_bar(stat = "identity", position = "dodge") +
     labs(title = "Proportion of wall type by village",
-         fill = "Type of Wall in Home", 
+         fill = "Type of Wall in Home",
          x = "Village",
          y = "Percent")
 ```
@@ -527,10 +527,10 @@ data for a single village. This would be especially useful if we had
 a large number of villages that we had sampled, as a large number of
 side-by-side bars will become more difficult to read.
 
-**`ggplot2`** has a special technique called *faceting* that allows the 
-user to split one plot into multiple plots based on a factor included 
-in the dataset. We will use it to split our barplot of housing type 
-proportion by village so that each village has its own panel in a 
+**`ggplot2`** has a special technique called *faceting* that allows the
+user to split one plot into multiple plots based on a factor included
+in the dataset. We will use it to split our barplot of housing type
+proportion by village so that each village has its own panel in a
 multi-panel plot:
 
 ```{r barplot-faceting, fig.alt = "Bar plot showing percent of each wall type in each village."}
@@ -569,31 +569,31 @@ bar plots where each plot is a particular item. First we need to
 calculate the percentage of people in each village who own each item:
 
 ```{r percent-items-data}
-percent_items <- interviews_plotting %>% 
+percent_items <- interviews_plotting %>%
     group_by(village) %>%
-    summarize(across(bicycle:no_listed_items, ~ sum(.x) / n() * 100)) %>% 
+    summarize(across(bicycle:no_listed_items, ~ sum(.x) / n() * 100)) %>%
     pivot_longer(bicycle:no_listed_items, names_to = "items", values_to = "percent")
 ```
 
-To calculate this percentage data frame, we needed to use the `across()` 
-function within a `summarize()` operation. Unlike the previous example with a 
-single wall type variable, where each response was exactly one of the types 
-specified, people can (and do) own more than one item. So there are multiple 
-columns of data (one for each item), and the percentage calculation needs to be 
+To calculate this percentage data frame, we needed to use the `across()`
+function within a `summarize()` operation. Unlike the previous example with a
+single wall type variable, where each response was exactly one of the types
+specified, people can (and do) own more than one item. So there are multiple
+columns of data (one for each item), and the percentage calculation needs to be
 repeated for each column.
 
-Combining `summarize()` with `across()` allows us to specify first, the columns 
-to be summarized (`bicycle:no_listed_items`) and then the calculation. Because 
-our calculation is a bit more complex than is available in a built-in function, 
+Combining `summarize()` with `across()` allows us to specify first, the columns
+to be summarized (`bicycle:no_listed_items`) and then the calculation. Because
+our calculation is a bit more complex than is available in a built-in function,
 we define a new formula:
-* `~` indicates that we are defining a formula, 
-* `sum(.x)` gives the number of people owning that item by counting the number of `TRUE` 
-values (`.x` is shorthand for the column being operated on), 
+* `~` indicates that we are defining a formula,
+* `sum(.x)` gives the number of people owning that item by counting the number of `TRUE`
+values (`.x` is shorthand for the column being operated on),
 * and `n()` gives the current group size.
 
-After the `summarize()` operation, we have a table of percentages with each item 
-in its own column, so a `pivot_longer()` is required to transform the table into 
-an easier format for plotting. Using this data frame, we can now create a 
+After the `summarize()` operation, we have a table of percentages with each item
+in its own column, so a `pivot_longer()` is required to transform the table into
+an easier format for plotting. Using this data frame, we can now create a
 multi-paneled bar plot.
 
 ```{r percent-items-barplot, fig.alt = "Multi-panel bar chart showing percent of respondents in each village who owned each item, with no grids behind bars."}