Commit

Incorporate count() suggestion from #2, add code line highlighting
spcanelon committed Sep 23, 2020
1 parent e926df3 commit 7a17500
Showing 53 changed files with 895 additions and 1,768 deletions.
12 changes: 6 additions & 6 deletions 01-readr.Rmd
@@ -71,22 +71,22 @@ background-size: 80px

.panelset[
.panel[.panel-name[Read data in]
.center[Both options will get you the same dataset!]
.pull-left[
.center[
### Both options below will get you the same dataset!]

Option 1
```{r}
# option 1: load using URL ----
raw_adelie_url <- read_csv("https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.219.3&entityid=002f3893385f710df69eeebe893144ff")
```
]
.pull-right[

Option 2
```{r}
# option 2: load using filepath ----
raw_adelie_filepath <- read_csv("tutorial/raw_adelie.csv")
```
]

]

.panel[.panel-name[Save data]

Lucky for us, the `palmerpenguins` `r emo::ji("package")` compiles data from all three species together for us!
26 changes: 16 additions & 10 deletions 03-ggplot2.Rmd
@@ -81,14 +81,16 @@ Let's see if body mass varies by penguin sex

.pull-left[
```{r eval=FALSE}
ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
ggplot(data = penguins,
aes(x = sex, y = body_mass_g)) + #<<
geom_point()
```
]

.pull-right[
```{r, echo=FALSE, warning=FALSE, fig.height=5}
ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
ggplot(data = penguins,
aes(x = sex, y = body_mass_g)) +
geom_point()
```
]
@@ -98,14 +100,16 @@ ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +

.pull-left[
```{r eval=FALSE}
ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
geom_boxplot()
ggplot(data = penguins,
aes(x = sex, y = body_mass_g)) +
geom_boxplot() #<<
```
]

.pull-right[
```{r echo=FALSE, warning=FALSE, fig.height=5}
ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
ggplot(data = penguins,
aes(x = sex, y = body_mass_g)) +
geom_boxplot()
```
]
@@ -115,17 +119,19 @@ ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +

.pull-left[
```{r eval=FALSE}
ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
geom_boxplot(aes(fill = species))
ggplot(data = penguins,
aes(x = sex, y = body_mass_g)) +
geom_boxplot(aes(fill = species)) #<<
```

### <br/> What do you notice?
]

.pull-right[
```{r echo=FALSE, warning=FALSE, fig.height=5}
ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
geom_boxplot(aes(fill = species)) #<<
ggplot(data = penguins,
aes(x = sex, y = body_mass_g)) +
geom_boxplot(aes(fill = species))
```
]
]
@@ -149,7 +155,7 @@ Next stop, `dplyr`!
```{r echo=FALSE, warning=FALSE, fig.height=5}
penguins %>%
ggplot(aes(x = sex, y = body_mass_g)) +
geom_boxplot(aes(fill = species)) #<<
geom_boxplot(aes(fill = species))
```
]
]
118 changes: 98 additions & 20 deletions 04-dplyr.Rmd
@@ -34,12 +34,13 @@ background-size: 80px
- ...and more!
]
.pull-right[
#### Pick observations by their values with `filter()`.
#### Reorder the rows with `arrange()`.
#### Pick variables by their names `select()`.
#### Create new variables with functions of existing variables `mutate()`.
#### Collapse many values down to a single summary `summarize()`.
#### and `group_by()` gets the above functions to operate group-by-group rather than on the entire dataset
- Pick observations by their values with `filter()`.
- Reorder the rows with `arrange()`.
- Pick variables by their names with `select()`.
- Create new variables with functions of existing variables with `mutate()`.
- Collapse many values down to a single summary with `summarize()`.
- `group_by()` gets the above functions to operate group-by-group rather than on the entire dataset.
- and `count()` + `add_count()` simplify `group_by()` + `summarize()` when you just want to count
]
]
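
The new `count()` + `add_count()` bullet can be illustrated with a minimal sketch. It assumes `dplyr` (1.0.0 or later) is installed and uses a small made-up tibble named `toy` in place of `penguins`; the values are hypothetical, and `.groups = "drop"` is added here only so the two results are directly comparable.

```r
library(dplyr)

# Toy data standing in for `penguins` (hypothetical values, not the real dataset)
toy <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"),
  sex     = c("female", "male", "male", "female", "female")
)

# Long form: group, then collapse each group to its row count
by_hand <- toy %>%
  group_by(species, sex) %>%
  summarize(n = n(), .groups = "drop")

# Short form: count() groups, tallies, and ungroups in one step
by_count <- toy %>%
  count(species, sex)

# add_count() keeps every original row and appends the group size as a column
with_n <- toy %>%
  add_count(species, name = "n_species")
```

Here `by_hand` and `by_count` hold the same three rows, while `with_n` still has all five rows of `toy` plus an `n_species` column.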

@@ -73,6 +74,7 @@ background-size: 80px
# dplyr: exercise

.panelset[

.panel[.panel-name[Select]
.center[
### Can you spot the difference in performing the same operation?
@@ -82,6 +84,7 @@ background-size: 80px
select(penguins, species, sex, body_mass_g)
```
]

.pull-right[
```{r}
penguins %>%
@@ -91,6 +94,9 @@ penguins %>%
]

.panel[.panel-name[Arrange]

We can use `arrange()` to arrange our data in descending order by **body_mass_g**

.pull-left[
```{r}
glimpse(penguins)
@@ -100,45 +106,117 @@ glimpse(penguins)
```{r}
penguins %>%
select(species, sex, body_mass_g) %>%
arrange(desc(body_mass_g))
arrange(desc(body_mass_g)) #<<
```
]
]

.panel[.panel-name[Group By & Summarize]
#### Summarizing the data using `group_by()` and `summarize()`

.pull-left[
We can summarize the data using `group_by()` and `summarize()` to obtain counts by **species** and **sex**
```{r}
penguins %>%
group_by(species, sex) %>%
summarize(count = n())
group_by(species, sex) %>% #<<
summarize(n = n()) #<<
```
]

.panel[.panel-name[Mutate]
#### Creating new variables with `mutate()`
.pull-right[
And because we're just _counting_, we also have the option to use `count()` which simplifies our code!

.small-text[Example kindly [contributed by Alison Hill (@apreshill)](https://github.com/spcanelon/2020-rladies-chi-tidyverse/issues/2)]

```{r}
penguins %>%
count(species, sex) #<<
```
]
]

.panel[.panel-name[Mutate: Ex. 1]

.pull-left[
We can use `mutate()` to create a new variable **n_species** that adds up all observations per **species**
```{r}
penguins %>%
group_by(species) %>%
mutate(count_species = n()) %>%
mutate(n_species = n()) %>% #<<
ungroup() %>%
group_by(species, sex, count_species) %>%
group_by(species, sex, n_species) %>%
summarize(n = n())
```
]

.pull-right[
**OR** we can use `count()`'s friend `add_count()` to create **n_species**, again because we're just _counting_

.small-text[Example kindly [contributed by Alison Hill (@apreshill)](https://github.com/spcanelon/2020-rladies-chi-tidyverse/issues/2)]
```{r}
penguins %>%
count(species, sex) %>%
add_count(species, wt = n, #<<
name = "n_species") #<<
```
]
]

.panel[.panel-name[Mutate: Ex. 2]

With either approach, we can use `mutate()` to create a new variable **prop**, which represents the percentage of penguins of each **sex** within each **species**

.pull-left[
```{r}
penguins %>%
group_by(species) %>%
mutate(n_species = n()) %>%
ungroup() %>%
group_by(species, sex, n_species) %>%
summarize(count = n()) %>%
mutate(prop = count/count_species*100)
mutate(prop = count/n_species*100) #<<
```

]
.pull-right[
.small-text[Example kindly [contributed by Alison Hill (@apreshill)](https://github.com/spcanelon/2020-rladies-chi-tidyverse/issues/2)]

```{r}
penguins %>%
count(species, sex) %>%
add_count(species, wt = n,
name = "n_species") %>%
mutate(prop = n/n_species*100) #<<
```
]
]

.panel[.panel-name[Filter]
#### Filtering rows using `filter()`

Finally, we can filter rows to only show us **Chinstrap** penguin summaries by adding `filter()` to our pipeline

.pull-left[
```{r}
penguins %>%
group_by(species) %>%
mutate(count_species = n()) %>%
mutate(n_species = n()) %>%
ungroup() %>%
group_by(species, sex, count_species) %>%
group_by(species, sex, n_species) %>%
summarize(count = n()) %>%
mutate(percentage = count/count_species*100) %>%
filter(species == "Chinstrap")
mutate(prop = count/n_species*100) %>%
filter(species == "Chinstrap") #<<
```

]
.pull-right[
```{r}
penguins %>%
count(species, sex) %>%
add_count(species, wt = n,
name = "n_species") %>%
mutate(prop = n/n_species*100) %>%
filter(species == "Chinstrap") #<<
```
]
]

]
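
The `count()` / `add_count()` pipeline added in this file can be traced on a tiny example. This is a sketch under stated assumptions: `dplyr` is installed, and `toy` is a hypothetical tibble, not the `penguins` data. Note that `prop` is multiplied by 100, so it ends up on a percentage scale.

```r
library(dplyr)

# Hypothetical stand-in for `penguins`: 4 Adelie rows, 2 Chinstrap rows
toy <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap"),
  sex     = c("female", "male", "male", "male", "female", "male")
)

props <- toy %>%
  count(species, sex) %>%                  # one row per species/sex pair, tally in `n`
  add_count(species, wt = n,               # sum `n` within each species...
            name = "n_species") %>%        # ...and store the total as `n_species`
  mutate(prop = n / n_species * 100) %>%   # share of each sex within its species, in percent
  filter(species == "Chinstrap")           # keep only the Chinstrap summary rows
```

With this toy input, `props` has two rows (Chinstrap female and Chinstrap male), each with `n_species` equal to 2 and `prop` equal to 50.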
9 changes: 7 additions & 2 deletions 05-forcats.Rmd
@@ -71,7 +71,9 @@ background-size: 80px
### The `factor()` function is perfect for this.
```{r eval=FALSE}
penguins %>%
mutate(year_factor = factor(year, levels = unique(year)))
mutate(year_factor =
factor(year, #<<
levels = unique(year))) #<<
```
]

@@ -84,14 +86,17 @@ penguins %>%
```{r}
penguins_new <-
penguins %>%
mutate(year_factor = factor(year, levels = unique(year)))
mutate(year_factor =
factor(year, #<<
levels = unique(year))) #<<
penguins_new
```
]

.pull-right[
```{r}
class(penguins_new$year_factor)
levels(penguins_new$year_factor)
```
]
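
The effect of `levels = unique(year)` in the `factor()` call is easiest to see on values that are not already sorted. The sketch below uses base R only; the `year` vector is made up for illustration and is not taken from `penguins`.

```r
# Hypothetical year values, deliberately out of order
year <- c(2008, 2007, 2009, 2007, 2008)

# Default behaviour: levels are the sorted unique values
default_levels <- levels(factor(year))

# With levels = unique(year): levels follow the order of first appearance
appearance_levels <- levels(factor(year, levels = unique(year)))

default_levels     # "2007" "2008" "2009"
appearance_levels  # "2008" "2007" "2009"
```

If the column is already in ascending order, the two calls give the same levels; the explicit `levels` argument only matters when the order of first appearance differs from the sorted order.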
4 changes: 2 additions & 2 deletions 06-stringr.Rmd
@@ -79,7 +79,7 @@ background-size: 80px
```{r}
penguins %>%
select(species, island) %>%
mutate(ISLAND = str_to_upper(island))
mutate(ISLAND = str_to_upper(island)) #<<
```
]
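
The two `stringr` calls highlighted in this file's diff, `str_to_upper()` and `str_c()`, can be tried on plain character vectors. This is a minimal sketch assuming `stringr` is installed; the two vectors are hypothetical stand-ins for the `species` and `island` columns.

```r
library(stringr)

# Hypothetical stand-ins for the `species` and `island` columns
species <- c("Adelie", "Gentoo")
island  <- c("Torgersen", "Biscoe")

# Upper-case every island name
ISLAND <- str_to_upper(island)

# Join the two vectors element by element, separated by an underscore
species_island <- str_c(species, ISLAND, sep = "_")

species_island  # "Adelie_TORGERSEN" "Gentoo_BISCOE"
```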

@@ -90,7 +90,7 @@ penguins %>%
penguins %>%
select(species, island) %>%
mutate(ISLAND = str_to_upper(island)) %>%
mutate(species_island = str_c(species, ISLAND, sep = "_"))
mutate(species_island = str_c(species, ISLAND, sep = "_")) #<<
```
]
]
6 changes: 3 additions & 3 deletions 07-tidyr.Rmd
@@ -91,9 +91,9 @@ untidy_penguins

```{r}
untidy_penguins %>%
pivot_longer(cols = male:`NA`,
names_to = "sex",
values_to = "body_mass_g")
pivot_longer(cols = male:`NA`, #<<
names_to = "sex", #<<
values_to = "body_mass_g") #<<
```
]
]
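
The `pivot_longer()` call highlighted above reshapes one column per sex into a `sex` / `body_mass_g` pair. Below is a sketch on a made-up wide table, assuming `tidyr` (and its dependency `tibble`) is installed. The name `untidy_toy`, its column order, and its values are all hypothetical; the real `untidy_penguins` is built elsewhere in the slides.

```r
library(tidyr)

# Hypothetical wide table: one body-mass column per recorded sex,
# including a column literally named "NA" for penguins of unknown sex
untidy_toy <- tibble::tibble(
  species = c("Adelie", "Gentoo"),
  male    = c(4000, 5500),
  female  = c(3400, 4700),
  `NA`    = c(3500, NA)
)

tidy_toy <- untidy_toy %>%
  pivot_longer(cols = male:`NA`,            # backticks needed: NA is a reserved word
               names_to = "sex",            # old column names become values of `sex`
               values_to = "body_mass_g")   # old cell values go into `body_mass_g`
```

Two wide rows with three measured columns become six long rows. The `sex` column then contains the string `"NA"`, taken from the column name, rather than a true missing value.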