Commit 2a34297

adjust classification example objects
predict the new year variable rather than island because it's _even harder_😅
1 parent 2165740 commit 2a34297

10 files changed, with 39 additions and 38 deletions

R/example_data.R

+2 -1
@@ -14,7 +14,8 @@
 #'
 #' `class_res_rf` and `class_res_nn`, contain multiclass classification tuning
 #' results for a random forest and neural network classification model,
-#' respectively, fitting \code{island} in the \code{palmerpenguins::penguins}
+#' respectively, fitting \code{year} (as a factor) in the
+#' \code{palmerpenguins::penguins}
 #' data using all of the other variables as predictors.
 #'
 #' `log_res_rf` and `log_res_nn`, contain binary classification tuning results

R/predict.R

+1 -1
@@ -49,7 +49,7 @@
 #'
 #' class_st
 #'
-#' # predict island, first as a class, then as
+#' # predict year, first as a class, then as
 #' # class probabilities
 #' predict(class_st, penguins_test)
 #' predict(class_st, penguins_test, type = "prob")

man-roxygen/example_models.Rmd

+3 -2
@@ -10,6 +10,7 @@ library(yardstick)
 data("penguins", package = "palmerpenguins")
 
 penguins <- penguins[!is.na(penguins$sex),]
+penguins$year <- as.factor(penguins$year)
 
 set.seed(1)
 
@@ -99,8 +100,8 @@ reg_res_sp <-
 
 # classification - preliminaries -----------------------------------
 penguins_class_rec <-
-  recipes::recipe(island ~ ., data = penguins_train) %>%
-  recipes::step_dummy(recipes::all_nominal(), -island) %>%
+  recipes::recipe(year ~ ., data = penguins_train) %>%
+  recipes::step_dummy(recipes::all_nominal(), -year) %>%
   recipes::step_zv(recipes::all_predictors()) %>%
   recipes::step_normalize(recipes::all_numeric())
 
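
Reviewer note: the added `as.factor()` line matters because `year` is stored as an integer in the raw data, so without the conversion it would read as a numeric outcome rather than a three-level class label. A minimal base R sketch of that behavior, using a small made-up data frame (`toy` and its values are hypothetical stand-ins, not the real penguins data):

```r
# Hypothetical toy data standing in for the penguins data frame.
toy <- data.frame(
  year = c(2007L, 2008L, 2009L, 2008L, 2007L),
  body_mass_g = c(3750, 3800, 4200, 3950, 3600)
)

# Stored as an integer, `year` looks like a numeric (regression) outcome.
is.numeric(toy$year)

# Converting to a factor turns it into a class label with one level per year.
toy$year <- as.factor(toy$year)
levels(toy$year)
nlevels(toy$year)
```

After the conversion, `year ~ .` in the recipe describes a multiclass problem, which is what the commit appears to intend.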

man-roxygen/note_example_data.R

+2 -2
@@ -23,8 +23,8 @@
 #' * `reg_res_svm`: Tuning results for a support vector machine model
 #'
 #' In the multinomial classification setting, the relevant objects reflect
-#' models specified to fit `island` using all of the other variables as
-#' predictors. These objects include:
+#' models specified to fit `year` (as a factor) using all of the other variables
+#' as predictors. These objects include:
 #'
 #' * `class_res_nn`: Fitted resamples for a neural network model
 #' * `class_res_rf`: Tuning results for a random forest model

man/example_data.Rd

+5 -3
Some generated files are not rendered by default.

man/predict.model_stack.Rd

+3 -3
Generated file; diff not rendered.

man/stack_add.Rd

+2 -2
Generated file; diff not rendered.

man/stack_blend.Rd

+2 -2
Generated file; diff not rendered.

man/stack_fit.Rd

+2 -2
Generated file; diff not rendered.

vignettes/classification.Rmd

+17 -20
@@ -14,15 +14,12 @@ knitr::opts_chunk$set(
 )
 ```
 
-In this article, we'll use the stacks package to predict the island that penguins come from using a stacked ensemble on the `palmerpenguins` data. This vignette assumes that you're familiar with tidymodels "proper," as well as the basic grammar of the package, and have seen it implemented on numeric data; if this is not the case, check out the "Getting Started With stacks" vignette!
-
-The package is closely integrated with the rest of the functionality in tidymodels—we'll load those packages as well, in addition to a few tidyverse packages to evaluate our results later on.
+In this vignette, we'll tackle a multiclass classification problem using the stacks package. This vignette assumes that you're familiar with tidymodels "proper," as well as the basic grammar of the package, and have seen it implemented on numeric data; if this is not the case, check out the "Getting Started With stacks" vignette!
 
 ```{r setup, eval = FALSE}
 library(tidymodels)
+library(tidyverse)
 library(stacks)
-library(purrr)
-library(dplyr)
 ```
 
 ```{r packages, include = FALSE}
@@ -35,31 +32,35 @@ library(yardstick)
 library(stacks)
 library(purrr)
 library(dplyr)
+library(tidyr)
 ```
 
-We'll make use of the `palmerpenguins::penguins` data, giving measurements taken from three different species of penguins from three different antarctic islands! We'll be predicting penguins species using the rest of the predictors in the data.
+Allison Horst's `palmerpenguins` package contains data giving measurements taken from three different species of penguins from three different islands in Antarctica. This study was carried out across three years—we might suspect that weather conditions may play a role in penguin migration and, to some extent, morphology (e.g. body mass). In this article, we'll use the stacks package to predict the year that these measurements were taken in using a stacked ensemble on the `palmerpenguins` data.
 
 ```{r, message = FALSE, warning = FALSE}
 library(palmerpenguins)
 data("penguins")
 
 str(penguins)
 
-penguins <- penguins[!is.na(penguins$sex),]
+penguins <-
+  penguins %>%
+  drop_na(sex) %>%
+  mutate(year = as.factor(year))
 ```
 
-Let's plot the data to get a sense for how separable these three island groups are.
+Let's plot the data to get a sense for how separable these three years groups are.
 
 ```{r, message = FALSE, warning = FALSE}
 library(ggplot2)
 
 ggplot(penguins) +
-  aes(x = bill_length_mm, y = bill_depth_mm, color = island) +
+  aes(x = bill_length_mm, y = bill_depth_mm, color = year) +
   geom_point() +
-  labs(x = "Bill Length (mm)", y = "Bill Depth (mm)", col = "island")
+  labs(x = "Bill Length (mm)", y = "Bill Depth (mm)", col = "Year")
 ```
 
-Just with these two predictors, it seems like we can already start to separate these islands decently well! Let's see how well the stacked ensemble can classify these penguins.
+Just with these two predictors, it seems like this might be a tough problem to solve! Let's see how well the stacked ensemble can classify these penguins.
 
 # Defining candidate ensemble members
 
@@ -76,26 +77,22 @@ penguins_test <- testing(penguins_split)
 folds <- rsample::vfold_cv(penguins_train, v = 5)
 
 penguins_rec <-
-  recipe(island ~ ., data = penguins_train) %>%
-  step_dummy(all_nominal(), -island) %>%
+  recipe(year ~ ., data = penguins_train) %>%
+  step_dummy(all_nominal(), -year) %>%
   step_zv(all_predictors())
 
 penguins_wflow <-
   workflow() %>%
   add_recipe(penguins_rec)
-
-metric <- metric_set(roc_auc)
 ```
 
-Note that we now use the ROC AUC metric rather than root mean squared error (as in the numeric response setting)—any yardstick metric with classification functionality would work here.
-
 We also need to use the same control settings as in the numeric response setting:
 
 ```{r}
 ctrl_grid <- control_stack_grid()
 ```
 
-We'll define two different model definitions to try to predict island—a random forest and a neural network.
+We'll define two different model definitions to try to predict year—a random forest and a neural network.
 
 Starting out with a random forest:
 
@@ -177,7 +174,7 @@ Computing the ROC AUC for the model:
 ```{r, eval = FALSE}
 yardstick::roc_auc(
   penguins_pred,
-  truth = island,
+  truth = year,
   contains(".pred_")
 )
 ```
@@ -187,7 +184,7 @@ Looks like our predictions were pretty strong! How do the stacks predictions per
 ```{r}
 penguins_pred <-
   penguins_test %>%
-  select(island) %>%
+  select(year) %>%
   bind_cols(
     predict(
       penguins_model_st,
