add residuals when outcome is available in augment.workflow() #201

Merged 1 commit on Jul 25, 2023

2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -47,7 +47,7 @@ Config/Needs/website:
     tidyverse/tidytemplate,
     yardstick
 Remotes:
-    tidymodels/parsnip
+    tidymodels/parsnip#961
 Config/testthat/edition: 3
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
4 changes: 4 additions & 0 deletions NEWS.md
@@ -8,6 +8,10 @@
 * The prediction columns are now appended to the LHS rather than RHS of
   `new_data` in `augment.workflow()`, following analogous changes in parsnip (#200).

+* While `augment.workflow()` previously never returned a `.resid` column, the
+  method will now return residuals under the same conditions that
+  `augment.model_fit()` does.
+
 # workflows 1.1.3

 * The workflows methods for `generics::tune_args()` and `generics::tunable()`
32 changes: 23 additions & 9 deletions R/broom.R
@@ -139,26 +139,40 @@ glance.workflow <- function(x, ...) {
 #' }
 augment.workflow <- function(x, new_data, eval_time = NULL, ...) {
   fit <- extract_fit_parsnip(x)
+  mold <- extract_mold(x)

+  # supply outcomes to `augment.model_fit()` if possible (#131)
+  outcomes <- FALSE
+  if (length(fit$preproc$y_var) > 0) {
+    outcomes <- all(fit$preproc$y_var %in% names(new_data))
+  }
+
   # `augment.model_fit()` requires the pre-processed `new_data`
-  predictors <- forge_predictors(new_data, x)
-  predictors <- prepare_augment_predictors(predictors)
-  predictors_and_predictions <- augment(fit, predictors, eval_time = eval_time, ...)
+  forged <- hardhat::forge(new_data, blueprint = mold$blueprint, outcomes = outcomes)
+
+  if (outcomes) {
+    new_data_forged <- vctrs::vec_cbind(forged$predictors, forged$outcomes)
+  } else {
+    new_data_forged <- forged$predictors
+  }
+
+  new_data_forged <- prepare_augment_new_data(new_data_forged)
+  out <- augment(fit, new_data_forged, eval_time = eval_time, ...)

-  prediction_columns <- setdiff(
-    names(predictors_and_predictions),
-    names(predictors)
+  augment_columns <- setdiff(
+    names(out),
+    names(new_data_forged)
   )

-  predictions <- predictors_and_predictions[prediction_columns]
+  out <- out[augment_columns]

   # Return original `new_data` with new prediction columns
-  out <- vctrs::vec_cbind(predictions, new_data)
+  out <- vctrs::vec_cbind(out, new_data)

   out
 }

-prepare_augment_predictors <- function(x) {
+prepare_augment_new_data <- function(x) {
   # `augment()` works best with a data frame of predictors,
   # so we need to undo any matrix/sparse matrix compositions that
   # were returned from `hardhat::forge()` (#148)
28 changes: 27 additions & 1 deletion tests/testthat/test-broom.R
@@ -101,7 +101,33 @@ test_that("can augment using a fitted workflow's model", {
   # at least 1 prediction specific column should be added
   expect_true(ncol(x) > ncol(df))

-  expect_named(x, c(".pred", "y", "x"))
+  expect_named(x, c(".pred", ".resid", "y", "x"))
 })

+test_that("can augment without outcome column", {
+  skip_if_not_installed("broom")
+
+  df <- data.frame(y = c(2, 3, 4), x = c(1, 5, 3))
+  df_new <- df["x"]
+
+  lm_spec <- parsnip::linear_reg()
+  lm_spec <- parsnip::set_engine(lm_spec, "lm")
+
+  wf <- workflow()
+  wf <- add_formula(wf, y ~ x)
+  wf <- add_model(wf, lm_spec)
+
+  wf <- fit(wf, df)
+
+  x <- augment(wf, df_new)
+
+  expect_s3_class(x, "tbl_df")
+  expect_identical(nrow(x), 3L)
+
+  # at least 1 prediction specific column should be added
+  expect_true(ncol(x) > ncol(df_new))
+
+  expect_named(x, c(".pred", "x"))
+})
+
 test_that("augment returns `new_data`, not the pre-processed version of `new_data`", {
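
The column handling in this change can be illustrated with a minimal base-R sketch. It is not the workflows implementation: `augment_sketch()` is a hypothetical stand-in for `augment()` on the underlying model fit, it uses a plain `lm()` instead of a parsnip model, and it skips the `hardhat::forge()` pre-processing step entirely. It only shows the idea of adding `.resid` when the outcome is present, isolating the new columns with `setdiff()`, and binding them to the left of the original data.

```r
# Toy data mirroring the tests in this PR.
df <- data.frame(y = c(2, 3, 4), x = c(1, 5, 3))
fit <- stats::lm(y ~ x, data = df)

# Hypothetical helper (not part of workflows or parsnip):
# always adds `.pred`; adds `.resid` only when the outcome `y` is available.
augment_sketch <- function(fit, new_data) {
  out <- new_data
  out$.pred <- unname(stats::predict(fit, newdata = new_data))

  if ("y" %in% names(new_data)) {
    out$.resid <- out$y - out$.pred
  }

  # Keep only the columns that were added, then put the
  # original `new_data` on the right-hand side.
  added <- setdiff(names(out), names(new_data))
  cbind(out[added], new_data)
}

with_outcome <- augment_sketch(fit, df)
without_outcome <- augment_sketch(fit, df["x"])

names(with_outcome)
names(without_outcome)
```

Under these assumptions, the first call yields the columns `.pred`, `.resid`, `y`, `x` and the second yields `.pred`, `x`, which matches the column names the two tests expect from `augment.workflow()`.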