Description
Feature
I want to preface this by saying I'm unsure if this is technically a bug or a feature request.
I think parsnip
could benefit from adding an additional check in set_mode
to ensure that the user does not add an improper mode to a model Specification. For example, when initializing a linear_reg
specification, it will check if mode
is either c("unknown", "regression")
:
library(tidymodels)
# correct specification
mod1 <- linear_reg() %>%
set_engine('lm')
mod1
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
#returns an error - lienar_reg cannot be used for classification
mod_error <- linear_reg('classification') %>%
set_engine('lm')
#> Error: `mode` should be one of: 'unknown', 'regression'
However, the user can still do set_mode("classification")
without any warnings or errors.
mod2 <- mod1 %>%
set_mode('classification') #should this be allowed?
mod2
#> Linear Regression Model Specification (classification)
#>
#> Computational engine: lm
As a result, the user gets a really confusing downstream error when trying to use that "classification" model specification:
mod2 %>%
fit(Species ~ ., data = iris)
#> Error: formula_ is unknown.
To someone who has just discovered tidymodels
and has followed many online tutorials in which setting your mode with set_mode()
seems to be a common API practice, they would be under the impression that mod2
is a perfectly acceptable model spec.
I would imagine it isn't intended for users to add an incompatible mode to a model specification. So, adding something like the following function to validate proper spec modes might be desirable:
check_valid_spec_mode <- function(model, mode) {
spec_modes <- rlang::env_get(get_model_env(), paste0(model, "_modes"))
if (!(mode %in% spec_modes))
rlang::abort(glue::glue("`mode` should be one of: ",
glue::glue_collapse(glue::glue("'{spec_modes}'"),
sep = ", ")))
invisible(TRUE)
}
Incase the reader was curious, I found the source of the downstream error. The"formula_
" comes from a switch
statement in fit.model_spec
, which is attributed to no fit$interface
existing on object$method
. For example, add_method
is called, which then calls get_model_spec
, which is then unable to find a record in linear_reg_fit
of a mode equivalent to "classification" and an engine of "lm".
If the above check was in place, the user would have more feedback on how to use the model specification functions as they were intended. Similar to how tidymodels
will not allow you to do a regression model where the outcome is a factor.
mod1 %>%
fit(Species ~ ., data = iris)
#> Error: For a regression model, the outcome should be numeric.
#explicitly change Species to numeric for linear model of a factor.
mod1 %>% fit(as.numeric(Species) ~ ., data = iris)
#> parsnip model object
#>
#> Fit time: 2ms
#>
#> Call:
#> stats::lm(formula = as.numeric(Species) ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1.18650 -0.11191 -0.04008 0.22865 0.60925
If this is something the tidymodels team thinks should be added, I wouldn't mind starting a pull request.