Skip to content

Additional checks in set_mode prevent improper use of model specs #467

Closed
@jtlandis

Description

@jtlandis

Feature

I want to preface this by saying I'm unsure if this is technically a bug or a feature request.

I think parsnip could benefit from adding an additional check in set_mode to ensure that the user does not add an improper mode to a model Specification. For example, when initializing a linear_reg specification, it will check if mode is either c("unknown", "regression"):

library(tidymodels)

# correct specification
mod1 <- linear_reg() %>%
  set_engine('lm') 

mod1
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm

#returns an error - lienar_reg cannot be used for classification
mod_error <- linear_reg('classification') %>%
  set_engine('lm')
#> Error: `mode` should be one of: 'unknown', 'regression'

However, the user can still do set_mode("classification") without any warnings or errors.

mod2 <- mod1 %>%
  set_mode('classification') #should this be allowed?

mod2
#> Linear Regression Model Specification (classification)
#> 
#> Computational engine: lm

As a result, the user gets a really confusing downstream error when trying to use that "classification" model specification:

mod2 %>%
  fit(Species ~ ., data = iris)
#> Error: formula_ is unknown.

To someone who has just discovered tidymodels and has followed many online tutorials in which setting your mode with set_mode() seems to be a common API practice, they would be under the impression that mod2 is a perfectly acceptable model spec.

I would imagine it isn't intended for users to add an incompatible mode to a model specification. So, adding something like the following function to validate proper spec modes might be desirable:

check_valid_spec_mode <- function(model, mode) {
    spec_modes <- rlang::env_get(get_model_env(), paste0(model, "_modes"))
        if (!(mode %in% spec_modes)) 
            rlang::abort(glue::glue("`mode` should be one of: ", 
                glue::glue_collapse(glue::glue("'{spec_modes}'"), 
                    sep = ", ")))
    invisible(TRUE)
}

Incase the reader was curious, I found the source of the downstream error. The"formula_" comes from a switch statement in fit.model_spec, which is attributed to no fit$interface existing on object$method. For example, add_method is called, which then calls get_model_spec, which is then unable to find a record in linear_reg_fit of a mode equivalent to "classification" and an engine of "lm".

If the above check was in place, the user would have more feedback on how to use the model specification functions as they were intended. Similar to how tidymodels will not allow you to do a regression model where the outcome is a factor.

mod1 %>%
  fit(Species ~ ., data = iris)
#> Error: For a regression model, the outcome should be numeric.

#explicitly change Species to numeric for linear model of a factor.

mod1 %>% fit(as.numeric(Species) ~ ., data = iris)
#> parsnip model object
#> 
#> Fit time:  2ms 
#> 
#> Call:
#> stats::lm(formula = as.numeric(Species) ~ ., data = data)
#> 
#> Coefficients:
#> (Intercept)  Sepal.Length   Sepal.Width  Petal.Length   Petal.Width  
#>      1.18650      -0.11191      -0.04008       0.22865       0.60925  

If this is something the tidymodels team thinks should be added, I wouldn't mind starting a pull request.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugan unexpected problem or unintended behavior

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions