Additional checks in `set_mode` prevent improper use of model specs


## Feature

I want to preface this by saying I'm unsure if this is technically a bug or a feature request. 

I think `parsnip` could benefit from adding an additional check in `set_mode` to ensure that the user does not add an improper mode to a model Specification. For example, when initializing a `linear_reg` specification, it will check if `mode` is either `c("unknown", "regression")`:

``` r
library(tidymodels)

# correct specification
mod1 <- linear_reg() %>%
  set_engine('lm') 

mod1
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm

#returns an error - lienar_reg cannot be used for classification
mod_error <- linear_reg('classification') %>%
  set_engine('lm')
#> Error: `mode` should be one of: 'unknown', 'regression'
```

However, the user can still do `set_mode("classification")` without any warnings or errors.

```r
mod2 <- mod1 %>%
  set_mode('classification') #should this be allowed?

mod2
#> Linear Regression Model Specification (classification)
#> 
#> Computational engine: lm

```

As a result, the user gets a really confusing downstream error when trying to use that "classification" model specification:

```r

mod2 %>%
  fit(Species ~ ., data = iris)
#> Error: formula_ is unknown.

```
To someone who has just discovered `tidymodels` and has followed many online tutorials in which setting your mode with `set_mode()` seems to be a common API practice, they would be under the impression that `mod2` is a perfectly acceptable model spec. 

I would imagine it isn't intended for users to add an incompatible mode to a model specification. So, adding something like the following function to validate proper spec modes might be desirable:

```r
check_valid_spec_mode <- function(model, mode) {
    spec_modes <- rlang::env_get(get_model_env(), paste0(model, "_modes"))
        if (!(mode %in% spec_modes)) 
            rlang::abort(glue::glue("`mode` should be one of: ", 
                glue::glue_collapse(glue::glue("'{spec_modes}'"), 
                    sep = ", ")))
    invisible(TRUE)
}
```
Incase the reader was curious, I found the source of the downstream error. The"`formula_`" comes from a `switch` statement in `fit.model_spec`, which is attributed to no `fit$interface` existing on `object$method`. For example, `add_method` is called, which then calls `get_model_spec`, which is then unable to find a record in `linear_reg_fit` of a mode equivalent to "classification" and an engine of "lm".

If the above check was in place, the user would have more feedback on how to use the model specification functions as they were intended. Similar to how `tidymodels` will not allow you to do a regression model where the outcome is a factor. 

```r
mod1 %>%
  fit(Species ~ ., data = iris)
#> Error: For a regression model, the outcome should be numeric.

#explicitly change Species to numeric for linear model of a factor.

mod1 %>% fit(as.numeric(Species) ~ ., data = iris)
#> parsnip model object
#> 
#> Fit time:  2ms 
#> 
#> Call:
#> stats::lm(formula = as.numeric(Species) ~ ., data = data)
#> 
#> Coefficients:
#> (Intercept)  Sepal.Length   Sepal.Width  Petal.Length   Petal.Width  
#>      1.18650      -0.11191      -0.04008       0.22865       0.60925  
```

If this is something the tidymodels team thinks should be added, I wouldn't mind starting a pull request.





Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Additional checks in `set_mode` prevent improper use of model specs #467

Feature

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Additional checks in set_mode prevent improper use of model specs #467

Description

Feature

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Additional checks in `set_mode` prevent improper use of model specs #467