`predict(type = "prob")` should error if outcome level is named `"class"`

`predict(type = "prob")` and `predict(type = "class")` both return a column named `.pred_class` if the outcome has a level named `"class"`.

``` r
library(parsnip)
library(tibble)

x <- tibble(
  class = factor(sample(c("class", "class_1"), 100, replace = TRUE)),
  a = rnorm(100),
  b = rnorm(100)
)

mod <- logistic_reg() %>%
  set_mode(mode = "classification") %>%
  fit(class ~ a + b, data = x)

predict(mod, type = "class", new_data = x)
#> # A tibble: 100 × 1
#>    .pred_class
#>    <fct>      
#>  1 class_1    
#>  2 class_1    
#>  3 class      
#>  4 class_1    
#>  5 class_1    
#>  6 class      
#>  7 class      
#>  8 class      
#>  9 class      
#> 10 class      
#> # … with 90 more rows

predict(mod, type = "prob", new_data = x)
#> # A tibble: 100 × 2
#>    .pred_class .pred_class_1
#>          <dbl>         <dbl>
#>  1       0.498         0.502
#>  2       0.475         0.525
#>  3       0.556         0.444
#>  4       0.457         0.543
#>  5       0.490         0.510
#>  6       0.520         0.480
#>  7       0.516         0.484
#>  8       0.525         0.475
#>  9       0.550         0.450
#> 10       0.562         0.438
#> # … with 90 more rows
```

<sup>Created on 2022-05-09 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

Some packages downstream from parsnip join these two tibbles together, resulting in issues like https://github.com/tidymodels/stacks/issues/125 and https://github.com/tidymodels/tune/issues/487.
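A minimal sketch of the clash itself, using hand-built stand-ins for the two prediction tibbles (values copied from the first rows of the reprex above). It does not reproduce how tune or stacks actually combine predictions; it only shows that any column-wise combination of the two outputs has to deal with a duplicated name.

```r
library(tibble)

# Stand-ins for the output of predict(type = "class") and predict(type = "prob")
class_preds <- tibble(.pred_class = factor(c("class_1", "class_1", "class")))
prob_preds <- tibble(
  .pred_class   = c(0.498, 0.475, 0.556),
  .pred_class_1 = c(0.502, 0.525, 0.444)
)

# Column names that appear in both outputs
clash <- intersect(names(class_preds), names(prob_preds))
clash
#> [1] ".pred_class"
```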

@DavisVaughan and I spent some time on this this morning and concluded that erroring in `predict(type = "prob")` when an outcome level is named `"class"` is likely the best route here. Erroring in parsnip, before the predictions are generated, means that downstream packages (tune, stacks, and possibly others) need not anticipate this edge case when joining predictions. It also lets us raise the same informative error every time this issue comes up.
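Roughly what such a check could look like, as a sketch only: `check_outcome_levels()` is a hypothetical helper name, not an existing parsnip function, and the error wording is a placeholder. It uses base `stop()` to stay self-contained; the real implementation might well signal the error differently.

```r
# Hypothetical helper: error early if an outcome level would produce a
# `.pred_class` probability column. Not part of parsnip's API.
check_outcome_levels <- function(lvls) {
  if ("class" %in% lvls) {
    stop(
      "The outcome has a level named \"class\", so `predict(type = \"prob\")` ",
      "would return a `.pred_class` column that clashes with the output of ",
      "`predict(type = \"class\")`. Please rename that level.",
      call. = FALSE
    )
  }
  invisible(lvls)
}

# Levels without "class" pass through silently
ok <- check_outcome_levels(c("yes", "no"))

# The levels from the reprex trigger the error; capture its message
msg <- tryCatch(
  check_outcome_levels(c("class", "class_1")),
  error = function(e) conditionMessage(e)
)
```

Note that `"class_1"` on its own is fine under this check; only an exact match on `"class"` yields the colliding `.pred_class` name.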

This solution doesn't feel very satisfying. Some alternatives:

* changing the column name at `predict(type = "prob")` in this case, e.g. generating `.pred_class___`
* handling these edge cases later on, a la https://github.com/tidymodels/stacks/pull/126

These didn't sound very satisfying either. 🤷
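For comparison, the first alternative (renaming the column) might look something like the sketch below. `prob_col_names()` is a hypothetical helper, and the `.pred_class___` name is just the example from the list above.

```r
# Hypothetical helper: build probability column names from outcome levels,
# renaming the one that would collide with the hard class prediction column.
prob_col_names <- function(lvls) {
  nms <- paste0(".pred_", lvls)
  nms[nms == ".pred_class"] <- ".pred_class___"
  nms
}

new_names <- prob_col_names(c("class", "class_1"))
new_names
#> [1] ".pred_class___" ".pred_class_1"
```

The downside is visible even in the sketch: the column name no longer follows the `.pred_{level}` pattern, so anything that maps columns back to levels needs to know about the special case.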

