predict(type = "prob") should error if outcome level is named "class" #720

Closed
@simonpcouch

Description


predict(type = "prob") and predict(type = "class") result in the same column names if the outcome has a level named "class".

library(parsnip)
library(tibble)

x <- tibble(
  class = factor(sample(c("class", "class_1"), 100, replace = TRUE)),
  a = rnorm(100),
  b = rnorm(100)
)

mod <- logistic_reg() %>%
  set_mode(mode = "classification") %>%
  fit(class ~ a + b, data = x)

predict(mod, type = "class", new_data = x)
#> # A tibble: 100 × 1
#>    .pred_class
#>    <fct>      
#>  1 class_1    
#>  2 class_1    
#>  3 class      
#>  4 class_1    
#>  5 class_1    
#>  6 class      
#>  7 class      
#>  8 class      
#>  9 class      
#> 10 class      
#> # … with 90 more rows

predict(mod, type = "prob", new_data = x)
#> # A tibble: 100 × 2
#>    .pred_class .pred_class_1
#>          <dbl>         <dbl>
#>  1       0.498         0.502
#>  2       0.475         0.525
#>  3       0.556         0.444
#>  4       0.457         0.543
#>  5       0.490         0.510
#>  6       0.520         0.480
#>  7       0.516         0.484
#>  8       0.525         0.475
#>  9       0.550         0.450
#> 10       0.562         0.438
#> # … with 90 more rows

Created on 2022-05-09 by the reprex package (v2.0.1)

Some packages downstream from parsnip join these two tibbles together, resulting in issues like tidymodels/stacks#125 and tidymodels/tune#487.
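To make the collision concrete, here is a minimal base R sketch. The two data frames are hypothetical stand-ins for the `predict()` results in the reprex above, and `cbind()` is only an illustration of a naive column-wise combination; it is not necessarily how tune or stacks actually join predictions.

```r
# Hypothetical stand-ins for the two predict() results shown above.
class_preds <- data.frame(.pred_class = factor(c("class_1", "class")))
prob_preds <- data.frame(
  .pred_class   = c(0.498, 0.556),
  .pred_class_1 = c(0.502, 0.444)
)

# Column names that appear in both results.
shared <- intersect(names(class_preds), names(prob_preds))
shared

# cbind() on data frames keeps both columns under the same name,
# so the combined result has an ambiguous `.pred_class` column.
combined <- cbind(class_preds, prob_preds)
names(combined)
anyDuplicated(names(combined)) > 0
```

Any code that later selects `.pred_class` by name from such a combined result can no longer tell the hard class prediction from the class probability.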

@DavisVaughan and I spent some time on this this morning and concluded that erroring in predict(type = "prob") when an outcome level is named "class" is likely the best route here. Erroring in parsnip, before the predictions are generated, means that downstream packages (tune, stacks, and possibly others) need not anticipate this edge case when joining predictions. It also gives us a chance to raise the same informative error any time this issue comes up.
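A rough sketch of what such a check could look like, in base R. The helper name `check_outcome_levels()` and the error wording are hypothetical, and this is not parsnip's actual implementation; it only illustrates failing early, before any predictions are generated.

```r
# Hypothetical helper (not part of parsnip): error if any outcome level
# would produce a `.pred_class` probability column.
check_outcome_levels <- function(lvls) {
  if ("class" %in% lvls) {
    stop(
      "The outcome has a level named \"class\", so `predict(type = \"prob\")` ",
      "would return a `.pred_class` column that clashes with the output of ",
      "`predict(type = \"class\")`. Please rename that level.",
      call. = FALSE
    )
  }
  invisible(lvls)
}

# Levels without "class" pass silently.
check_outcome_levels(c("yes", "no"))

# The levels from the reprex trigger the error; capture its message.
msg <- tryCatch(
  check_outcome_levels(c("class", "class_1")),
  error = function(e) conditionMessage(e)
)
```

Because the check only needs the outcome's levels, it could run before any engine-specific prediction code, which is the point of erroring in parsnip rather than downstream.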

This solution doesn't feel very satisfying. Some alternatives:

These didn't sound very satisfying either.🤷
