table_or() doesn't seem to sort categorical predictors as expected #46

Open
@craig-parylo

Description

I noticed that plotor::table_or() doesn't output the levels of multi-level factor predictors in the expected order.

# data with separation for pred1
df_separated <- tibble::tibble(
  outcome = sample(0:1, size = 1000, replace = TRUE, prob = c(0.2,0.8)) |>
    factor(levels = c(0,1), labels = c('Fail', 'Success')),
  pred1 = dplyr::if_else(
    condition = outcome == 'Fail',
    true = sample(0:2, size = 1000, replace = TRUE),
    false = sample(1:3, size = 1000, replace = TRUE)
  ) |>
    factor(levels = c(0, 1, 2, 3), labels = c('red', 'green', 'brown', 'blue')) |>
    forcats::fct_infreq(),
  pred2 = rpois(n = 1000, lambda = 10)
)

# see the separation
table(df_separated$outcome, df_separated$pred1)

# model this
lr_separated <- stats::glm(
  data = df_separated,
  formula = outcome ~ pred1 + pred2,
  family = 'binomial'
)

# run a {plotor} function
plotor::table_or(lr_separated)

# output
# A tibble: 5 × 14
  label level  rows outcome outcome_rate class    estimate std.error statistic p.value   conf.low conf.high significance    comparator
  <fct> <chr> <int>   <int>        <dbl> <chr>       <dbl>     <dbl>     <dbl>   <dbl>      <dbl>     <dbl> <chr>                <dbl>
1 pred1 blue    275     275        1     factor   8.67e+ 7  648.        0.0282   0.977  1.28e+ 21  3.49e119 Significant             NA
2 pred1 brown   298     235        0.789 factor   1.02e+ 0    0.193     0.114    0.909  7.01e-  1  1.49e  0 Not significant         NA
3 pred1 green   348     273        0.784 factor  NA          NA        NA       NA     NA         NA        Comparator               1
4 pred1 red      79       0        0     factor   8.63e-10 1209.       -0.0173   0.986  3.87e-204  3.03e  9 Not significant         NA
5 pred2 pred2  1000     783        0.783 integer  1.03e+ 0    0.0307    0.921    0.357  9.69e-  1  1.09e  0 Not significant         NA

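pred1 doesn't seem to be ordered by frequency, as specified in the data definition with forcats::fct_infreq(). Going by the rows column, the expected order is green (348), brown (298), blue (275), red (79), but the output lists the levels alphabetically: blue, brown, green, red. The model itself appears to respect the factor order, since green (the most frequent level) is the comparator.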
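To make the expected order concrete, here is a minimal base-R sketch. It rebuilds a toy vector from the rows column of the output above (not the original random data) and uses a base-R stand-in for forcats::fct_infreq(); the names `colours` and `pred1` are just for illustration.

```r
# Toy vector rebuilt from the `rows` counts in the output above
# (an assumption for illustration, not the original sampled data)
colours <- rep(c("red", "green", "brown", "blue"), times = c(79, 348, 298, 275))
pred1 <- factor(colours, levels = c("red", "green", "brown", "blue"))

# Base-R stand-in for forcats::fct_infreq(): reorder levels by descending count
pred1 <- factor(pred1, levels = names(sort(table(pred1), decreasing = TRUE)))

levels(pred1)        # level order carried by the factor (frequency order)
sort(levels(pred1))  # alphabetical order, which is what table_or() returned
```

If the two results differ, as they do here, an alphabetical listing cannot be coming from the factor's own level order.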

It could be connected to 'level' being a character variable instead of a factor. This may need further exploration.
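A rough sketch of that hypothesis, using a hand-built data frame as a hypothetical stand-in for the table_or() result (this is not plotor's actual internals, and `tbl`, `lvls`, `by_chr` and `by_fct` are made-up names): ordering on a character column is alphabetical, whereas ordering on a factor follows its levels.

```r
# Hypothetical stand-in for the table_or() result; not plotor's internals
lvls <- c("green", "brown", "blue", "red")  # frequency order from the model data
tbl <- data.frame(
  level = c("blue", "brown", "green", "red"),
  rows  = c(275L, 298L, 348L, 79L)
)

# Ordering on the character column sorts alphabetically
by_chr <- tbl[order(tbl$level), ]

# Converting to a factor with the original levels restores the intended order
tbl$level_fct <- factor(tbl$level, levels = lvls)
by_fct <- tbl[order(tbl$level_fct), ]

by_chr$level
by_fct$level
```

For the model in this issue, the fitted glm object should carry the factor's level order in lr_separated$xlevels$pred1, which might be one place to recover it from; whether that suits table_or() is untested here.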

Labels: bug (Something isn't working)
