Description
I noticed that plotor::table_or() doesn't output the levels of multi-level factor predictors in the correct order.
# data with separation for pred1
df_separated <- tibble::tibble(
  outcome = sample(0:1, size = 1000, replace = TRUE, prob = c(0.2, 0.8)) |>
    factor(levels = c(0, 1), labels = c('Fail', 'Success')),
  pred1 = dplyr::if_else(
    condition = outcome == 'Fail',
    true = sample(0:2, size = 1000, replace = TRUE),
    false = sample(1:3, size = 1000, replace = TRUE)
  ) |>
    factor(levels = c(0, 1, 2, 3), labels = c('red', 'green', 'brown', 'blue')) |>
    forcats::fct_infreq(),
  pred2 = rpois(n = 1000, lambda = 10)
)

# see the separation
table(df_separated$outcome, df_separated$pred1)

# model this
lr_separated <- stats::glm(
  data = df_separated,
  formula = outcome ~ pred1 + pred2,
  family = 'binomial'
)

# run a {plotor} function
plotor::table_or(lr_separated)
# output
# A tibble: 5 × 14
label level rows outcome outcome_rate class estimate std.error statistic p.value conf.low conf.high significance comparator
<fct> <chr> <int> <int> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 pred1 blue 275 275 1 factor 8.67e+ 7 648. 0.0282 0.977 1.28e+ 21 3.49e119 Significant NA
2 pred1 brown 298 235 0.789 factor 1.02e+ 0 0.193 0.114 0.909 7.01e- 1 1.49e 0 Not significant NA
3 pred1 green 348 273 0.784 factor NA NA NA NA NA NA Comparator 1
4 pred1 red 79 0 0 factor 8.63e-10 1209. -0.0173 0.986 3.87e-204 3.03e 9 Not significant NA
5 pred2 pred2 1000 783 0.783 integer 1.03e+ 0 0.0307 0.921 0.357 9.69e- 1 1.09e 0 Not significant NA
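The levels of pred1 are listed alphabetically (blue, brown, green, red) rather than by frequency (green, brown, blue, red), which is the order set by forcats::fct_infreq() in the data definition.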
It could be connected to 'level' being a character variable instead of a factor. This may need further exploration.
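To illustrate the suspicion, here is a minimal base R sketch. It does not call {plotor} at all: `res` is a hypothetical stand-in for the table_or() result (using the row counts from the output above), and `lvls` is assumed to be the frequency order produced by forcats::fct_infreq(). It only shows that ordering on a character column falls back to alphabetical order, while a factor keeps the intended level order. Whether this is what actually happens inside table_or() is not confirmed.

```r
# Assumed frequency order of pred1, as fct_infreq() would set it for the data above
lvls <- c("green", "brown", "blue", "red")

# Hypothetical stand-in for the table_or() output, with `level` as character
res <- data.frame(
  level = c("blue", "brown", "green", "red"),
  rows = c(275L, 298L, 348L, 79L),
  stringsAsFactors = FALSE
)

# Ordering on the character column gives alphabetical order
res_chr <- res[order(res$level), ]
print(res_chr$level)

# Converting to a factor with the intended levels restores the frequency order
res$level <- factor(res$level, levels = lvls)
res_fct <- res[order(res$level), ]
print(as.character(res_fct$level))
```

For a fitted model, the factor levels used in fitting should be available from the glm object itself (for example lr_separated$xlevels$pred1), so the function would not need to re-derive the order from the data.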