Description
I'm having an issue that is self-explanatory from the title. Reprex below. As you can see, the classification produced by fct_collapse is incorrect.
As requested, a simpler set of examples is below.
Example 1 - Collapsing Character Variables
The fct_collapse function wrongly classifies raspberry as a vegetable and broccoli as a fruit.
library(forcats)
library(dplyr)
library(tibble)
df <-
  tibble(item = c("apple", "grape", "banana", "broccoli", "raspberry"))
df %>%
  mutate(category = fct_collapse(item,
    fruit = c("apple", "grape", "banana", "raspberry"),
    vegetables = "broccoli", group_other = TRUE))
#> # A tibble: 5 x 2
#>   item      category
#>   <chr>     <fct>
#> 1 apple     fruit
#> 2 grape     fruit
#> 3 banana    fruit
#> 4 broccoli  fruit
#> 5 raspberry vegetables
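For reference, the grouping this call should produce can be computed without forcats. The sketch below uses base R's `levels<-` replacement with a named list, which merges the listed levels under each name. It is only a cross-check of the expected result, not a description of how fct_collapse works internally.

```r
# Cross-check in base R only (no forcats involved).
items <- c("apple", "grape", "banana", "broccoli", "raspberry")
expected <- factor(items)

# Assigning a named list to levels() merges the listed levels under each name.
levels(expected) <- list(
  fruit = c("apple", "grape", "banana", "raspberry"),
  vegetables = "broccoli"
)

data.frame(item = items, category = expected)
```

Here broccoli is the only item labelled vegetables, which is the result I would expect from fct_collapse as well.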
Example 2 - Collapsing Factor Variables
For clarity, and contrary to my previous explanation, this bug affects both character and factor variables (since fct_collapse converts the .f argument to a factor if it is a character variable):
df <-
  tibble(item = factor(c("apple", "grape", "banana", "broccoli", "raspberry")))
df %>%
  mutate(category = fct_collapse(item,
    fruit = c("apple", "grape", "banana", "raspberry"),
    vegetables = "broccoli", group_other = TRUE))
#> # A tibble: 5 x 2
#>   item      category
#>   <fct>     <fct>
#> 1 apple     fruit
#> 2 grape     fruit
#> 3 banana    fruit
#> 4 broccoli  fruit
#> 5 raspberry vegetables
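One possible explanation, offered as a guess rather than a confirmed diagnosis: the wrong output is exactly what you get if the group labels are matched to the factor levels by position instead of by name. The base R sketch below illustrates that hypothesis. The variable names are mine and nothing here is taken from the forcats source.

```r
items <- c("apple", "grape", "banana", "broccoli", "raspberry")
f <- factor(items)

# factor() sorts the levels, so they are no longer in the order of the data.
levels(f)

# Group labels in the order the levels were listed in the fct_collapse call:
# four fruit entries followed by one vegetables entry.
labels_in_call_order <- c("fruit", "fruit", "fruit", "fruit", "vegetables")

# Hypothetical faulty step (an assumption, not confirmed from the forcats source):
# pair the labels with the sorted levels by position instead of by name.
positional <- setNames(labels_in_call_order, levels(f))

data.frame(item = items, category = unname(positional[items]))
```

Under this assumption broccoli, the third level alphabetically, picks up the third label (fruit), and raspberry, the last level, picks up the last label (vegetables), which matches the incorrect output above.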
Example 3 - With the group_other argument set to FALSE
Just to demonstrate that this bug does not occur when group_other is set to FALSE:
df <-
  tibble(item = factor(c("apple", "grape", "banana", "broccoli", "raspberry")))
df %>%
  mutate(category = fct_collapse(item,
    fruit = c("apple", "grape", "banana", "raspberry"),
    vegetables = "broccoli", group_other = FALSE))
#> # A tibble: 5 x 2
#>   item      category
#>   <fct>     <fct>
#> 1 apple     fruit
#> 2 grape     fruit
#> 3 banana    fruit
#> 4 broccoli  vegetables
#> 5 raspberry fruit
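Until this is fixed, anyone who needs the "everything else becomes Other" behaviour could fall back on something like the base R sketch below. `collapse_with_other` is a hypothetical helper written for this issue, not a forcats function, and the "bread" item in the second call is made up purely to show the Other group.

```r
# Hypothetical stand-in for fct_collapse(..., group_other = TRUE).
# Values are matched to groups by name, never by position.
collapse_with_other <- function(x, groups, other = "Other") {
  x <- as.character(x)
  out <- rep(other, length(x))
  for (name in names(groups)) {
    out[x %in% groups[[name]]] <- name
  }
  out[is.na(x)] <- NA  # keep missing values missing
  # Keep only the labels that actually occur, in group order, with Other last.
  present <- c(names(groups), other)
  factor(out, levels = present[present %in% out])
}

groups <- list(
  fruit = c("apple", "grape", "banana", "raspberry"),
  vegetables = "broccoli"
)
items <- c("apple", "grape", "banana", "broccoli", "raspberry")

collapse_with_other(items, groups)
collapse_with_other(c(items, "bread"), groups)
```

Because the lookup goes through `%in%` on the original values, the result does not depend on the order of the factor levels.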
Session Info
sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252
#> [2] LC_CTYPE=English_United Kingdom.1252
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bindrcpp_0.2.2 tibble_2.0.1 dplyr_0.7.8
#> [4] forcats_0.4.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.0 knitr_1.21 bindr_0.1.1 magrittr_1.5
#> [5] tidyselect_0.2.5 R6_2.3.0 rlang_0.3.1 fansi_0.4.0
#> [9] stringr_1.4.0 highr_0.7 tools_3.5.2 xfun_0.4
#> [13] utf8_1.1.4 cli_1.0.1 htmltools_0.3.6 yaml_2.2.0
#> [17] digest_0.6.18 assertthat_0.2.0 crayon_1.3.4 purrr_0.3.0
#> [21] glue_1.3.0 evaluate_0.12 rmarkdown_1.11 stringi_1.2.4
#> [25] compiler_3.5.2 pillar_1.3.1 pkgconfig_2.0.2
Created on 2019-02-17 by the reprex package (v0.2.1)