fct_collapse applying collapsed factors in wrong order when group_other = TRUE #172

Closed
@gtm19

I'm having an issue which is self-explanatory from the title. A reprex is below. As you can see, the classification produced by fct_collapse is incorrect.

As requested, a simpler set of examples is below.

Example 1 - Collapsing Character Variables

The fct_collapse function wrongly classifies raspberry as a vegetable and broccoli as a fruit, when it should be the other way round.

library(forcats)
library(dplyr)
library(tibble)

df <-
  tibble(item = c("apple", "grape", "banana", "broccoli", "raspberry"))

df %>% 
  mutate(category = fct_collapse(item,
                                 fruit = c("apple", "grape", "banana", "raspberry"),
                                 vegetables = "broccoli", group_other = TRUE))
#> # A tibble: 5 x 2
#>   item      category  
#>   <chr>     <fct>     
#> 1 apple     fruit     
#> 2 grape     fruit     
#> 3 banana    fruit     
#> 4 broccoli  fruit     
#> 5 raspberry vegetables

Example 2 - Collapsing Factor Variables

For clarity, and contrary to my previous explanation, this bug affects both character and factor variables (since fct_collapse converts the .f argument to a factor if it is a character variable):

df <-
  tibble(item = factor(c("apple", "grape", "banana", "broccoli", "raspberry")))

df %>% 
  mutate(category = fct_collapse(item,
                                 fruit = c("apple", "grape", "banana", "raspberry"),
                                 vegetables = "broccoli", group_other = TRUE))
#> # A tibble: 5 x 2
#>   item      category  
#>   <fct>     <fct>     
#> 1 apple     fruit     
#> 2 grape     fruit     
#> 3 banana    fruit     
#> 4 broccoli  fruit     
#> 5 raspberry vegetables
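
For context on the character-to-factor conversion mentioned above: base R's factor() sorts the levels of a character vector, so the level order differs from the row order in this data. Whether that ordering is what fct_collapse trips over when group_other = TRUE is an assumption on my part, not something this report confirms. A minimal sketch in base R only:

```r
# Same items as in the examples above
item <- c("apple", "grape", "banana", "broccoli", "raspberry")

# factor() builds its levels from the sorted unique values
f <- factor(item)
levels(f)

# The integer codes index into the sorted levels, not the row order
as.integer(f)
```

Note that broccoli and raspberry sit at different positions in the level vector than in the rows, which is the pair that gets swapped in the output above.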

Example 3 - with group_other argument set to FALSE

Just to demonstrate that this bug does not occur when group_other is set to FALSE:

df <-
  tibble(item = factor(c("apple", "grape", "banana", "broccoli", "raspberry")))

df %>% 
  mutate(category = fct_collapse(item,
                                 fruit = c("apple", "grape", "banana", "raspberry"),
                                 vegetables = "broccoli", group_other = FALSE))
#> # A tibble: 5 x 2
#>   item      category  
#>   <fct>     <fct>     
#> 1 apple     fruit     
#> 2 grape     fruit     
#> 3 banana    fruit     
#> 4 broccoli  vegetables
#> 5 raspberry fruit
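
Since the collapse itself is correct without group_other, one possible workaround is to do the collapsing first and then lump everything left over with fct_other(). This is only a sketch, not a fix for the underlying bug. The extra "bread" item is hypothetical and is added here just so that something actually falls into the Other group.

```r
library(forcats)

# "bread" is a hypothetical extra item, not part of the original examples
item <- factor(c("apple", "grape", "banana", "broccoli", "raspberry", "bread"))

# Step 1: collapse without group_other, which behaves correctly (Example 3)
collapsed <- fct_collapse(item,
                          fruit = c("apple", "grape", "banana", "raspberry"),
                          vegetables = "broccoli")

# Step 2: lump every level that is not one of the new groups into "Other"
category <- fct_other(collapsed, keep = c("fruit", "vegetables"))

data.frame(item, category)
```

The same two steps can be used inside mutate() in place of the single fct_collapse call with group_other = TRUE.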

Session Info

sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252 
#> [2] LC_CTYPE=English_United Kingdom.1252   
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] bindrcpp_0.2.2     tibble_2.0.1       dplyr_0.7.8       
#> [4] forcats_0.4.0.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.0       knitr_1.21       bindr_0.1.1      magrittr_1.5    
#>  [5] tidyselect_0.2.5 R6_2.3.0         rlang_0.3.1      fansi_0.4.0     
#>  [9] stringr_1.4.0    highr_0.7        tools_3.5.2      xfun_0.4        
#> [13] utf8_1.1.4       cli_1.0.1        htmltools_0.3.6  yaml_2.2.0      
#> [17] digest_0.6.18    assertthat_0.2.0 crayon_1.3.4     purrr_0.3.0     
#> [21] glue_1.3.0       evaluate_0.12    rmarkdown_1.11   stringi_1.2.4   
#> [25] compiler_3.5.2   pillar_1.3.1     pkgconfig_2.0.2

Created on 2019-02-17 by the reprex package (v0.2.1)

Metadata

Labels: bug (an unexpected problem or unintended behavior)
