|
1 | 1 |
|
2 | 2 | <!-- README.md is generated from README.Rmd. Please edit that file -->
|
3 |
| - |
4 |
| -# forcats <img src='man/figures/logo.png' align="right" height="139" /> |
| 3 | +forcats <img src='man/figures/logo.png' align="right" height="139" /> |
| 4 | +===================================================================== |
5 | 5 |
|
6 | 6 | <!-- badges: start -->
|
| 7 | +[](https://cran.r-project.org/package=forcats) [](https://travis-ci.org/tidyverse/forcats) [](https://codecov.io/gh/tidyverse/forcats?branch=master) <!-- badges: end --> |
7 | 8 |
|
8 |
| -[](https://cran.r-project.org/package=forcats) |
10 |
| -[](https://travis-ci.org/tidyverse/forcats) |
12 |
| -[](https://codecov.io/gh/tidyverse/forcats?branch=master) |
14 |
| -<!-- badges: end --> |
15 |
| - |
16 |
| -## Overview |
17 |
| - |
18 |
| -R uses **factors** to handle categorical variables, variables that have |
19 |
| -a fixed and known set of possible values. Historically, factors were |
20 |
| -much easier to work with than character vectors, so many base R |
21 |
| -functions automatically convert character vectors to factors. (For |
22 |
| -historical context, I recommend [*stringsAsFactors: An unauthorized |
23 |
| -biography*](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) |
24 |
| -by Roger Peng, and [*stringsAsFactors = |
25 |
| -\<sigh\>*](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) |
26 |
| -by Thomas Lumley. If you want to learn more about other approaches to |
27 |
| -working with factors and categorical data, I recommend [*Wrangling |
28 |
| -categorical data in R*](https://peerj.com/preprints/3163/), by Amelia |
29 |
| -McNamara and Nicholas Horton.) These days, making factors automatically |
30 |
| -is no longer so helpful, so packages in the |
31 |
| -[tidyverse](http://tidyverse.org) never create them automatically. |
32 |
| - |
33 |
| -However, factors are still useful when you have true categorical data, |
34 |
| -and when you want to override the ordering of character vectors to |
35 |
| -improve display. The goal of the **forcats** package is to provide a |
36 |
| -suite of useful tools that solve common problems with factors. If you’re |
37 |
| -not familiar with strings, the best place to start is the [chapter on |
38 |
| -factors](http://r4ds.had.co.nz/factors.html) in R for Data Science. |
39 |
| - |
40 |
| -## Installation |
| 9 | +Overview |
| 10 | +-------- |
41 | 11 |
|
42 |
| -``` r |
43 |
| -# The easiest way to get forcats is to install the whole tidyverse: |
44 |
| -install.packages("tidyverse") |
| 12 | +R uses **factors** to handle categorical variables, variables that have a fixed and known set of possible values. Factors are also helpful for reordering character vectors to improve display. The goal of the **forcats** package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values. Some examples include: |
45 | 13 |
|
46 |
| -# Alternatively, install just forcats: |
47 |
| -install.packages("forcats") |
| 14 | +- `fct_reorder()`: Reordering a factor by another variable. |
| 15 | +- `fct_infreq()`: Reordering a factor by the frequency of values. |
| 16 | +- `fct_relevel()`: Changing the order of a factor by hand. |
| 17 | +- `fct_lump()`: Collapsing the least/most frequent values of a factor into "other". |
48 | 18 |
|
49 |
| -# Or the the development version from GitHub: |
50 |
| -# install.packages("devtools") |
51 |
| -devtools::install_github("tidyverse/forcats") |
52 |
| -``` |
| 19 | +You can learn more about each of these in `vignette("forcats")`. If you're new to factors, the best place to start is the [chapter on factors](http://r4ds.had.co.nz/factors.html) in R for Data Science. |
| 20 | + |
| 21 | +Installation |
| 22 | +------------ |
| 23 | + |
| 24 | + # The easiest way to get forcats is to install the whole tidyverse: |
| 25 | + install.packages("tidyverse") |
53 | 26 |
|
54 |
| -## Getting started |
| 27 | + # Alternatively, install just forcats: |
| 28 | + install.packages("forcats") |
55 | 29 |
|
56 |
| -forcats is part of the core tidyverse, so you can load it with |
57 |
| -`library(tidyverse)` or `library(forcats)`. |
| 30 | + # Or the development version from GitHub: |
| 31 | + # install.packages("devtools") |
| 32 | + devtools::install_github("tidyverse/forcats") |
| 33 | + |
| 34 | +Getting started |
| 35 | +--------------- |
| 36 | + |
| 37 | +forcats is part of the core tidyverse, so you can load it with `library(tidyverse)` or `library(forcats)`. |
58 | 38 |
|
59 | 39 | ``` r
|
60 | 40 | library(forcats)
|
| 41 | +library(dplyr) |
| 42 | +library(ggplot2) |
| 43 | +``` |
| 44 | + |
| 45 | +``` r |
| 46 | +starwars %>% |
| 47 | + filter(!is.na(species)) %>% |
| 48 | + count(species, sort = TRUE) |
| 49 | +#> # A tibble: 37 x 2 |
| 50 | +#> species n |
| 51 | +#> <chr> <int> |
| 52 | +#> 1 Human 35 |
| 53 | +#> 2 Droid 5 |
| 54 | +#> 3 Gungan 3 |
| 55 | +#> 4 Kaminoan 2 |
| 56 | +#> 5 Mirialan 2 |
| 57 | +#> 6 Twi'lek 2 |
| 58 | +#> 7 Wookiee 2 |
| 59 | +#> 8 Zabrak 2 |
| 60 | +#> 9 Aleena 1 |
| 61 | +#> 10 Besalisk 1 |
| 62 | +#> # … with 27 more rows |
61 | 63 | ```
|
62 | 64 |
|
63 |
| -Factors are used to describe categorical variables with a fixed and |
64 |
| -known set of **levels**. You can create factors with the base `factor()` |
65 |
| -or |
66 |
| -[`readr::parse_factor()`](http://readr.tidyverse.org/reference/parse_factor.html): |
| 65 | +``` r |
| 66 | +starwars %>% |
| 67 | + filter(!is.na(species)) %>% |
| 68 | + mutate(species = fct_lump(species, n = 3)) %>% |
| 69 | + count(species) |
| 70 | +#> # A tibble: 4 x 2 |
| 71 | +#> species n |
| 72 | +#> <fct> <int> |
| 73 | +#> 1 Droid 5 |
| 74 | +#> 2 Gungan 3 |
| 75 | +#> 3 Human 35 |
| 76 | +#> 4 Other 39 |
| 77 | +``` |
67 | 78 |
|
68 | 79 | ``` r
|
69 |
| -x1 <- c("Dec", "Apr", "Jan", "Mar") |
70 |
| -month_levels <- c( |
71 |
| - "Jan", "Feb", "Mar", "Apr", "May", "Jun", |
72 |
| - "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" |
73 |
| -) |
74 |
| - |
75 |
| -factor(x1, month_levels) |
76 |
| -#> [1] Dec Apr Jan Mar |
77 |
| -#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec |
78 |
| - |
79 |
| -readr::parse_factor(x1, month_levels) |
80 |
| -#> [1] Dec Apr Jan Mar |
81 |
| -#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec |
| 80 | +ggplot(starwars, aes(x = eye_color)) + |
| 81 | + geom_bar() + |
| 82 | + coord_flip() |
82 | 83 | ```
|
83 | 84 |
|
84 |
| -The advantage of `parse_factor()` is that it will generate a warning if |
85 |
| -values of `x` are not valid levels: |
| 85 | + |
86 | 86 |
|
87 | 87 | ``` r
|
88 |
| -x2 <- c("Dec", "Apr", "Jam", "Mar") |
89 |
| - |
90 |
| -factor(x2, month_levels) |
91 |
| -#> [1] Dec Apr <NA> Mar |
92 |
| -#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec |
93 |
| - |
94 |
| -readr::parse_factor(x2, month_levels) |
95 |
| -#> Warning: 1 parsing failure. |
96 |
| -#> row # A tibble: 1 x 4 col row col expected actual expected <int> <int> <chr> <chr> actual 1 3 NA value in level set Jam |
97 |
| -#> [1] Dec Apr <NA> Mar |
98 |
| -#> attr(,"problems") |
99 |
| -#> # A tibble: 1 x 4 |
100 |
| -#> row col expected actual |
101 |
| -#> <int> <int> <chr> <chr> |
102 |
| -#> 1 3 NA value in level set Jam |
103 |
| -#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec |
| 88 | +starwars %>% |
| 89 | + mutate(eye_color = fct_infreq(eye_color)) %>% |
| 90 | + ggplot(aes(x = eye_color)) + |
| 91 | + geom_bar() + |
| 92 | + coord_flip() |
104 | 93 | ```
|
105 | 94 |
|
106 |
| -Once you have the factor, forcats provides helpers for solving common |
107 |
| -problems. |
| 95 | + |
| 96 | + |
| 97 | +More resources |
| 98 | +-------------- |
| 99 | + |
| 100 | +For a history of factors, I recommend [*stringsAsFactors: An unauthorized biography*](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng and [*stringsAsFactors = <sigh>*](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend [*Wrangling categorical data in R*](https://peerj.com/preprints/3163/), by Amelia McNamara and Nicholas Horton. |
| 101 | + |
| 102 | +Getting help |
| 103 | +------------ |
| 104 | + |
| 105 | +If you encounter a clear bug, please file a minimal reproducible example on [github](https://github.com/tidyverse/forcats/issues). For questions and other discussion, please use [community.rstudio.com](https://community.rstudio.com/). |
| 106 | + |
| 107 | +Code of Conduct |
| 108 | +--------------- |
| 109 | + |
| 110 | +Please note that the 'forcats' project is released with a [Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms. |
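
The new Overview text in this diff names `fct_relevel()`, `fct_infreq()` and `fct_reorder()`, but the README examples only exercise `fct_lump()` and `fct_infreq()`. As a rough companion, here is a minimal sketch of the other helpers on a small vector. The `size` and `price` vectors are hypothetical toy data made up for illustration, not part of the README.

``` r
library(forcats)

# Hypothetical toy data (an assumption for illustration only).
size  <- factor(c("small", "large", "medium", "small", "small", "large"))
price <- c(1, 9, 5, 2, 3, 10)

# By default, factor() sorts levels alphabetically: large, medium, small.
levels(size)

# fct_relevel(): set the level order by hand.
# Levels become: small, medium, large.
size_manual <- fct_relevel(size, "small", "medium", "large")
levels(size_manual)

# fct_infreq(): order levels by frequency, most common first.
# small occurs 3 times, large 2, medium 1.
size_freq <- fct_infreq(size)
levels(size_freq)

# fct_reorder(): order levels by a summary of another variable.
# The default summary is the median, in ascending order:
# small (2), medium (5), large (9.5).
size_by_price <- fct_reorder(size, price)
levels(size_by_price)
```

None of these calls change the values stored in the factor. They only change the order of its levels, which is what plotting and tabulation functions use for display order.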