Commit 74bbf1d

robinsoneshadley authored and committed
Improve readme and add COC (#164)
Fixes #148
1 parent a825a5c commit 74bbf1d

5 files changed, +151 -103 lines changed

.github/CODE_OF_CONDUCT.md

+25
@@ -0,0 +1,25 @@
+# Contributor Code of Conduct
+
+As contributors and maintainers of this project, we pledge to respect all people who
+contribute through reporting issues, posting feature requests, updating documentation,
+submitting pull requests or patches, and other activities.
+
+We are committed to making participation in this project a harassment-free experience for
+everyone, regardless of level of experience, gender, gender identity and expression,
+sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
+
+Examples of unacceptable behavior by participants include the use of sexual language or
+imagery, derogatory comments or personal attacks, trolling, public or private harassment,
+insults, or other unprofessional conduct.
+
+Project maintainers have the right and responsibility to remove, edit, or reject comments,
+commits, code, wiki edits, issues, and other contributions that are not aligned to this
+Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
+from the project team.
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
+opening an issue or contacting one or more of the project maintainers.
+
+This Code of Conduct is adapted from the Contributor Covenant
+(http://contributor-covenant.org), version 1.0.0, available at
+http://contributor-covenant.org/version/1/0/0/

README.Rmd

+39 -19
@@ -8,7 +8,7 @@ output: github_document
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>",
-  fig.path = "README-"
+  fig.path = "man/figures/README-"
 )
 ```
 
@@ -22,13 +22,18 @@ knitr::opts_chunk$set(
 
 ## Overview
 
-R uses __factors__ to handle categorical variables, variables that have a fixed and known set of possible values. Historically, factors were much easier to work with than character vectors, so many base R functions automatically convert character vectors to factors. (For historical context, I recommend [_stringsAsFactors: An unauthorized biography_](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng, and [_stringsAsFactors = \<sigh\>_](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend [_Wrangling categorical data in R_](https://peerj.com/preprints/3163/), by Amelia McNamara and Nicholas Horton.) These days, making factors automatically is no longer so helpful, so packages in the [tidyverse](http://tidyverse.org) never create them automatically.
+R uses __factors__ to handle categorical variables, variables that have a fixed and known set of possible values. Factors are also helpful for reordering character vectors to improve display. The goal of the __forcats__ package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values. Some examples include:
 
-However, factors are still useful when you have true categorical data, and when you want to override the ordering of character vectors to improve display. The goal of the __forcats__ package is to provide a suite of useful tools that solve common problems with factors. If you're not familiar with strings, the best place to start is the [chapter on factors](http://r4ds.had.co.nz/factors.html) in R for Data Science.
+* `fct_reorder()`: Reordering a factor by another variable.
+* `fct_infreq()`: Reordering a factor by the frequency of values.
+* `fct_relevel()`: Changing the order of a factor by hand.
+* `fct_lump()`: Collapsing the least/most frequent values of a factor into "other".
+
+You can learn more about each of these in `vignette("forcats")`. If you're new to factors, the best place to start is the [chapter on factors](http://r4ds.had.co.nz/factors.html) in R for Data Science.
 
 ## Installation
 
-```R
+```
 # The easiest way to get forcats is to install the whole tidyverse:
 install.packages("tidyverse")
 
@@ -46,30 +51,45 @@ forcats is part of the core tidyverse, so you can load it with `library(tidyvers
 
 ```{r setup, message = FALSE}
 library(forcats)
+library(dplyr)
+library(ggplot2)
 ```
 
-Factors are used to describe categorical variables with a fixed and known set of __levels__. You can create factors with the base `factor()` or [`readr::parse_factor()`](http://readr.tidyverse.org/reference/parse_factor.html):
+```{r}
+starwars %>%
+  filter(!is.na(species)) %>%
+  count(species, sort = TRUE)
+```
 
 ```{r}
-x1 <- c("Dec", "Apr", "Jan", "Mar")
-month_levels <- c(
-  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
-  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
-)
+starwars %>%
+  filter(!is.na(species)) %>%
+  mutate(species = fct_lump(species, n = 3)) %>%
+  count(species)
+```
 
-factor(x1, month_levels)
+```{r unordered-plot}
+ggplot(starwars, aes(x = eye_color)) +
+  geom_bar() +
+  coord_flip()
+```
 
-readr::parse_factor(x1, month_levels)
+```{r ordered-plot}
+starwars %>%
+  mutate(eye_color = fct_infreq(eye_color)) %>%
+  ggplot(aes(x = eye_color)) +
+  geom_bar() +
+  coord_flip()
 ```
 
-The advantage of `parse_factor()` is that it will generate a warning if values of `x` are not valid levels:
+## More resources
 
-```{r}
-x2 <- c("Dec", "Apr", "Jam", "Mar")
+For a history of factors, I recommend [_stringsAsFactors: An unauthorized biography_](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng and [_stringsAsFactors = \<sigh\>_](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend [_Wrangling categorical data in R_](https://peerj.com/preprints/3163/), by Amelia McNamara and Nicholas Horton.
 
-factor(x2, month_levels)
+## Getting help
 
-readr::parse_factor(x2, month_levels)
-```
+If you encounter a clear bug, please file a minimal reproducible example on [github](https://github.com/tidyverse/forcats/issues). For questions and other discussion, please use [community.rstudio.com](https://community.rstudio.com/).
+
+## Code of Conduct
 
-Once you have the factor, forcats provides helpers for solving common problems.
+Please note that the 'forcats' project is released with a [Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.
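
Reviewer note: the new Overview lists `fct_reorder()` and `fct_relevel()`, but the Getting started chunks in this change only exercise `fct_lump()` and `fct_infreq()`. A minimal sketch of the other two follows. The toy data frame and the names `df`, `group`, `value`, `by_value`, and `by_hand` are invented for illustration and are not part of this commit.

```r
library(forcats)

# Toy data, invented for illustration only.
df <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c")),
  value = c(5, 6, 1, 2, 3, 4)
)

# fct_reorder(): order the levels of `group` by a summary of `value`
# (the median by default), smallest first.
# Medians here are a = 5.5, b = 1.5, c = 3.5.
by_value <- fct_reorder(df$group, df$value)
levels(by_value)
#> [1] "b" "c" "a"

# fct_relevel(): move the named levels to the front by hand;
# the remaining levels keep their existing order.
by_hand <- fct_relevel(df$group, "c")
levels(by_hand)
#> [1] "c" "a" "b"
```

Neither call changes the values stored in the factor, only the order of its levels, which is what controls axis and legend order in a plot.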

README.md

+87 -84
@@ -1,107 +1,110 @@
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
-
-# forcats <img src='man/figures/logo.png' align="right" height="139" />
+forcats <img src='man/figures/logo.png' align="right" height="139" />
+=====================================================================
 
 <!-- badges: start -->
+[![CRAN status](https://www.r-pkg.org/badges/version/forcats)](https://cran.r-project.org/package=forcats) [![Travis build status](https://travis-ci.org/tidyverse/forcats.svg?branch=master)](https://travis-ci.org/tidyverse/forcats) [![Codecov test coverage](https://codecov.io/gh/tidyverse/forcats/branch/master/graph/badge.svg)](https://codecov.io/gh/tidyverse/forcats?branch=master) <!-- badges: end -->
 
-[![CRAN
-status](https://www.r-pkg.org/badges/version/forcats)](https://cran.r-project.org/package=forcats)
-[![Travis build
-status](https://travis-ci.org/tidyverse/forcats.svg?branch=master)](https://travis-ci.org/tidyverse/forcats)
-[![Codecov test
-coverage](https://codecov.io/gh/tidyverse/forcats/branch/master/graph/badge.svg)](https://codecov.io/gh/tidyverse/forcats?branch=master)
-<!-- badges: end -->
-
-## Overview
-
-R uses **factors** to handle categorical variables, variables that have
-a fixed and known set of possible values. Historically, factors were
-much easier to work with than character vectors, so many base R
-functions automatically convert character vectors to factors. (For
-historical context, I recommend [*stringsAsFactors: An unauthorized
-biography*](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/)
-by Roger Peng, and [*stringsAsFactors =
-\<sigh\>*](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh)
-by Thomas Lumley. If you want to learn more about other approaches to
-working with factors and categorical data, I recommend [*Wrangling
-categorical data in R*](https://peerj.com/preprints/3163/), by Amelia
-McNamara and Nicholas Horton.) These days, making factors automatically
-is no longer so helpful, so packages in the
-[tidyverse](http://tidyverse.org) never create them automatically.
-
-However, factors are still useful when you have true categorical data,
-and when you want to override the ordering of character vectors to
-improve display. The goal of the **forcats** package is to provide a
-suite of useful tools that solve common problems with factors. If you’re
-not familiar with strings, the best place to start is the [chapter on
-factors](http://r4ds.had.co.nz/factors.html) in R for Data Science.
-
-## Installation
+Overview
+--------
 
-``` r
-# The easiest way to get forcats is to install the whole tidyverse:
-install.packages("tidyverse")
+R uses **factors** to handle categorical variables, variables that have a fixed and known set of possible values. Factors are also helpful for reordering character vectors to improve display. The goal of the **forcats** package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values. Some examples include:
 
-# Alternatively, install just forcats:
-install.packages("forcats")
+- `fct_reorder()`: Reordering a factor by another variable.
+- `fct_infreq()`: Reordering a factor by the frequency of values.
+- `fct_relevel()`: Changing the order of a factor by hand.
+- `fct_lump()`: Collapsing the least/most frequent values of a factor into "other".
 
-# Or the the development version from GitHub:
-# install.packages("devtools")
-devtools::install_github("tidyverse/forcats")
-```
+You can learn more about each of these in `vignette("forcats")`. If you're new to factors, the best place to start is the [chapter on factors](http://r4ds.had.co.nz/factors.html) in R for Data Science.
+
+Installation
+------------
+
+    # The easiest way to get forcats is to install the whole tidyverse:
+    install.packages("tidyverse")
 
-## Getting started
+    # Alternatively, install just forcats:
+    install.packages("forcats")
 
-forcats is part of the core tidyverse, so you can load it with
-`library(tidyverse)` or `library(forcats)`.
+    # Or the the development version from GitHub:
+    # install.packages("devtools")
+    devtools::install_github("tidyverse/forcats")
+
+Getting started
+---------------
+
+forcats is part of the core tidyverse, so you can load it with `library(tidyverse)` or `library(forcats)`.
 
 ``` r
 library(forcats)
+library(dplyr)
+library(ggplot2)
+```
+
+``` r
+starwars %>%
+  filter(!is.na(species)) %>%
+  count(species, sort = TRUE)
+#> # A tibble: 37 x 2
+#>    species      n
+#>    <chr>    <int>
+#>  1 Human       35
+#>  2 Droid        5
+#>  3 Gungan       3
+#>  4 Kaminoan     2
+#>  5 Mirialan     2
+#>  6 Twi'lek      2
+#>  7 Wookiee      2
+#>  8 Zabrak       2
+#>  9 Aleena       1
+#> 10 Besalisk     1
+#> # … with 27 more rows
 ```
 
-Factors are used to describe categorical variables with a fixed and
-known set of **levels**. You can create factors with the base `factor()`
-or
-[`readr::parse_factor()`](http://readr.tidyverse.org/reference/parse_factor.html):
+``` r
+starwars %>%
+  filter(!is.na(species)) %>%
+  mutate(species = fct_lump(species, n = 3)) %>%
+  count(species)
+#> # A tibble: 4 x 2
+#>   species     n
+#>   <fct>   <int>
+#> 1 Droid       5
+#> 2 Gungan      3
+#> 3 Human      35
+#> 4 Other      39
+```
 
 ``` r
-x1 <- c("Dec", "Apr", "Jan", "Mar")
-month_levels <- c(
-  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
-  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
-)
-
-factor(x1, month_levels)
-#> [1] Dec Apr Jan Mar
-#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
-
-readr::parse_factor(x1, month_levels)
-#> [1] Dec Apr Jan Mar
-#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
+ggplot(starwars, aes(x = eye_color)) +
+  geom_bar() +
+  coord_flip()
 ```
 
-The advantage of `parse_factor()` is that it will generate a warning if
-values of `x` are not valid levels:
+![](man/figures/README-unordered-plot-1.png)
 
 ``` r
-x2 <- c("Dec", "Apr", "Jam", "Mar")
-
-factor(x2, month_levels)
-#> [1] Dec Apr <NA> Mar
-#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
-
-readr::parse_factor(x2, month_levels)
-#> Warning: 1 parsing failure.
-#> row # A tibble: 1 x 4 col row col expected actual expected <int> <int> <chr> <chr> actual 1 3 NA value in level set Jam
-#> [1] Dec Apr <NA> Mar
-#> attr(,"problems")
-#> # A tibble: 1 x 4
-#>     row   col expected           actual
-#>   <int> <int> <chr>              <chr>
-#> 1     3    NA value in level set Jam
-#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
+starwars %>%
+  mutate(eye_color = fct_infreq(eye_color)) %>%
+  ggplot(aes(x = eye_color)) +
+  geom_bar() +
+  coord_flip()
 ```
 
-Once you have the factor, forcats provides helpers for solving common
-problems.
+![](man/figures/README-ordered-plot-1.png)
+
+More resources
+--------------
+
+For a history of factors, I recommend [*stringsAsFactors: An unauthorized biography*](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng and [*stringsAsFactors = &lt;sigh&gt;*](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend [*Wrangling categorical data in R*](https://peerj.com/preprints/3163/), by Amelia McNamara and Nicholas Horton.
+
+Getting help
+------------
+
+If you encounter a clear bug, please file a minimal reproducible example on [github](https://github.com/tidyverse/forcats/issues). For questions and other discussion, please use [community.rstudio.com](https://community.rstudio.com/).
+
+Code of Conduct
+---------------
+
+Please note that the 'forcats' project is released with a [Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.

man/figures/README-ordered-plot-1.png

27.3 KB