-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
103 lines (66 loc) · 4.48 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
---
output: github_document
---
[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/predictrace)](https://cran.r-project.org/package=predictrace)
[![AppVeyor Build
Status](https://ci.appveyor.com/api/projects/status/github/jacobkap/predictrace?branch=master&svg=true)](https://ci.appveyor.com/project/jacobkap/predictrace)
[![Coverage
status](https://codecov.io/gh/jacobkap/predictrace/branch/master/graph/badge.svg)](https://app.codecov.io/github/jacobkap/predictrace?branch=master)
[![](http://cranlogs.r-pkg.org/badges/grand-total/predictrace?color=blue)](https://cran.r-project.org/package=predictrace)
## Overview
The goal of `predictrace` is to predict the race of a surname or first name and the gender of a first name. This package uses U.S. Census data which says how many people of each race has a certain surname. For first name data, this package uses data from Tzioumis (2018). From this we can predict which race is mostly likely to have that surname or first name. The possible races are American Indian, Asian, Black, Hispanic, White, or two or more races. For the gender of first names, this package uses data from the United States Social Security Administration (SSA) that tells how many people of a given name are female and how many are male (no other genders are included). I use this to determine the proportion of each gender a name is, and use the gender with the higher proportion as the most likely gender for that name. Please note that the Census data on the race of first names is far smaller than the SSA data on the gender of first names, so you will match far fewer first names to race than to gender.
Full citation for the Tzioumis data: Tzioumis, Konstantinos (2018) Demographic aspects of first names, Scientific Data, 5:180025 [dx.doi.org/10.1038/sdata.2018.25].
## Installation
```{r, eval = FALSE}
To install this package, use the code
install.packages("predictrace")
# The development version is available on Github.
# install.packages("devtools")
devtools::install_github("jacobkap/predictrace")
```
## Usage
```{r}
library(predictrace)
```
### Race of a surname
To get the race of a surname, the only required parameter in `predict_race()` is `name` which is the surname you want to find the race of. Please note that this parameter only accepts surnames, including both first and last name will result in not finding a match in the Census data. This function returns the name you input, the named matched through Census data, and likely race of that name, and the probability that the name is of each race.
```{r}
predict_race("Washington")
```
This function accepts a single string or a vector of strings.
```{r}
predict_race(c("Washington", "Franklin", "Lincoln"))
```
If you only want the most likely race and not the individual probabilities of each race, set the parameter `probability` to FALSE.
```{r}
predict_race("Washington", probability = FALSE)
```
### Race of a first name
To get the race of a first name, you again use `predict_race()` but now set the parameter `surname` to FALSE. This returns the exact same results as above, but now only works for first names (you may still get a match of a name if you don't set the parameter to FALSE but that is simply because some first names are also last names. The race results will likely be incorrect.).
```{r}
predict_race("sarah", surname = FALSE)
```
This function accepts a single string or a vector of strings.
```{r}
predict_race(c("sarah", "jaime", "jon", "bao"), surname = FALSE)
```
If you only want the most likely race and not the individual probabilities of each race, set the parameter `probability` to FALSE.
```{r}
predict_race("sarah", probability = FALSE, surname = FALSE)
```
### Gender of a first name
To get the gender of a first name you use the `predict_gender()` function and the only required parameter in `predict_gender()` is `name` which is the first name you want to find the gender of. This function returns the name you input, the named matched through SSA data, and likely gender of that name, and the probability that the name is female or male.
```{r}
predict_gender("tyrion")
```
```{r}
predict_gender(c("tyrion", "jaime", "jon", "sansa", "arya", "cersei"))
```
If you only want the most likely gender and not the individual probabilities of each gender, set the parameter `probability` to FALSE.
```{r}
predict_gender(c("tyrion", "jaime", "jon", "sansa", "arya", "cersei"), probability = FALSE)
```
In cases where there is no match in the data, it will return NA.
```{r}
predict_gender("zzztyrion")
```