Using a recurrent neural network in TensorFlow to predict ethnicity by last name.

priyankurs/Name-Ethnicity-Classifier

 
 


Name Ethnicity Classifier

Humans are often able to accurately predict someone's ethnicity based on their last name, even if they have not seen it before. For example, one can predict that "Yang" is more likely to be a Chinese last name than a Japanese one, while "Kobayashi" seems characteristically Japanese.

I trained a recurrent neural network with TensorFlow in Python to do this automatically. The model achieved 97% accuracy when classifying names as Chinese or Japanese, 87% accuracy when classifying names as Chinese, Japanese, or Vietnamese, and 79% accuracy when classifying names as Chinese, Japanese, Vietnamese, or Korean.
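The repository's actual architecture and preprocessing are not described here, so the following is only a minimal sketch of how a character-level recurrent classifier for surnames could be set up with `tf.keras`. The alphabet, the `MAX_LEN` cap, the helper names `encode_name` and `build_model`, and all layer sizes are assumptions made for illustration, not the author's settings.

```python
import string

# Assumed alphabet: lowercase ASCII letters. Id 0 is reserved for padding.
ALPHABET = string.ascii_lowercase
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 12  # hypothetical cap on surname length


def encode_name(name, max_len=MAX_LEN):
    """Map a surname to a fixed-length list of character ids, zero-padded on the right."""
    ids = [CHAR_TO_ID[ch] for ch in name.lower() if ch in CHAR_TO_ID]
    ids = ids[:max_len]  # truncate names longer than max_len
    return ids + [0] * (max_len - len(ids))


def build_model(num_classes, vocab_size=len(ALPHABET) + 1):
    """Build a small character-level LSTM classifier (layer sizes are illustrative)."""
    # Imported lazily so the encoding helper works without TensorFlow installed.
    import tensorflow as tf

    model = tf.keras.Sequential([
        # mask_zero=True makes the LSTM skip the padding positions.
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=16, mask_zero=True),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model


encoded = encode_name("Yang")
```

With this setup, `build_model(num_classes=2)` would correspond to the Chinese/Japanese task, and `num_classes=4` to the four-way task. Training would pass a batch of encoded names and integer labels to `model.fit`.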

This sharp drop in accuracy once Vietnamese and Korean are introduced makes sense, and would likely also be seen among humans, since Korean and Vietnamese names often resemble Chinese names. Even within the dataset, some names appear under more than one label: "Tien" is listed as both Chinese and Vietnamese, and "Wang" as both Chinese and Korean.
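One way to quantify this overlap is to group the labeled names and list those that carry more than one label. The sketch below assumes the data is available as `(name, label)` pairs, which may not match the repository's actual format; `find_ambiguous_names` is a hypothetical helper, and the sample rows reuse only the examples mentioned above.

```python
from collections import defaultdict


def find_ambiguous_names(labeled_names):
    """Return names that appear under more than one label, with their sorted labels."""
    labels_by_name = defaultdict(set)
    for name, label in labeled_names:
        # Normalize case and whitespace so "Wang" and "wang " count as the same name.
        labels_by_name[name.strip().lower()].add(label)
    return {
        name: sorted(labels)
        for name, labels in labels_by_name.items()
        if len(labels) > 1
    }


# Illustrative rows only, taken from the examples in the text.
sample = [
    ("Tien", "Chinese"),
    ("Tien", "Vietnamese"),
    ("Wang", "Chinese"),
    ("Wang", "Korean"),
    ("Kobayashi", "Japanese"),
]
ambiguous = find_ambiguous_names(sample)
```

Names returned by this check cannot be classified correctly under every one of their labels by any model that sees only the spelling, so their share of the dataset limits the achievable accuracy.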

Name data was scraped from familyeducation.com.
