Predicting election results by county for the 2016 US general election with machine learning in R.

ijeffries/election-predictions

election-predictions

Index

  1. Summary
  2. File Directory
  3. Language and Packages Used
  4. Credits
  5. License

Summary

The following project accomplishes two goals:

  1. Predicting the 2016 US election results by county with supervised machine learning in R.
  2. Mining interesting association rules that relate to demographics and voting preference in R.

Three supervised machine learning models are used to predict election results from county demographics: k-nearest neighbors, decision trees, and artificial neural networks. The models are compared on accuracy and precision.
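As a rough illustration of how one of these models could be fit and scored, the sketch below runs k-nearest neighbors with the class package on a tiny made-up data set, then computes accuracy and precision from the confusion matrix. The feature names (median_income, pct_college), the class labels "A" and "B", and the helper precision_for are all hypothetical and are not taken from classification.Rmd. A real run would normally normalize the features first.

```r
library(class)  # provides knn(); one of the packages listed in this README

set.seed(42)  # knn() breaks ties at random, so fix the seed for repeatability

# Made-up demographic features for 8 training "counties" (not project data).
train <- data.frame(
  median_income = c(30, 32, 35, 31, 70, 72, 68, 75),
  pct_college   = c(10, 12, 11, 14, 40, 42, 45, 38)
)
train_winner <- factor(c("A", "A", "A", "A", "B", "B", "B", "B"))

# Four held-out "counties" with their (made-up) true winners.
test <- data.frame(
  median_income = c(33, 71, 34, 69),
  pct_college   = c(13, 41, 12, 39)
)
test_winner <- factor(c("A", "B", "B", "B"), levels = levels(train_winner))

# Classify each test county by majority vote of its 3 nearest training counties.
predicted <- knn(train = train, test = test, cl = train_winner, k = 3)

# Rows are predictions, columns are true labels.
confusion <- table(predicted = predicted, actual = test_winner)

# Accuracy: share of all counties classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Precision for one class: of the counties predicted as that class,
# the share that truly belong to it.
precision_for <- function(cm, class) cm[class, class] / sum(cm[class, ])

precision_a <- precision_for(confusion, "A")
precision_b <- precision_for(confusion, "B")

print(confusion)
print(c(accuracy = accuracy, precision_A = precision_a, precision_B = precision_b))
```

The same confusion-matrix helpers would apply unchanged to predictions from a decision tree or a neural network, which is what makes a side-by-side comparison of the three models straightforward.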

File Directory

  1. data - contains the three data sets used in the analysis (taken from Kaggle; see Credits):
    a. county_facts.csv - Demographic breakdown of each county
    b. county_facts_dictionary.csv - Dictionary to decode variable names in county_facts.csv
    c. pres16results.csv - Results of the 2016 election by county

  2. images - contains visualizations:
    a. decision_tree.png - Decision tree created from modelling process
    b. model_comparison.png - Comparison of 3 classification models used
    c. population_trends.png - Population size by voting preference
    d. voting_trends.png - Voting trends by top 5 normalized demographics

  3. classification.Rmd - R Markdown detailing the entire classification process, from data cleaning to model creation.
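As a loose sketch of the data-preparation idea, the snippet below joins a demographics table to a per-county winner derived from a results table. The toy data frames stand in for county_facts.csv and pres16results.csv; every column name (fips, median_income, cand, votes) and every value is an assumption made for illustration and may not match the real files or the steps in classification.Rmd.

```r
# Toy stand-in for the demographics file; names and values are invented.
demographics <- data.frame(
  fips          = c(1001, 1003, 1005),
  median_income = c(52, 50, 33)
)

# Toy stand-in for the results file: one row per county and candidate.
results <- data.frame(
  fips  = c(1001, 1001, 1003, 1003, 1005, 1005),
  cand  = c("X", "Y", "X", "Y", "X", "Y"),
  votes = c(100, 40, 300, 80, 20, 50)
)

# Sort so the top vote-getter comes first within each county,
# then keep only that first row per county.
results <- results[order(results$fips, -results$votes), ]
winners <- results[!duplicated(results$fips), c("fips", "cand")]
names(winners)[2] <- "winner"

# Inner join on the county identifier: one row per county,
# demographics as features and the winner as the class label.
model_data <- merge(demographics, winners, by = "fips")
print(model_data)
```

The result has one row per county, which is the shape a classifier needs: feature columns plus a single label column.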

Language and Packages Used

R is used for all model building; the results compare R and SAS.

The following packages are used:

# List of packages used
packages <- c("dplyr", "tidyr", "ggplot2", "class", "rpart", "rpart.plot", "neuralnet", "arules",
              "plyr", "mltools", "arulesViz", "plotly", "RCurl")

# Load each package; if it is not installed yet, install it and then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
    library(p, character.only = TRUE)
  }
}

Credits

  1. Thanks to Ben Hammer for the county_facts.csv and county_facts_dictionary.csv datasets, which were taken from Kaggle.
  2. Thanks to Steve Palley for the pres16results.csv dataset, which was taken from Kaggle.

License

MIT License. Copyright (c) 2019 Ian Jeffries.