Author: Domingo Guzman
Date: October 24, 2022
The case study follows the six-step data analysis process:
❓ Ask
💻 Prepare
🛠 Process
📊 Analyze
📋 Share
🧗♀️ Act
The Coalition to Stop Gun Violence (CSGV) is a non-profit gun control advocacy organization that is opposed to gun violence. The CSGV was founded in 1974, making it the nation’s oldest gun violence prevention organization. The CSGV believes that gun violence should be rare and abnormal. They pursue the goal of ending gun violence through policy development, advocacy, community engagement, and effective training. Through a combination of evidence-based policy development and aggressive lobbying, the CSGV is leading the way forward in fighting for a safer and stronger America.
BUSINESS TASK: Our analytics team has been asked to come up with strong, evidence-backed points that will assist the CSGV in gaining supporters.
Primary stakeholders: Joshua Horwitz, executive director.
Secondary stakeholders: Local communities
Data Sources: Gun Violence Incidents in the United States from Emmanuel F. Werr: https://www.kaggle.com/datasets/emmanuelfwerr/gun-violence-incidents-in-the-usa
US Police Shootings from 2015- Sep 2022 from Ram Jas: https://www.kaggle.com/datasets/ramjasmaurya/us-police-shootings-from-20152022
The datasets have 3 CSV files, 24 columns, and 500,000 rows. The data also follows a ROCCC approach:
- Reliability - MED: The Gun Violence Incidents in the United States data is complete and accurate. It comes from the Gun Violence Archive (GVA). The GVA website maintains a database of known shootings in the United States, drawn from law enforcement, media, and government sources in all 50 states. The US Police Shootings data was scraped from Wikipedia.
- Original - MED: GVA is an independent data collection and research group. Data in the US Police Shootings dataset was gathered from The Counted, a website that tracks the number of people killed by police in the US.
- Comprehensive - HIGH: The data includes names, dates, manner of death, state where death occurred, city, address, number of people killed & injured, age, race, gender, whether victim was armed, and if there were signs of mental illness.
- Current - HIGH: The data is current. It goes from 2013 to September 2022.
- Cited - MED: The data is cited from the Gun Violence Archive and Wikipedia.
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(janitor)
# Read the dataframes
all_incidents <- read_csv("all_incidents.csv")
mass_shootings <- read_csv("mass_shootings.csv")
police_shootings <- read_csv("police_shootings.csv")
Examining the data:
head(all_incidents)
colnames(all_incidents)
dim(all_incidents)
head(mass_shootings)
colnames(mass_shootings)
dim(mass_shootings)
head(police_shootings)
colnames(police_shootings)
dim(police_shootings)
Removing columns that will not be used in the analysis:
all_incidents_clean <- all_incidents %>% select(-c(incident_id))
police_shootings_clean <- police_shootings %>% select(-c(id, name))
mass_shootings_clean <- mass_shootings %>% select(-c('Incident ID', Address))
Removing null values from columns in each dataset:
all_incidents_clean <- na.omit(all_incidents_clean)
mass_shootings_clean <- na.omit(mass_shootings_clean)
police_shootings_clean <- na.omit(police_shootings_clean)
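Because na.omit() drops a row when any column is missing, it can help to count missing values per column before removing anything. The following is a minimal sketch in base R; the data frame toy_shootings and its values are hypothetical stand-ins, not rows from the actual datasets.

```r
# Hypothetical toy data standing in for police_shootings_clean
toy_shootings <- data.frame(
  age  = c(53, NA, 23, 32),
  race = c("A", "W", NA, "W"),
  city = c("City A", "City B", "City C", "City D")
)

# Count missing values in each column
na_counts <- colSums(is.na(toy_shootings))
print(na_counts)

# na.omit() removes a row if ANY of its columns is NA
rows_kept <- nrow(na.omit(toy_shootings))
print(rows_kept)
```

If one column accounts for most of the missing values, dropping NAs only where that column is used (for example with drop_na(race)) would likely keep more rows than a blanket na.omit().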
The all_incidents dataset covers data from 2013 - present day, while the police_shootings dataset covers data from 2015 - September 2022. This means that data from 2013-2014 will need to be factored out from all_incidents so that the same years can be compared.
# removing years 2013-2014
all_incidents_new <- all_incidents_clean %>% filter(date >= as.Date("2015-01-01"))
Let's confirm that the years 2013-2014 were removed by checking the end of the dataframe. The rows are ordered newest first, so the oldest remaining dates appear last.
tail(all_incidents_new)
# A tibble: 6 × 6
date state city address n_kil…¹ n_inj…²
<date> <chr> <chr> <chr> <dbl> <dbl>
1 2015-01-01 Florida Alachua 7108 NW 92nd Pla… 0 0
2 2015-01-01 New Jersey Jersey City Virginia Avenue 0 3
3 2015-01-01 New York Staten Island 1307 Arthur Kill… 0 2
4 2015-01-01 Michigan Saint Joseph 396 Upton Drive 1 0
5 2015-01-01 New York Rochester 402 West Ridge R… 0 1
6 2015-01-01 Ohio Lorain 2217 East 28th St 0 3
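Reading tail() works when the rows are sorted by date, but checking the date range directly does not depend on row order. Here is a minimal sketch, assuming an explicit cutoff of 2015-01-01; toy_incidents is hypothetical toy data, not the real file.

```r
library(dplyr)

# Hypothetical toy data standing in for all_incidents_clean
toy_incidents <- tibble(
  date     = as.Date(c("2022-05-28", "2015-01-01", "2014-12-31", "2013-06-15")),
  n_killed = c(0, 1, 2, 0)
)

# Keep only incidents from 2015 onward
toy_filtered <- toy_incidents %>%
  filter(date >= as.Date("2015-01-01"))

# The earliest remaining date should not fall before the cutoff
print(range(toy_filtered$date))
stopifnot(min(toy_filtered$date) >= as.Date("2015-01-01"))
```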
Now that the correct years for both datasets have been verified, the next step is fixing the 'state' categories. The 'state' category in the police_shootings dataset is abbreviated, while the full name of the state is spelled out in the all_incidents_new and mass_shootings datasets. This must be fixed so the state formats are all the same.
head(all_incidents_new)
# A tibble: 6 × 6
date state city address n_kil…¹ n_inj…²
<date> <chr> <chr> <chr> <dbl> <dbl>
1 2022-05-28 Arkansas Little Rock W 9th St and B… 0 1
2 2022-05-28 Colorado Denver 3300 block of … 0 1
3 2022-05-28 Missouri Saint Louis Page Blvd and … 0 1
4 2022-05-28 South Carolina Florence Old River Rd 0 2
5 2022-05-28 California Carmichael 4400 block of … 1 0
6 2022-05-28 Kentucky Louisville 400 block of M… 0 1
head(police_shootings)
# A tibble: 6 × 17
id name date manne…¹ armed age gender race city state
<dbl> <chr> <date> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
1 1 Tim El… 2015-01-02 shot gun 53 M A Shel… WA
2 2 Lewis … 2015-01-02 shot gun 47 M W Aloha OR
3 3 John P… 2015-01-03 shot a… unar… 23 M H Wich… KS
4 4 Matthe… 2015-01-04 shot toy … 32 M W San … CA
5 5 Michae… 2015-01-04 shot nail… 39 M H Evans CO
6 6 Kennet… 2015-01-04 shot gun 18 M W Guth… OK
head(mass_shootings)
# A tibble: 6 × 7
`Incident ID` `Incident Date` State City …¹ Address # Kil…² # Inj…³
<dbl> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 271363 December 29, 2014 Loui… New Or… Poydra… 0 4
2 269679 December 27, 2014 Cali… Los An… 8800 b… 1 3
3 270036 December 27, 2014 Cali… Sacram… 4000 b… 0 4
4 269167 December 26, 2014 Illi… East S… 2500 b… 1 3
5 268598 December 24, 2014 Miss… Saint … 18th a… 1 3
6 267792 December 23, 2014 Kent… Winche… 260 Ox… 1 3
I will use the built-in state.abb and state.name vectors with the match function to achieve this:
# change all states into their abbreviations
all_incidents_new$state <- state.abb[match(all_incidents_new$state, state.name)]
mass_shootings_clean$State <- state.abb[match(mass_shootings_clean$State, state.name)]
head(all_incidents_new)
# A tibble: 6 × 6
date state city address n_kil…¹ n_inj…²
<date> <chr> <chr> <chr> <dbl> <dbl>
1 2022-05-28 AR Little Rock W 9th St and Broadway St 0 1
2 2022-05-28 CO Denver 3300 block of Clay St 0 1
3 2022-05-28 MO Saint Louis Page Blvd and Vandevent… 0 1
4 2022-05-28 SC Florence Old River Rd 0 2
5 2022-05-28 CA Carmichael 4400 block of Manzanita… 1 0
6 2022-05-28 KY Louisville 400 block of M St 0 1
head(police_shootings_clean)
# A tibble: 6 × 15
date manner_of_d…¹ armed age gender race city state signs…²
<date> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <lgl>
1 2015-01-02 shot gun 53 M A Shel… WA TRUE
2 2015-01-02 shot gun 47 M W Aloha OR FALSE
3 2015-01-03 shot and Tas… unar… 23 M H Wich… KS FALSE
4 2015-01-04 shot toy … 32 M W San … CA TRUE
5 2015-01-04 shot nail… 39 M H Evans CO FALSE
6 2015-01-04 shot gun 18 M W Guth… OK FALSE
head(mass_shootings_clean)
# A tibble: 6 × 6
`Incident Date` State `City Or County` `# Killed` `# Injured` state
<chr> <chr> <chr> <dbl> <dbl> <chr>
1 December 29, 2014 LA New Orleans 0 4 LA
2 December 27, 2014 CA Los Angeles 1 3 CA
3 December 27, 2014 CA Sacramento 0 4 CA
4 December 26, 2014 IL East St. Louis 1 3 IL
5 December 24, 2014 MO Saint Louis 1 3 MO
6 December 23, 2014 KY Winchester 1 3 KY
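One caveat with this lookup: state.name contains only the 50 states, so any other value becomes NA after matching. The sketch below shows the behavior and one possible workaround. It assumes the datasets spell the district as "District of Columbia", which has not been verified here; lookup_names and lookup_abbs are illustrative names.

```r
# state.name covers only the 50 states, so anything else maps to NA
full_names  <- c("Texas", "District of Columbia", "Ohio")
abbreviated <- state.abb[match(full_names, state.name)]
print(abbreviated)        # "TX" NA "OH"

# Possible workaround: extend the lookup vectors
# (assumption: the data uses the spelling "District of Columbia")
lookup_names <- c(state.name, "District of Columbia")
lookup_abbs  <- c(state.abb, "DC")
abbreviated_fixed <- lookup_abbs[match(full_names, lookup_names)]
print(abbreviated_fixed)  # "TX" "DC" "OH"
```

Running sum(is.na(all_incidents_new$state)) after the conversion is a quick way to see whether any rows were affected.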
- Gun-related deaths:
- Gun-related injuries:
- Gun-related deaths per year:
- Gun-related injuries per year:
- Number of Victims of Police Shootings by Race since 2015:
- Percentages of Shootings by Mental Illness:
- Police Shooting Victims by Age Group Since 2015:
Check the number of total gun-related deaths per state in the US.
ggplot(data = all_incidents_new, aes(x = state, y = n_killed)) + geom_bar(stat = "identity", fill = "black", color = "darkred") + labs(title = "Gun-related Deaths per State") + theme(axis.text.y = element_text(hjust = 1, size = 8)) + coord_flip()
The above visualization depicts the number of people who died from gun violence in each state from 2015 to 2022. Texas, California, and Florida lead the country in these deaths. The CSGV can use this information to strengthen its presence in these higher-risk states. By holding more rallies and events there, the CSGV is likely to gain a large number of supporters, because these communities are the most affected by gun violence.
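The state ranking read off the chart can also be checked numerically by summing deaths per state before plotting. This is a minimal sketch on hypothetical toy data; deaths_by_state and total_killed are illustrative names, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy data standing in for all_incidents_new
toy_incidents <- tibble(
  state    = c("TX", "TX", "CA", "FL", "CA", "OH"),
  n_killed = c(2, 1, 1, 1, 1, 0)
)

# Total deaths per state, largest first
deaths_by_state <- toy_incidents %>%
  group_by(state) %>%
  summarise(total_killed = sum(n_killed), .groups = "drop") %>%
  arrange(desc(total_killed))

print(deaths_by_state)
```

A summary table like this could also be passed to geom_col() so that each state is drawn as a single bar rather than a stack of per-incident segments.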
Tableau Viz: https://public.tableau.com/app/profile/domingo.guzman/viz/GunViolenceCaseStudy/Sheet1
Check the number of total gun-related injuries per state in the US.
ggplot(data = all_incidents_new, aes(x = state, y = n_injured)) + geom_bar(stat = "identity", fill = "black", color = "darkred") + labs(title = "Gun-related Injuries per State") + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))
After inspecting the above graph, it's clear that Illinois has the most gun-related injuries, followed by California, Texas, and Pennsylvania. We know from the first graph that California and Texas have high numbers of gun deaths. The injuries graph shows that Illinois and Pennsylvania also have a high number of gun violence incidents. This makes them strong candidates for an increased CSGV presence.
Tableau Viz: https://public.tableau.com/app/profile/domingo.guzman/viz/USGunViolenceCaseStudy2/Sheet1
ggplot(data = df, aes(x = year, y = n_killed, group = 1)) + geom_line(color = "red") + geom_point() + labs(title = "Gun-related Deaths per Year")
We can see from the ggplot() line graph that there is an increase in the number of deaths each year, with the exception of 2018. This is concerning because more and more people are getting killed yearly as a result of gun violence.
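The line plot reads from a data frame named df that is not defined in this section (the injuries plot likewise uses df2). The sketch below shows one way such a yearly summary could be built. It is an assumption about the intended derivation, not the author's original code, and toy_incidents and yearly_totals are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data standing in for all_incidents_new
toy_incidents <- tibble(
  date      = as.Date(c("2015-03-01", "2015-07-04", "2016-01-10", "2016-11-30")),
  n_killed  = c(1, 0, 2, 1),
  n_injured = c(0, 3, 1, 1)
)

# Extract the calendar year, then total deaths and injuries per year
yearly_totals <- toy_incidents %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(year) %>%
  summarise(
    n_killed  = sum(n_killed),
    n_injured = sum(n_injured),
    .groups   = "drop"
  )

print(yearly_totals)
```

A table shaped like yearly_totals could serve as both df and df2. Note that the head() output earlier shows the most recent incident dated 2022-05-28, so the 2022 total would cover only part of the year and is not directly comparable to full years.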
Tableau Viz: https://public.tableau.com/app/profile/domingo.guzman/viz/USGunViolenceViz3/Sheet1
ggplot(data = df2, aes(x = year, y = n_injured, group = 1)) + geom_line(color = "orange") + geom_point() + labs(title = "Gun-related Injuries per Year")
At first glance, this line graph shows that the number of injuries increases slightly each year from 2015 to 2019. In 2020 and 2021, however, there is a sharp increase in injuries. This suggests that gun-related violence has been getting worse over time, consistent with the previous graph.
Tableau Viz: https://public.tableau.com/app/profile/domingo.guzman/viz/USGunViolenceCaseStudy4/Sheet1
Now let's explore the police_shootings dataset. To start off, let's see the breakdown of victims according to their race.
police_shoot_race <- police_shootings_clean %>% drop_na(race) %>% group_by(race) %>% dplyr::summarise(count = n()) %>% arrange(desc(count))
police_shoot_race$race <- factor(police_shoot_race$race, levels = unique(as.character(police_shoot_race$race)))
ggplot(data = police_shoot_race, aes(x = race, y = count)) + geom_col(fill = 'brown') + labs(title = "Number of Victims of Police Shootings by Race since 2015", x = 'Race', y = 'Number of Shootings') + geom_text(aes(label = count), vjust = -.5)
W = White B = Black H = Hispanic A = Asian N = Native American O = Other
According to the data, white people account for the largest number of lethal police shooting victims, followed by Black and Hispanic people. This is a valuable visual that the CSGV can use to gain the support of white, Black, and Hispanic community members.
Tableau Viz: https://public.tableau.com/app/profile/domingo.guzman/viz/USGunViolenceCaseStudy5/Sheet1
The CSGV is in favor of stricter mental health screenings for firearm purchases. The next visualization shows the percentage of victims who showed signs of mental illness.
illness_colors <- c('lightblue', 'pink')  # shared by the pie slices and the legend
pie(police_shoot_illness$percent, labels = police_shoot_illness$percent, main = 'Percentages of Shootings by Mental Illness', col = illness_colors)
legend('right', c("Not ill", "ill"), fill = illness_colors)
76% of victims were not mentally ill while 24% had signs of mental illness. 24% is an alarming number because it means that these people were able to gain access to guns.
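The pie chart depends on a police_shoot_illness table with a percent column, which is not defined in this section. Here is a minimal sketch of how it could be computed. The column name mental_illness_flag is a hypothetical placeholder for the dataset's mental illness indicator, and the toy values are not real data.

```r
library(dplyr)

# Hypothetical toy data; mental_illness_flag is a placeholder column name
toy_shootings <- tibble(
  mental_illness_flag = c(TRUE, FALSE, FALSE, FALSE)
)

# Count each group and convert the counts to whole-number percentages
police_shoot_illness <- toy_shootings %>%
  count(mental_illness_flag) %>%
  mutate(percent = round(100 * n / sum(n)))

print(police_shoot_illness)
```

count() orders the logical column with FALSE before TRUE, which matches a legend order of "Not ill" followed by "ill".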
Tableau Viz: https://public.tableau.com/app/profile/domingo.guzman/viz/USGunViolenceCaseStudy7/Sheet1
# Breakdown of victims by age group
us_pol_age_brackets <- police_shootings_clean %>%
  mutate(
    age_group = dplyr::case_when(
      age >= 0 & age <= 20 ~ "0-20",
      age >= 21 & age <= 40 ~ "21-40",
      age >= 41 & age <= 60 ~ "41-60",
      age >= 61 & age <= 80 ~ "61-80",
      age >= 81 & age <= 100 ~ "81-100"),
    age_group = factor(age_group, levels = c("0-20", "21-40", "41-60", "61-80", "81-100")))
age_brackets <- us_pol_age_brackets %>% tabyl(age_group) %>% drop_na()
ggplot(data = age_brackets, aes(x = age_group, y = n)) + geom_col(fill = 'salmon') + labs(title = "Police Shooting Victims by Age Group Since 2015", x = 'Age Group', y = 'Number of Victims') + geom_text(aes(label = n), vjust = -.5)
The age brackets of "21-40" and "41-60" have the highest number of fatal police shootings. This closely aligns with the information given by the Federal Bureau of Prisons through August 27 that shows the median age to be 36.
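The same age brackets can also be produced with base R's cut(), which is a swapped-in alternative to the case_when() approach rather than the method used in this analysis. The ages below are hypothetical values chosen to exercise the bracket edges.

```r
# Hypothetical ages, including bracket boundaries and a missing value
ages <- c(18, 20, 21, 36, 40, 41, 60, 61, 85, NA)

# Intervals are closed on the right: [0,20], (20,40], (40,60], (60,80], (80,100]
age_group <- cut(
  ages,
  breaks = c(0, 20, 40, 60, 80, 100),
  labels = c("0-20", "21-40", "41-60", "61-80", "81-100"),
  include.lowest = TRUE
)

# table() ignores the NA by default
print(table(age_group))
```

One difference to keep in mind: a fractional age such as 20.5 would land in "21-40" with cut(), whereas the case_when() conditions would leave it unassigned.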
Tableau Viz: https://public.tableau.com/app/profile/domingo.guzman/viz/USGunViolenceCaseStudy6/Sheet1
Tableau Dashboard: https://public.tableau.com/app/profile/domingo.guzman/viz/dash_16740948361610/Dashboard1
After analyzing the Gun Violence Incidents and US Police Shootings datasets I came up with strong, evidence-backed points that will assist the CSGV in gaining supporters.
- In the first visualization, we saw that Texas, California, and Florida had the highest number of gun-related deaths by a large margin. With this information, I recommend that the CSGV heavily increase its presence in these states. Doing this and showing the communities the data is likely to lead to a large gain of supporters.
- Gun-related injuries are most prevalent in Illinois, California, Texas, and Pennsylvania. As with the point above, the CSGV can increase its presence in these states to gain supporters.
- In the third visualization, we notice that the number of deaths per year is increasing at an alarming rate. This means that more and more people are being killed each year as a result of gun violence. This can be used as a focal point in presentations and rallies so that communities understand that gun violence is getting worse as time goes on.
- When it comes to police shootings by race, white, Black, and Hispanic people are shot most often. The CSGV can focus its outreach on these communities in order to gain their support.
- I wanted to look at the percentage of shootings by mental illness because the CSGV is in favor of stricter mental health screenings for firearm purchases. After visualizing the data, it was discovered that 24% of victims had signs of mental illness. This is an area of concern because it shows that these people were able to get access to guns. The CSGV can show supporters at rallies that the number of incidents would decrease if stricter policies for obtaining firearms were in place. As a result, increased pressure from the public may convince lawmakers to implement policies that make it harder for these individuals to get firearms.