generated from jtr13/EDAVproject
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path04-missing.Rmd
31 lines (23 loc) · 986 Bytes
/
04-missing.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# Missing values
## Load library
```{r echo=FALSE}
library(tidyverse)
library(extracat)
```
## Load data
```{r echo=FALSE}
index = c('loan_amnt','issue_d','annual_inc','addr_state','grade','purpose','loan_status','home_ownership','emp_length')
df <- read.csv(file="loan.csv", header=TRUE, sep=",")[index]
df[df==""] <- NA
df[df=="n/a"] <- NA
```
## Missing column pattern graph
```{r echo=FALSE}
visna(df, sort = 'b')
```
## Analysis
This is the graph that describes the missing column pattern. It shows that we have some missing values in the variables 'emp_length' and 'annual_income'. The missing pattern shows that:
- 1. Most of the missing values are in the variable 'emp_length'.
- 2. There is no record that has missing values in these two variables at the same time.
- 3. The missing values take only a small (about 5%) part in the total number of records.
In our analysis, we will simply delete those records with missing values.