An income predictor built on the adult data set from the UCI machine learning repository

amansahil/Income-Predictor

Income Predictor

The dataset used to build the predictive model is the Adult dataset from the UCI Machine Learning Repository.

Key Observations

Each row carries one of two target labels: “>50k” for income over $50,000 and “<=50k” for income of $50,000 or less. The dataset contains 32,560 entries (excluding headers) and 15 columns, and 24 of the entries are duplicates.
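The counts above can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library; the `SAMPLE` text and the `summarize` helper are hypothetical and stand in for the real file, which is not bundled here.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample shaped like the Adult data (NOT real rows from the dataset).
SAMPLE = """age,workclass,hours-per-week,income
39,State-gov,40,<=50k
50,Self-emp,13,<=50k
39,State-gov,40,<=50k
52,Private,45,>50k
"""


def summarize(csv_text):
    """Return (row count, column count, duplicate row count, label counts)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line holds the column names
    rows = [tuple(cell.strip() for cell in row) for row in reader if row]
    # A row counts as a duplicate if an identical row appeared earlier.
    duplicates = len(rows) - len(set(rows))
    # The target label is assumed to be the last column.
    labels = Counter(row[-1] for row in rows)
    return len(rows), len(header), duplicates, labels


n_rows, n_cols, n_dupes, label_counts = summarize(SAMPLE)
print(n_rows, n_cols, n_dupes, dict(label_counts))
```

To run the same check on the full dataset, pass the contents of the downloaded file to `summarize` instead of `SAMPLE`.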

There isn't much diversity in the dataset for features such as capital gain, capital loss, work class, native country and race.

Screen Shot 2020-04-29 at 7 48 57 PM

Screen Shot 2020-04-29 at 7 50 33 PM

Screen Shot 2020-04-29 at 7 50 24 PM

Furthermore, the table below shows considerable spread in hours per week, fnlwgt, capital loss and capital gain, and the box-and-whisker plot below confirms that these features contain a significant number of outliers. This motivated using the MinMax scaler rather than the Standard scaler, whose mean and standard deviation are distorted by outliers. Note that MinMax scaling is itself affected by extreme values, since the minimum and maximum define the scaled range.
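To make the scaler comparison concrete, here is a small dependency-free sketch of the two transforms applied to a column containing one extreme value. The `hours` list is made up for illustration, and the two helper functions are hand-written stand-ins for the formulas behind min-max scaling and standardization, not the project's actual preprocessing code.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Map values linearly onto [0, 1] using the column minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up "hours per week" column with one extreme value (99).
hours = [35, 40, 40, 45, 99]

mm = min_max_scale(hours)
st = standard_scale(hours)

print("min-max :", [round(x, 3) for x in mm])
print("standard:", [round(x, 3) for x in st])
```

Min-max output always stays inside [0, 1], while standardized values are unbounded and centred on zero. Printing both makes it easy to see where the typical values land relative to the extreme one under each transform.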

Screen Shot 2020-04-29 at 7 53 47 PM Box

Best Model

XGBoost

k-Fold Validation: 87.1%

Logarithmic Loss: 4.25
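For readers unfamiliar with the two metrics above, the sketch below shows how k-fold splits and binary logarithmic loss are defined, using only the standard library. It does not train XGBoost and does not reproduce the reported numbers. The source does not state the number of folds or how the log loss was computed, so the fold count, labels and probabilities here are illustrative assumptions.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: mean of -[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Assumed values: 10 samples split into 5 folds.
folds = list(k_fold_indices(10, 5))

# Made-up labels (1 means ">50k") and predicted probabilities of class 1.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
loss = log_loss(y, p)

print(len(folds), round(loss, 4))
```

In k-fold validation the model is fitted on each `train` split and scored on the matching `test` split, and the k scores are averaged. Log loss is computed from predicted probabilities, so confident wrong predictions are penalized far more heavily than uncertain ones.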

Screen Shot 2020-04-29 at 7 53 13 PM
