Income Predictor

The dataset used to build the predicitve model is the adult dataset form the UCI machine learning repository.

Key Observations

The dataset contains two attributes specific to each row as the target value, “>50k” for income over $50,000 and “<=50k” for income under $50,000 dollars. The dataset contains 32,560 entries excluding headers and 15 columns with 24 duplicate entries.

There isn't much diversity in the dataset for features such as capital gain, capital loss, work class, native country and race.

Furthermore, it is evident from the table below there is quite a bit of deviation in the dataset for hours per week, fnlwgt, capital loss and capital gain. Additionally, the box and whiskers plot below further solidifies the fact there are a significant number of outliers in these values. This is why the the MinMax scaler was used instead of the Standard scaler as the MinMax scaler is not sensitive to outliers.

Best Model

XGBoost

k-Fold Validation: 87.1%

Logarithmic Loss: 4.25

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.gitignore		.gitignore
README.md		README.md
adult.data		adult.data
main.ipynb		main.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Income Predictor

Key Observations

Best Model

XGBoost

About

Uh oh!

Releases

Packages

Languages

amansahil/Income-Predictor

Folders and files

Latest commit

History

Repository files navigation

Income Predictor

Key Observations

Best Model

XGBoost

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages