This project is the result of PRAC2 for the course "Tipologia i cicle de vida de les dades" in the Master's in Data Science at the UOC (Universitat Oberta de Catalunya).
## Description
The goal of this project was to take a dataset, clean it, and then run a statistical analysis on it.
I chose the "US Census Data" dataset (https://www.kaggle.com/johnolafenwa/us-census-data), which consists of sociological variables. The analysis focuses on checking whether income is biased with respect to sociological variables such as sex and race.
To do so, three different analyses were performed:
- Chi-square test: to test whether sex and income are not independent.
- Correlation: to check the relationships between all columns.
- Logistic regression: to check whether the prediction was good enough and whether the coefficients reveal other biases.
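
As an illustration of the chi-square step, here is a minimal sketch in plain Python. It is not the code used in this project: the function name `chi_square_2x2` and the counts are hypothetical and chosen only for the example, not taken from the dataset. It computes the Pearson chi-square statistic for a 2x2 contingency table (sex by income bracket) without a continuity correction, and gets the p-value from the closed form that holds for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Hypothetical helper for illustration; no continuity correction is applied.
    Returns the statistic and the p-value (1 degree of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts, NOT from the US Census dataset:
# rows = sex (two groups), columns = income (<=50K, >50K)
counts = [[90, 10], [70, 30]]
chi2, p = chi_square_2x2(counts)
print(f"chi2 = {chi2:.3f}, p = {p:.5f}")
```

A small p-value would lead to rejecting the hypothesis that sex and income are independent. In practice a library routine such as `scipy.stats.chi2_contingency` is the more usual choice; note that it applies a continuity correction to 2x2 tables by default, so its result can differ from this sketch.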