eda_and_prediction

This project conducts an exploratory data analysis of the 2014 Global Financial Inclusion (Global Findex) Database. The analysis aims to reveal patterns, trends, and correlations in the data that can help businesses, organizations, and researchers make informed financial decisions. The dataset was obtained from the World Bank.

The statistical and graphical exploratory data analysis was done using PySpark.

To predict the outcome, the project explored six machine learning algorithms: Logistic Regression, Decision Tree, Support Vector Classifier, Naive Bayes, Random Forest, and Gradient Boosting. Each algorithm was chosen for its suitability to the dataset's characteristics and the problem at hand: for example, Decision Trees for their interpretability, and Random Forests for modeling non-linear relationships while being less prone to overfitting than a single tree. The models were trained as binary classifiers with generalizability as a goal, and performance was evaluated using accuracy, F1 score, mean squared error (MSE), and root mean squared error (RMSE).

The Gradient Boosting model performed best of the six models, with the following scores:

F1 score: 0.72

Accuracy: 73%

Mean Square Error (MSE): 0.27

Root Mean Square Error (RMSE): 0.52
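The four metrics above are closely related when both labels and predictions are 0 or 1. The following is a minimal sketch of how they can be computed, not the notebook's actual code: the function name `binary_metrics` and the sample labels are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, F1, MSE and RMSE for 0/1 labels and 0/1 predictions.

    Hypothetical helper for illustration; not taken from the project notebook.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    n = len(pairs)

    # Confusion-matrix counts for the positive class (label 1).
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = sum(1 for t, p in pairs if t == p) / n

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # With 0/1 values, each squared error is 0 (correct) or 1 (wrong),
    # so MSE equals the misclassification rate, 1 - accuracy.
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    rmse = math.sqrt(mse)

    return {"accuracy": accuracy, "f1": f1, "mse": mse, "rmse": rmse}


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Assuming MSE and RMSE were computed on hard 0/1 predictions, the reported figures agree with each other: an accuracy of 73% implies an MSE of 1 - 0.73 = 0.27, and the square root of 0.27 is about 0.52. In that setting MSE and RMSE carry no information beyond accuracy, so F1 is the metric that adds an independent view of the model.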

