This is our Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence), which focuses on clustering analysis of country data.
Real-world problem
We are a non-governmental organisation seeking to roll out a new programme to help underdeveloped countries reform and progress. Due to budget constraints, we need to be strategic about which countries to assist. Hence, we aim to identify the countries that perform poorly in areas beyond GDP per capita, such as healthcare, education and access to basic necessities.
Data science problem
Identify a cluster of countries that are underdeveloped in multiple areas based on a comprehensive set of indicators.
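To make the problem statement concrete, here is a minimal, standard-library-only sketch of one way to find such a cluster. It is not the code in kmeans.ipynb. It assumes every indicator is oriented so that higher is better, z-scores each indicator so none dominates because of its units, runs plain k-means seeded with the first k rows (a simplification for reproducibility), and calls the cluster with the lowest average z-score the "worst". The country names and numbers are made-up toy values.

```python
import math
import statistics

def standardize(rows):
    """Z-score each column so indicators on different scales carry equal weight."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means, seeded with the first k points so runs are reproducible."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.mean(col) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: made-up indicators where higher is better
# (e.g. life expectancy, literacy rate %, electricity access %).
names = ["A", "B", "C", "D", "E", "F"]
rows = [
    [80, 99, 100],
    [55, 40, 30],
    [78, 97, 100],
    [52, 35, 25],
    [82, 98, 100],
    [58, 45, 40],
]

labels, centroids = kmeans(standardize(rows), k=2)
# The "worst" cluster is the one whose centroid has the lowest average z-score.
worst = min(range(len(centroids)), key=lambda j: statistics.mean(centroids[j]))
print([n for n, lab in zip(names, labels) if lab == worst])  # ['B', 'D', 'F']
```

Indicators where lower is better (for example infant mortality) would need their sign flipped before standardising for the "lowest average z-score" rule to make sense.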
Hi there! Join us on our data journey.
View the files in the following order:
- un_member_states.ipynb
- country_name_map.ipynb
- data_cleaning.ipynb
- data_exploration.ipynb
- PCA.ipynb
- kmeans.ipynb
- hierarchical.ipynb
- model_comparison.ipynb
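The PCA step can be illustrated with a small sketch. PCA.ipynb most likely relies on a library implementation such as scikit-learn's PCA; as a swapped-in, standard-library-only alternative, this computes the leading principal component by power iteration on the sample covariance matrix and reports the share of total variance it explains. It assumes the all-ones start vector is not orthogonal to the top eigenvector, and in practice the indicators would be standardised first. The two-column data is a made-up toy example.

```python
import math
import statistics

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of the columns of rows."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    n, d = len(rows), len(cols)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, max_iter=1000, tol=1e-12):
    """Power iteration for the top eigenvector/eigenvalue of a covariance matrix.

    Assumes the start vector (all ones) is not orthogonal to the top eigenvector.
    """
    d = len(cov)
    v = [1 / math.sqrt(d)] * d
    for _ in range(max_iter):
        # Multiply by the covariance matrix, then renormalise to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue

# Hypothetical toy data: two perfectly correlated indicators.
rows = [[1, 2], [2, 4], [3, 6], [4, 8]]
cov = covariance_matrix(rows)
pc1, var1 = leading_component(cov)
total_var = sum(cov[i][i] for i in range(len(cov)))
print(pc1, var1 / total_var)  # direction ~ (0.447, 0.894), ratio ~ 1.0
```

Because the two toy columns are perfectly correlated, one component captures essentially all of the variance; with real indicators the ratio is what guides how many components to keep before clustering.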
We identified 34 countries, the intersection of the worst-performing clusters from our K-Means and hierarchical models, as the least developed countries to which we should offer aid.
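Taking the intersection of the two models' worst clusters amounts to a set intersection. The sketch below is illustrative only: the function names, country names and labels are hypothetical, and it assumes the worst cluster of each model has already been chosen. The two models need not use the same label number for it.

```python
def worst_cluster_members(countries, labels, worst_label):
    """Return the set of countries assigned to the given (worst) cluster label."""
    return {c for c, lab in zip(countries, labels) if lab == worst_label}

def consensus_countries(countries, kmeans_labels, kmeans_worst, hier_labels, hier_worst):
    """Countries placed in the worst cluster by BOTH models (set intersection)."""
    in_kmeans = worst_cluster_members(countries, kmeans_labels, kmeans_worst)
    in_hier = worst_cluster_members(countries, hier_labels, hier_worst)
    return sorted(in_kmeans & in_hier)

# Hypothetical labels for five made-up countries.
countries = ["P", "Q", "R", "S", "T"]
kmeans_labels = [2, 2, 0, 2, 1]  # worst K-Means cluster assumed to be 2
hier_labels = [1, 1, 1, 0, 0]    # worst hierarchical cluster assumed to be 1
print(consensus_countries(countries, kmeans_labels, 2, hier_labels, 1))  # ['P', 'Q']
```

Requiring agreement from both models is a conservative choice: a country only makes the list if two different clustering methods both place it among the worst performers.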
Chiraag - data cleaning, clustering models.
Chen rui - principal component analysis, cluster interpretation (visualisation).
Vaish - data preparation, EDA.
Arias, F. J. C. (2019, February 7). Fuzzy String Matching in Python Tutorial. DataCamp Community. Retrieved March 21, 2022, from https://www.datacamp.com/community/tutorials/fuzzy-string-python
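The cited tutorial covers fuzzy string matching, which is presumably what country_name_map.ipynb uses to reconcile country names that are spelt differently across datasets. The tutorial works with a third-party library; as a swapped-in, standard-library alternative, this sketch uses `difflib.get_close_matches`. The `map_country_names` helper and its `overrides` parameter are hypothetical, added for aliases that differ too much to match reliably.

```python
import difflib

def map_country_names(raw_names, canonical_names, overrides=None, cutoff=0.6):
    """Map each raw name to its closest canonical name, or None if none clears cutoff.

    Entries in `overrides` take precedence over fuzzy matching.
    """
    overrides = overrides or {}
    mapping = {}
    for name in raw_names:
        if name in overrides:
            mapping[name] = overrides[name]
            continue
        # Best single match whose similarity ratio is at least `cutoff`.
        matches = difflib.get_close_matches(name, canonical_names, n=1, cutoff=cutoff)
        mapping[name] = matches[0] if matches else None
    return mapping

canonical = ["Vietnam", "Côte d'Ivoire", "Czechia"]
raw = ["Viet Nam", "Cote d'Ivoire", "Czech Republic"]

print(map_country_names(raw, canonical))
# {'Viet Nam': 'Vietnam', "Cote d'Ivoire": "Côte d'Ivoire", 'Czech Republic': None}
print(map_country_names(raw, canonical, overrides={"Czech Republic": "Czechia"}))
```

"Czech Republic" scores below the 0.6 cutoff against "Czechia", which is why a small manual override table is a useful complement to fuzzy matching.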